Package org.apache.nutch.parse

Interface Summary
HtmlParseFilter Extension point for DOM-based HTML parsers.
Parse The result of parsing a page's raw content.
Parser A parser for content generated by a Protocol implementation.

Class Summary
HTMLMetaTags Holds information about HTML "meta" tags extracted from a page.
HtmlParseFilters Creates and caches plugins that implement HtmlParseFilter.
Outlink  
OutlinkExtractor Extracts Outlinks (URLs) from plain text using regular expressions.
ParseData Data extracted from a page's content.
ParseImpl The result of parsing a page's raw content.
ParseOutputFormat  
ParserChecker Parser checker, useful for testing a parser.
ParseResult A utility class that stores the result of a parse.
ParserFactory Creates and caches Parser plugins.
ParseSegment  
ParseStatus  
ParseText  
ParseUtil A utility class containing methods that simplify parsing, such as iterating through a preferred list of Parsers to obtain Parse objects.
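
The "preferred list of Parsers" idea mentioned for ParseUtil can be illustrated with a minimal, self-contained sketch. It does not use the Nutch API: SimpleParser and parseWithFallback are hypothetical names invented for this example, and the real Parser and Parse interfaces have different signatures. The sketch only shows the general pattern of trying parsers in preference order and returning the first usable result.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical stand-in types for illustration; these are NOT the Nutch Parser/Parse interfaces.
final class ParserFallbackSketch {

    /** Hypothetical parser: returns extracted text, or empty if it cannot handle the content. */
    interface SimpleParser {
        Optional<String> parse(String rawContent);
    }

    /** Tries each parser in preference order and returns the first non-empty result. */
    static Optional<String> parseWithFallback(List<SimpleParser> preferred, String rawContent) {
        for (SimpleParser parser : preferred) {
            try {
                Optional<String> result = parser.parse(rawContent);
                if (result.isPresent()) {
                    return result;
                }
            } catch (RuntimeException e) {
                // A failing parser does not stop the chain; continue with the next one.
            }
        }
        // No parser in the list could handle the content.
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Handles only content that starts with an <html> tag; strips all tags.
        SimpleParser htmlOnly = raw -> raw.startsWith("<html>")
                ? Optional.of(raw.replaceAll("<[^>]*>", "").trim())
                : Optional.empty();
        // Always fails, to show that the chain keeps going.
        SimpleParser broken = raw -> { throw new IllegalStateException("cannot parse"); };
        // Last resort: treats the content as plain text.
        SimpleParser plainText = raw -> Optional.of(raw.trim());

        List<SimpleParser> preferred = List.of(htmlOnly, broken, plainText);
        System.out.println(parseWithFallback(preferred, "<html><p>hello</p></html>"));
        System.out.println(parseWithFallback(preferred, "  plain words  "));
    }
}
```

Swallowing the exception keeps the sketch short; a production implementation would more likely log the failure or record it in a status object before moving to the next parser.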

Exception Summary
ParseException  
ParserNotFound  

Copyright © 2011 The Apache Software Foundation