Package org.apache.nutch.parse

Interface Summary
ParseFilter Extension point for DOM-based parsers.
Parser A parser for content generated by a Protocol implementation.
ParseStatusCodes  
 

Class Summary
HTMLMetaTags Holds the information about HTML "meta" tags extracted from a page.
Outlink  
OutlinkExtractor Extracts Outlinks (URLs) from plain text using regular expressions.
Parse  
ParseFilters Creates and caches ParseFilter implementing plugins.
ParsePluginList Represents the natural ordering in which parsing plugins should be called for a particular MIME type.
ParsePluginsReader A reader to load the information stored in the $NUTCH_HOME/conf/parse-plugins.xml file.
ParserChecker Parser checker, useful for testing a parser.
ParserFactory Creates and caches Parser plugins.
ParserJob  
ParserJob.ParserMapper  
ParseStatusUtils  
ParseUtil A utility class with methods that simplify common parsing tasks, such as iterating through a preferred list of Parsers to obtain Parse objects.
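
The ParseUtil entry above describes iterating through a preferred list of parsers until one produces a result. The following is a minimal, self-contained sketch of that general pattern only. It does not use the actual Nutch API: SimpleParser, parseWithFallback, and defaultParsers are hypothetical names invented for illustration, and the real Parser and ParseUtil signatures may differ.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

public class ParserFallbackSketch {

    /** Hypothetical stand-in for a parser plugin; NOT the Nutch Parser interface. */
    interface SimpleParser {
        /** Returns the extracted text, or empty if this parser cannot handle the content. */
        Optional<String> parse(String content);
    }

    /** Tries each parser in preference order and returns the first successful result. */
    static Optional<String> parseWithFallback(List<SimpleParser> preferred, String content) {
        for (SimpleParser parser : preferred) {
            try {
                Optional<String> result = parser.parse(content);
                if (result.isPresent()) {
                    return result;
                }
            } catch (RuntimeException e) {
                // A failing parser must not prevent the remaining parsers from being tried.
            }
        }
        return Optional.empty();
    }

    /** Illustrative preference list: a parser that always fails, an HTML parser, a plain-text parser. */
    static List<SimpleParser> defaultParsers() {
        SimpleParser broken = content -> {
            throw new IllegalStateException("simulated parser failure");
        };
        SimpleParser html = content -> content.trim().startsWith("<html")
                // Crude tag stripping, good enough for a sketch only.
                ? Optional.of(content.replaceAll("<[^>]*>", "").trim())
                : Optional.empty();
        SimpleParser plainText = content -> Optional.of(content.trim());
        return Arrays.asList(broken, html, plainText);
    }

    public static void main(String[] args) {
        List<SimpleParser> parsers = defaultParsers();
        System.out.println(parseWithFallback(parsers, "<html><b>hi</b></html>").orElse("(none)")); // prints "hi"
        System.out.println(parseWithFallback(parsers, "  plain text ").orElse("(none)"));          // prints "plain text"
    }
}
```

Catching the exception inside the loop is a design choice of this sketch: one misbehaving parser is skipped rather than aborting the whole attempt, and the caller receives an empty result only when every parser in the list has declined or failed.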
 

Exception Summary
ParseException  
ParserNotFound  
 



Copyright © 2012 The Apache Software Foundation