org.apache.nutch.parse
Interface HtmlParseFilter

All Superinterfaces:
Configurable, Pluggable
All Known Implementing Classes:
CCParseFilter, HTMLLanguageParser, JSParseFilter, RelTagParser

public interface HtmlParseFilter
extends Pluggable, Configurable

Extension point for DOM-based HTML parsers. Allows plugins to add metadata to HTML parses. All plugins that implement this extension point are run sequentially on the parse.
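
To illustrate what running sequentially on the parse can look like, here is a minimal, self-contained sketch. It does not use the Nutch classes: SimpleFilter, runAll and demo are hypothetical stand-ins, and a plain Map stands in for the parse metadata. It assumes that each filter receives the result returned by the previous one.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class FilterChainSketch {

    /** Hypothetical stand-in for HtmlParseFilter; the real interface takes Nutch types. */
    interface SimpleFilter {
        Map<String, String> filter(String url, Map<String, String> parseMeta);
    }

    /** Runs each filter in list order, feeding the result of one into the next. */
    static Map<String, String> runAll(List<SimpleFilter> filters, String url,
                                      Map<String, String> parseMeta) {
        Map<String, String> current = parseMeta;
        for (SimpleFilter f : filters) {
            current = f.filter(url, current);
        }
        return current;
    }

    /** Builds a two-filter chain in which the second filter reads what the first wrote. */
    static Map<String, String> demo() {
        List<SimpleFilter> filters = new ArrayList<>();
        filters.add((url, meta) -> { meta.put("lang", "en"); return meta; });
        filters.add((url, meta) -> { meta.put("seen-lang", meta.get("lang")); return meta; });
        return runAll(filters, "http://example.com/", new LinkedHashMap<>());
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Because the filters run in order, the second one can rely on the "lang" entry written by the first. With a different ordering it would read a missing value instead.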


Field Summary
static String X_POINT_ID
          The name of the extension point.
 
Method Summary
 ParseResult filter(Content content, ParseResult parseResult, HTMLMetaTags metaTags, DocumentFragment doc)
          Adds metadata or otherwise modifies a parse of HTML content, given the DOM tree of a page.
 
Methods inherited from interface org.apache.hadoop.conf.Configurable
getConf, setConf
 

Field Detail

X_POINT_ID

static final String X_POINT_ID
The name of the extension point.

Method Detail

filter

ParseResult filter(Content content,
                   ParseResult parseResult,
                   HTMLMetaTags metaTags,
                   DocumentFragment doc)
Adds metadata or otherwise modifies a parse of HTML content, given the DOM tree of a page.
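
As a rough illustration of the DOM side of such a filter, the sketch below walks a DocumentFragment with the standard org.w3c.dom API and collects the href of every a element marked rel="tag". The class and method names (DomWalkSketch, collectRelTags, sampleFragment, demo) are hypothetical, and the code is not taken from RelTagParser or any other Nutch plugin. A real filter would presumably record what it finds in the parse metadata rather than in a plain list.

```java
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.DocumentFragment;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

class DomWalkSketch {

    /** Depth-first walk that collects href values of <a rel="tag"> elements. */
    static void collectRelTags(Node node, List<String> out) {
        if (node.getNodeType() == Node.ELEMENT_NODE) {
            Element el = (Element) node;
            // getAttribute returns an empty string when the attribute is absent.
            if ("a".equalsIgnoreCase(el.getTagName())
                    && "tag".equalsIgnoreCase(el.getAttribute("rel"))) {
                out.add(el.getAttribute("href"));
            }
        }
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            collectRelTags(child, out);
        }
    }

    /** Builds a small fragment by hand: one tagged link and one plain link. */
    static DocumentFragment sampleFragment() throws ParserConfigurationException {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        DocumentFragment frag = doc.createDocumentFragment();
        Element body = doc.createElement("body");
        Element tagged = doc.createElement("a");
        tagged.setAttribute("rel", "tag");
        tagged.setAttribute("href", "http://example.com/tags/nutch");
        Element plain = doc.createElement("a");
        plain.setAttribute("href", "http://example.com/about");
        body.appendChild(tagged);
        body.appendChild(plain);
        frag.appendChild(body);
        return frag;
    }

    /** Runs the walk over the sample fragment and returns the collected hrefs. */
    static List<String> demo() {
        try {
            List<String> found = new ArrayList<>();
            collectRelTags(sampleFragment(), found);
            return found;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Only the link carrying rel="tag" is collected; the plain link is skipped.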



Copyright © 2011 The Apache Software Foundation