org.apache.nutch.parse
Interface HtmlParseFilter
- All Superinterfaces:
- Configurable, Pluggable
- All Known Implementing Classes:
- CCParseFilter, HTMLLanguageParser, JSParseFilter, RelTagParser
public interface HtmlParseFilter
- extends Pluggable, Configurable
Extension point for DOM-based HTML parsers. Implementations can add
metadata to, or otherwise modify, the parse of an HTML page. Every plugin
that implements this extension point is run sequentially on the parse.
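As an illustration of what "run sequentially" means here, the following is a minimal sketch that uses only JDK classes. `SketchParse`, `SketchFilter`, and `SketchFilterChain` are hypothetical stand-ins invented for this example, not Nutch types, and the two-argument `filter` is deliberately simpler than the real four-argument method. The sketch assumes each filter receives the result returned by the previous one; how Nutch orders and loads its plugins is not shown.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a parse result: nothing more than a metadata map.
final class SketchParse {
    final Map<String, String> metadata = new LinkedHashMap<>();
}

// Hypothetical, simplified analogue of a parse filter (not the Nutch interface).
interface SketchFilter {
    SketchParse filter(String url, SketchParse parse);
}

final class SketchFilterChain {
    private final List<SketchFilter> filters = new ArrayList<>();

    // Registers a filter; filters run in the order they were added.
    SketchFilterChain add(SketchFilter f) {
        filters.add(f);
        return this;
    }

    // Runs every filter in turn, feeding each one the previous filter's result.
    SketchParse run(String url, SketchParse parse) {
        SketchParse current = parse;
        for (SketchFilter f : filters) {
            current = f.filter(url, current);
        }
        return current;
    }

    public static void main(String[] args) {
        SketchFilterChain chain = new SketchFilterChain()
            // First filter adds a metadata entry.
            .add((url, p) -> { p.metadata.put("lang", "en"); return p; })
            // Second filter can observe what the first one added.
            .add((url, p) -> {
                p.metadata.put("seen-lang", String.valueOf(p.metadata.containsKey("lang")));
                return p;
            });
        SketchParse out = chain.run("http://example.com/", new SketchParse());
        System.out.println(out.metadata);
    }
}
```

Because the filters share one evolving result, a later filter can build on metadata added by an earlier one, which is why ordering matters in a chain like this.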
X_POINT_ID
static final String X_POINT_ID
- The name of the extension point.
filter
ParseResult filter(Content content, ParseResult parseResult, HTMLMetaTags metaTags, DocumentFragment doc)
- Adds metadata or otherwise modifies a parse of HTML content, given
the DOM tree of a page.
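The kind of DOM traversal a filter might perform on its `DocumentFragment` argument can be sketched with the JDK's `org.w3c.dom` API alone. The example below is an assumption-laden illustration, not code from any Nutch plugin: `RelTagSketch`, `collectRelTags`, and `sampleFragment` are hypothetical names, and the fragment is built by hand rather than produced by an HTML parser. It collects the `href` of every `<a rel="tag">` element, in the spirit of a rel-tag extractor.

```java
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.DocumentFragment;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

// Hypothetical helper class; not part of Nutch.
final class RelTagSketch {

    // Recursively collects href values of <a rel="tag"> elements at or below the node.
    static List<String> collectRelTags(Node node, List<String> out) {
        if (node.getNodeType() == Node.ELEMENT_NODE) {
            Element e = (Element) node;
            // getAttribute returns "" when the attribute is absent, so no null check is needed.
            if ("a".equalsIgnoreCase(e.getTagName())
                    && "tag".equalsIgnoreCase(e.getAttribute("rel"))) {
                out.add(e.getAttribute("href"));
            }
        }
        for (Node c = node.getFirstChild(); c != null; c = c.getNextSibling()) {
            collectRelTags(c, out);
        }
        return out;
    }

    // Builds a tiny fragment by hand: a <body> with one rel="tag" link and one plain link.
    static DocumentFragment sampleFragment() {
        try {
            Document d = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            DocumentFragment frag = d.createDocumentFragment();
            Element body = d.createElement("body");
            Element tagged = d.createElement("a");
            tagged.setAttribute("rel", "tag");
            tagged.setAttribute("href", "http://example.com/tags/nutch");
            Element plain = d.createElement("a");
            plain.setAttribute("href", "http://example.com/about");
            body.appendChild(tagged);
            body.appendChild(plain);
            frag.appendChild(body);
            return frag;
        } catch (ParserConfigurationException ex) {
            throw new IllegalStateException(ex);
        }
    }

    public static void main(String[] args) {
        List<String> tags = collectRelTags(sampleFragment(), new ArrayList<>());
        System.out.println(tags);
    }
}
```

In a real filter, values gathered this way would presumably be recorded as metadata on the parse before it is returned; that step depends on the Nutch API and is left out of this sketch.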
Copyright © 2006 The Apache Software Foundation