org.apache.nutch.parse.headings
Class HeadingsParseFilter
java.lang.Object
org.apache.nutch.parse.headings.HeadingsParseFilter
- All Implemented Interfaces:
- Configurable, HtmlParseFilter, Pluggable
public class HeadingsParseFilter
- extends Object
- implements HtmlParseFilter
HtmlParseFilter to retrieve h1 and h2 values from the DOM.
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
HeadingsParseFilter
public HeadingsParseFilter()
filter
public ParseResult filter(Content content,
ParseResult parseResult,
HTMLMetaTags metaTags,
DocumentFragment doc)
- Description copied from interface: HtmlParseFilter
- Adds metadata or otherwise modifies a parse of HTML content, given the DOM tree of a page.
- Specified by: filter in interface HtmlParseFilter
setConf
public void setConf(Configuration conf)
- Specified by: setConf in interface Configurable
getConf
public Configuration getConf()
- Specified by: getConf in interface Configurable
getElement
protected String getElement(String element)
- Finds the specified element and returns its value.
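The lookup described above can be illustrated with a minimal sketch built only on the JDK's `org.w3c.dom` API. It is not the Nutch implementation: the class `ElementLookupSketch`, its `parse` helper, and the two-argument `findElementText` method are hypothetical names introduced for illustration, and the case-insensitive match, the trimming, and the `null` return for a missing element are assumptions rather than documented behavior of `getElement`.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class; not part of Nutch.
class ElementLookupSketch {

    // Parses a well-formed XML/XHTML string into a DOM Document.
    // Checked exceptions are wrapped so callers need no throws clause.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse input", e);
        }
    }

    // Depth-first search for the first element whose name matches
    // (ignoring case). Returns its trimmed text content, or null if
    // no such element exists. Both choices are assumptions.
    static String findElementText(Node node, String element) {
        if (node.getNodeType() == Node.ELEMENT_NODE
                && element.equalsIgnoreCase(node.getNodeName())) {
            return node.getTextContent().trim();
        }
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            String found = findElementText(children.item(i), element);
            if (found != null) {
                return found;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        Document doc = parse(
                "<html><body><h1>Title</h1><h2> Sub </h2></body></html>");
        System.out.println(findElementText(doc, "h1"));
        System.out.println(findElementText(doc, "h2"));
    }
}
```

Unlike `getElement(String)`, which takes only the element name, the sketch passes the DOM root explicitly so that it stays self-contained.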
getNodeValue
protected static String getNodeValue(Node node)
- Returns the text value of the specified Node and its child nodes.
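One plausible way to collect the text of a node and its children is a recursive walk that appends every text node it meets. The sketch below uses only the JDK's `org.w3c.dom` API. The class `NodeTextSketch` and its methods `parse` and `textOf` are hypothetical names, and the actual `getNodeValue` may differ, for example in how it handles whitespace or non-text nodes.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class; not part of Nutch.
class NodeTextSketch {

    // Parses a well-formed XML/XHTML string into a DOM Document.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse input", e);
        }
    }

    // Returns the concatenated text of the node and all descendants,
    // in document order, with whitespace left untouched.
    static String textOf(Node node) {
        StringBuilder sb = new StringBuilder();
        collect(node, sb);
        return sb.toString();
    }

    // Appends the value of each text node, then recurses into children.
    private static void collect(Node node, StringBuilder sb) {
        if (node.getNodeType() == Node.TEXT_NODE) {
            sb.append(node.getNodeValue());
        }
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            collect(children.item(i), sb);
        }
    }

    public static void main(String[] args) {
        Document doc = parse("<h1>Hello <em>big</em> world</h1>");
        System.out.println(textOf(doc.getDocumentElement()));
    }
}
```

Text inside nested elements such as `<em>` is included because the walk descends through every child before returning.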
Copyright © 2012 The Apache Software Foundation