org.apache.nutch.parse.headings
Class HeadingsParseFilter

java.lang.Object
  extended by org.apache.nutch.parse.headings.HeadingsParseFilter
All Implemented Interfaces:
Configurable, HtmlParseFilter, Pluggable

public class HeadingsParseFilter
extends Object
implements HtmlParseFilter

An HtmlParseFilter that retrieves the values of h1 and h2 elements from the DOM.


Field Summary
 
Fields inherited from interface org.apache.nutch.parse.HtmlParseFilter
X_POINT_ID
 
Constructor Summary
HeadingsParseFilter()
           
 
Method Summary
 ParseResult filter(Content content, ParseResult parseResult, HTMLMetaTags metaTags, DocumentFragment doc)
          Adds metadata or otherwise modifies a parse of HTML content, given the DOM tree of a page.
 Configuration getConf()
          Returns the configuration used by this object.
protected String getElement(String element)
          Finds the specified element and returns its text value.
protected static String getNodeValue(Node node)
          Returns the text value of the specified Node and its child nodes.
 void setConf(Configuration conf)
          Sets the configuration to be used by this object.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

HeadingsParseFilter

public HeadingsParseFilter()

Method Detail

filter

public ParseResult filter(Content content,
                          ParseResult parseResult,
                          HTMLMetaTags metaTags,
                          DocumentFragment doc)
Description copied from interface: HtmlParseFilter
Adds metadata or otherwise modifies a parse of HTML content, given the DOM tree of a page.

Specified by:
filter in interface HtmlParseFilter
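
As a rough illustration of what a heading filter of this kind does, the sketch below walks a DocumentFragment and records the text of the first h1 and h2 elements. It is not the Nutch implementation: the class HeadingsFilterSketch, its methods, and the plain Map standing in for the parse metadata are hypothetical names chosen so that the example runs on the JDK alone, and the trimming of values is an assumption.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.DocumentFragment;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical stand-in for a heading filter; uses only JDK DOM classes.
class HeadingsFilterSketch {

    // Assumption: the heading names come from the class description (h1 and h2).
    static final String[] HEADINGS = {"h1", "h2"};

    // Returns heading name -> trimmed text of its first occurrence.
    // The Map is a stand-in for the metadata a real filter would update.
    static Map<String, String> extractHeadings(DocumentFragment doc) {
        Map<String, String> metadata = new LinkedHashMap<>();
        for (String heading : HEADINGS) {
            String value = firstText(doc, heading);
            if (value != null && !value.trim().isEmpty()) {
                metadata.put(heading, value.trim());
            }
        }
        return metadata;
    }

    // Iterative walk in document order; returns null when no element matches.
    static String firstText(Node root, String name) {
        Deque<Node> stack = new ArrayDeque<>();
        stack.push(root);
        while (!stack.isEmpty()) {
            Node node = stack.pop();
            if (node.getNodeType() == Node.ELEMENT_NODE
                    && name.equalsIgnoreCase(node.getNodeName())) {
                return node.getTextContent();
            }
            // Push children right to left so the leftmost child is visited first.
            for (Node c = node.getLastChild(); c != null; c = c.getPreviousSibling()) {
                stack.push(c);
            }
        }
        return null;
    }

    // Test helper: parses well-formed markup and moves its root into a fragment.
    static DocumentFragment fragmentOf(String xml) {
        try {
            Document document = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            DocumentFragment fragment = document.createDocumentFragment();
            fragment.appendChild(document.getDocumentElement());
            return fragment;
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse sample markup", e);
        }
    }

    public static void main(String[] args) {
        DocumentFragment doc = fragmentOf(
                "<html><body><h1> Title </h1><p>text</p><h2>Sub</h2></body></html>");
        System.out.println(extractHeadings(doc)); // prints {h1=Title, h2=Sub}
    }
}
```

Headings that are absent or empty are simply left out of the map, so callers can tell "no heading" apart from an empty string.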

setConf

public void setConf(Configuration conf)
Sets the configuration to be used by this object.

Specified by:
setConf in interface Configurable

getConf

public Configuration getConf()
Returns the configuration used by this object.

Specified by:
getConf in interface Configurable

getElement

protected String getElement(String element)
Finds the specified element and returns its text value.
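
One way such a lookup could work is a depth-first search for the first element with a matching name. The sketch below is an assumption-laden illustration, not the method's actual code: the class ElementLookupSketch and the method firstElementText are hypothetical, the root node is passed explicitly because the page does not say where the DOM is held, and the case-insensitive match and null return for a missing element are assumptions.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper; uses only JDK DOM classes.
class ElementLookupSketch {

    // Depth-first search for the first element whose name matches, ignoring case.
    // Returns the element's text content, or null if no such element exists.
    static String firstElementText(Node root, String elementName) {
        if (root.getNodeType() == Node.ELEMENT_NODE
                && elementName.equalsIgnoreCase(root.getNodeName())) {
            return root.getTextContent();
        }
        NodeList children = root.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            String found = firstElementText(children.item(i), elementName);
            if (found != null) {
                return found;
            }
        }
        return null;
    }

    // Test helper: parses well-formed markup into a DOM Document.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse sample markup", e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse("<html><body><h1>Title</h1><h2>Sub</h2></body></html>");
        System.out.println(firstElementText(doc, "h2")); // prints Sub
        System.out.println(firstElementText(doc, "h3")); // prints null
    }
}
```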


getNodeValue

protected static String getNodeValue(Node node)
Returns the text value of the specified Node and its child nodes.
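
Collecting the text of a node together with its descendants is typically done by recursion over the child list. The following is a minimal sketch of that idea using only JDK DOM classes. The class NodeTextSketch and the method nodeText are hypothetical names, and the choice to include only plain text nodes (skipping comments and CDATA sections) is an assumption rather than documented behaviour.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper; uses only JDK DOM classes.
class NodeTextSketch {

    // Concatenates the text of the node and all its descendants in document order.
    static String nodeText(Node node) {
        StringBuilder buffer = new StringBuilder();
        collectText(node, buffer);
        return buffer.toString();
    }

    // Appends text nodes only; other node types contribute through their children.
    private static void collectText(Node node, StringBuilder buffer) {
        if (node.getNodeType() == Node.TEXT_NODE) {
            buffer.append(node.getNodeValue());
        }
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            collectText(children.item(i), buffer);
        }
    }

    // Test helper: parses well-formed markup into a DOM Document.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse sample markup", e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse("<h1>Apache <em>Nutch</em> headings</h1>");
        System.out.println(nodeText(doc.getDocumentElement())); // prints Apache Nutch headings
    }
}
```

Nested inline markup such as the em element above does not interrupt the result, since the text of child elements is appended in place.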



Copyright © 2012 The Apache Software Foundation