org.apache.nutch.microformats.reltag
Class RelTagParser

java.lang.Object
  extended by org.apache.nutch.microformats.reltag.RelTagParser
All Implemented Interfaces:
Configurable, ParseFilter, FieldPluggable, Pluggable

public class RelTagParser
extends Object
implements ParseFilter

Adds the document's microformat rel-tags to the parse, if any are found.

Author:
Jérôme Charron
See Also:
http://www.microformats.org/wiki/rel-tag

Field Summary
static org.slf4j.Logger LOG
           
static String REL_TAG
           
 
Fields inherited from interface org.apache.nutch.parse.ParseFilter
X_POINT_ID
 
Constructor Summary
RelTagParser()
           
 
Method Summary
 Parse filter(String url, WebPage page, Parse parse, HTMLMetaTags metaTags, DocumentFragment doc)
          Adds metadata or otherwise modifies a parse, given the DOM tree of a page.
 Configuration getConf()
          Get the Configuration object
 Collection<WebPage.Field> getFields()
          Gets all the fields for a given WebPage. Many datastores need to set up the MapReduce job by specifying the fields needed.
 void setConf(Configuration conf)
          Set the Configuration object
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

LOG

public static final org.slf4j.Logger LOG

REL_TAG

public static final String REL_TAG
See Also:
Constant Field Values
Constructor Detail

RelTagParser

public RelTagParser()
Method Detail

setConf

public void setConf(Configuration conf)
Set the Configuration object

Specified by:
setConf in interface Configurable

getConf

public Configuration getConf()
Get the Configuration object

Specified by:
getConf in interface Configurable

getFields

public Collection<WebPage.Field> getFields()
Gets all the fields for a given WebPage. Many datastores need to set up the MapReduce job by specifying the fields needed, so every extension that works on a WebPage can specify which fields it needs.

Specified by:
getFields in interface FieldPluggable
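
The idea behind getFields can be illustrated with a minimal, self-contained sketch: a job asks each extension for the fields it needs and requests the union from the datastore. The Field enum, the FieldProvider interface, and the requiredFields helper below are hypothetical stand-ins, not the Nutch WebPage.Field or FieldPluggable types, and the constant names are illustrative only.

```java
import java.util.Collection;
import java.util.EnumSet;
import java.util.List;
import java.util.Set;

public class FieldUnionSketch {

    /** Hypothetical stand-in for WebPage.Field; constant names are illustrative only. */
    public enum Field { BASE_URL, CONTENT, METADATA, OUTLINKS }

    /** Hypothetical stand-in for the getFields() contract of an extension. */
    public interface FieldProvider {
        Collection<Field> getFields();
    }

    /** Returns the union of the fields requested by all providers. */
    public static Set<Field> requiredFields(List<FieldProvider> providers) {
        Set<Field> all = EnumSet.noneOf(Field.class);
        for (FieldProvider provider : providers) {
            Collection<Field> fields = provider.getFields();
            if (fields != null) {          // tolerate extensions that declare nothing
                all.addAll(fields);
            }
        }
        return all;
    }

    public static void main(String[] args) {
        // Two made-up extensions with overlapping needs.
        FieldProvider pluginA = () -> EnumSet.of(Field.BASE_URL, Field.METADATA);
        FieldProvider pluginB = () -> EnumSet.of(Field.CONTENT, Field.METADATA);
        // EnumSet iterates in declaration order.
        System.out.println(requiredFields(List.of(pluginA, pluginB)));
        // prints [BASE_URL, CONTENT, METADATA]
    }
}
```

Collecting the union up front lets a job load only the columns that some extension actually reads, which is the motivation the description above gives for declaring fields.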

filter

public Parse filter(String url,
                    WebPage page,
                    Parse parse,
                    HTMLMetaTags metaTags,
                    DocumentFragment doc)
Description copied from interface: ParseFilter
Adds metadata or otherwise modifies a parse, given the DOM tree of a page.

Specified by:
filter in interface ParseFilter
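
As a rough illustration of what a rel-tag filter might do with the DOM tree, the sketch below walks a tree, finds a elements whose rel attribute contains the token "tag", and takes the last path segment of the href as the tag, as the rel-tag microformat describes. It is not the RelTagParser implementation: the class RelTagSketch and its methods are hypothetical, it uses only JDK classes (Java 10 or later), and ignoring trailing slashes is an assumption of this sketch.

```java
import java.io.ByteArrayInputStream;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashSet;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class RelTagSketch {

    /** Collects tags from every <a rel="... tag ..." href="..."> under root, in document order. */
    public static Set<String> extractTags(Node root) {
        Set<String> tags = new LinkedHashSet<>();
        collect(root, tags);
        return tags;
    }

    private static void collect(Node node, Set<String> tags) {
        if (node.getNodeType() == Node.ELEMENT_NODE && "a".equalsIgnoreCase(node.getNodeName())) {
            Element anchor = (Element) node;
            if (hasTagToken(anchor.getAttribute("rel"))) {
                String tag = tagFromHref(anchor.getAttribute("href"));
                if (tag != null) {
                    tags.add(tag);
                }
            }
        }
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            collect(children.item(i), tags);
        }
    }

    /** rel is a space-separated token list; returns true if one token is "tag". */
    private static boolean hasTagToken(String rel) {
        for (String token : rel.trim().split("\\s+")) {
            if ("tag".equalsIgnoreCase(token)) {
                return true;
            }
        }
        return false;
    }

    /** Returns the decoded last path segment of href, or null if there is none. */
    public static String tagFromHref(String href) {
        String path;
        try {
            path = new URI(href).getRawPath();   // excludes query and fragment
        } catch (URISyntaxException e) {
            return null;
        }
        if (path == null) {
            return null;                         // opaque URI such as mailto:
        }
        while (path.endsWith("/")) {             // assumption: trailing slashes are ignored
            path = path.substring(0, path.length() - 1);
        }
        String last = path.substring(path.lastIndexOf('/') + 1);
        if (last.isEmpty()) {
            return null;
        }
        return URLDecoder.decode(last, StandardCharsets.UTF_8); // decodes %20 and '+' as spaces
    }

    public static void main(String[] args) throws Exception {
        String xhtml = "<div>"
                + "<a href=\"http://example.org/tag/tech\" rel=\"tag\">fish</a>"
                + "<a href=\"http://example.org/tags/open+source/\" rel=\"nofollow tag\">oss</a>"
                + "<a href=\"http://example.org/about\">about</a>"
                + "</div>";
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xhtml.getBytes(StandardCharsets.UTF_8)));
        System.out.println(extractTags(doc)); // prints [tech, open source]
    }
}
```

Note that the tag comes from the URL, not from the link text: the first anchor yields "tech" even though its visible text is "fish".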


Copyright © 2012 The Apache Software Foundation