org.apache.nutch.microformats.reltag
Class RelTagIndexingFilter

java.lang.Object
  extended by org.apache.nutch.microformats.reltag.RelTagIndexingFilter
All Implemented Interfaces:
org.apache.hadoop.conf.Configurable, IndexingFilter, FieldPluggable, Pluggable

public class RelTagIndexingFilter
extends Object
implements IndexingFilter

An IndexingFilter that adds tag field(s) to the document.

Author:
Jérôme Charron
See Also:
http://www.microformats.org/wiki/rel-tag

Field Summary
 
Fields inherited from interface org.apache.nutch.indexer.IndexingFilter
X_POINT_ID
 
Constructor Summary
RelTagIndexingFilter()
           
 
Method Summary
 NutchDocument filter(NutchDocument doc, String url, WebPage page)
          Adds tag field(s) to the given document.
 org.apache.hadoop.conf.Configuration getConf()
          Get the Configuration object
 Collection<WebPage.Field> getFields()
          Gets all the fields needed for a given WebPage. Many datastores need to set up the MapReduce job by specifying the fields needed.
 void setConf(org.apache.hadoop.conf.Configuration conf)
          Set the Configuration object
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

RelTagIndexingFilter

public RelTagIndexingFilter()
Method Detail

getFields

public Collection<WebPage.Field> getFields()
Gets all the fields needed for a given WebPage. Many datastores need to set up the MapReduce job by specifying the fields needed. All extensions that work on WebPage are able to specify which fields they need.

Specified by:
getFields in interface FieldPluggable
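
The field-declaration idea described above can be sketched without any Nutch dependency. In this minimal sketch, PageField, FieldDeclaring, TagExtensionSketch, TitleExtensionSketch and FieldPlannerSketch are all hypothetical stand-ins (they are not Nutch types), and the fields each extension asks for are assumptions chosen for illustration. It shows only the pattern: each extension reports the fields it needs, and the job setup takes the union so that nothing else has to be loaded.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.EnumSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for WebPage.Field; the constants are illustrative only.
enum PageField { BASE_URL, CONTENT, METADATA, TITLE }

// Hypothetical stand-in for the getFields() contract.
interface FieldDeclaring {
  Collection<PageField> getFields();
}

// An extension that (by assumption) needs the base URL and the metadata.
class TagExtensionSketch implements FieldDeclaring {
  private static final Set<PageField> FIELDS =
      Collections.unmodifiableSet(EnumSet.of(PageField.BASE_URL, PageField.METADATA));

  @Override
  public Collection<PageField> getFields() {
    return FIELDS;
  }
}

// A second extension that (by assumption) only needs the title.
class TitleExtensionSketch implements FieldDeclaring {
  @Override
  public Collection<PageField> getFields() {
    return Collections.unmodifiableSet(EnumSet.of(PageField.TITLE));
  }
}

class FieldPlannerSketch {
  // Union of the fields requested by every extension.
  static Set<PageField> union(List<? extends FieldDeclaring> extensions) {
    Set<PageField> needed = EnumSet.noneOf(PageField.class);
    for (FieldDeclaring extension : extensions) {
      needed.addAll(extension.getFields());
    }
    return needed;
  }

  public static void main(String[] args) {
    List<FieldDeclaring> extensions =
        Arrays.asList(new TagExtensionSketch(), new TitleExtensionSketch());
    // CONTENT is requested by nobody, so it is absent from the result.
    System.out.println(union(extensions));
  }
}
```

Returning an unmodifiable set keeps callers from altering an extension's declaration by accident.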

setConf

public void setConf(org.apache.hadoop.conf.Configuration conf)
Set the Configuration object

Specified by:
setConf in interface org.apache.hadoop.conf.Configurable

getConf

public org.apache.hadoop.conf.Configuration getConf()
Get the Configuration object

Specified by:
getConf in interface org.apache.hadoop.conf.Configurable
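
The setConf/getConf pair follows the usual configurable-component pattern: the framework hands the component a configuration object and the component keeps it for later use. The sketch below illustrates that pattern only. It uses java.util.Properties as a stand-in for org.apache.hadoop.conf.Configuration, and SketchConfigurable, ConfiguredFilterSketch and the property key sketch.reltag.field are hypothetical names. This page does not say whether the real class reads any settings.

```java
import java.util.Properties;

// Hypothetical stand-in for the Configurable contract, using Properties
// in place of the Hadoop Configuration class.
interface SketchConfigurable {
  void setConf(Properties conf);

  Properties getConf();
}

class ConfiguredFilterSketch implements SketchConfigurable {
  // Hypothetical property key, invented for this sketch.
  static final String FIELD_KEY = "sketch.reltag.field";
  static final String DEFAULT_FIELD = "tag";

  private Properties conf;
  private String fieldName = DEFAULT_FIELD;

  @Override
  public void setConf(Properties conf) {
    // Keep the reference and read settings once, rather than on every call.
    this.conf = conf;
    this.fieldName = conf.getProperty(FIELD_KEY, DEFAULT_FIELD);
  }

  @Override
  public Properties getConf() {
    return conf;
  }

  String fieldName() {
    return fieldName;
  }

  public static void main(String[] args) {
    Properties conf = new Properties();
    conf.setProperty(FIELD_KEY, "reltag");

    ConfiguredFilterSketch filter = new ConfiguredFilterSketch();
    filter.setConf(conf);
    System.out.println(filter.fieldName());
  }
}
```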

filter

public NutchDocument filter(NutchDocument doc,
                            String url,
                            WebPage page)
                     throws IndexingException
Adds tag field(s) to the given document.

Specified by:
filter in interface IndexingFilter
Parameters:
doc - The NutchDocument object
url - URL to be filtered for rel-tags
page - WebPage object corresponding to the URL
Returns:
filtered NutchDocument
Throws:
IndexingException
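
To make the behavior described above concrete, here is a minimal, dependency-free sketch of adding rel-tag values to a document. It is not the Nutch implementation: SketchDocument is a hypothetical stand-in for NutchDocument, the field name "tag" and the list of rel="tag" link targets passed in are assumptions, and the tag is derived from the last path segment of each link, which is how the rel-tag convention referenced in See Also defines it as far as this sketch assumes.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for NutchDocument: a field name mapped to its values.
class SketchDocument {
  private final Map<String, List<String>> fields = new LinkedHashMap<>();

  void add(String name, String value) {
    fields.computeIfAbsent(name, key -> new ArrayList<>()).add(value);
  }

  List<String> get(String name) {
    return fields.getOrDefault(name, new ArrayList<>());
  }
}

class RelTagFilterSketch {
  // Assumed field name for this sketch.
  static final String TAG_FIELD = "tag";

  // Derives a tag from the last path segment of a rel="tag" link target.
  // Returns null when the link has no usable path segment.
  static String tagFromHref(String href) {
    String path = URI.create(href).getPath(); // percent-escapes are decoded
    if (path == null) {
      return null;
    }
    // Sketch choice: ignore trailing slashes before taking the last segment.
    int end = path.length();
    while (end > 0 && path.charAt(end - 1) == '/') {
      end--;
    }
    if (end == 0) {
      return null;
    }
    int start = path.lastIndexOf('/', end - 1) + 1;
    // Sketch choice: treat '+' as an encoded space.
    return path.substring(start, end).replace('+', ' ');
  }

  // Adds one "tag" value per distinct tag, in first-seen order.
  static SketchDocument filter(SketchDocument doc, List<String> relTagHrefs) {
    Set<String> seen = new LinkedHashSet<>();
    for (String href : relTagHrefs) {
      String tag;
      try {
        tag = tagFromHref(href);
      } catch (IllegalArgumentException e) {
        continue; // skip links that are not valid URIs
      }
      if (tag != null && seen.add(tag)) {
        doc.add(TAG_FIELD, tag);
      }
    }
    return doc;
  }

  public static void main(String[] args) {
    List<String> hrefs = Arrays.asList(
        "http://example.org/tag/tech",
        "http://example.org/tag/open+source",
        "http://example.org/tag/tech");
    System.out.println(filter(new SketchDocument(), hrefs).get(TAG_FIELD));
  }
}
```

Duplicates are dropped so that a page linking to the same tag several times contributes a single value. Whether the real filter deduplicates is not stated on this page.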


Copyright © 2013 The Apache Software Foundation