org.apache.nutch.indexer.subcollection
Class SubcollectionIndexingFilter
java.lang.Object
org.apache.hadoop.conf.Configured
org.apache.nutch.indexer.subcollection.SubcollectionIndexingFilter
- All Implemented Interfaces:
- Configurable, IndexingFilter, Pluggable
public class SubcollectionIndexingFilter
- extends Configured
- implements IndexingFilter
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
fieldName
public static String fieldName
- Name of the document field
LOG
public static final org.slf4j.Logger LOG
- Logger
SubcollectionIndexingFilter
public SubcollectionIndexingFilter()
SubcollectionIndexingFilter
public SubcollectionIndexingFilter(Configuration conf)
setConf
public void setConf(Configuration conf)
- Specified by:
setConf
in interface Configurable
- Overrides:
setConf
in class Configured
- Parameters:
conf
- the Configuration to set
getConf
public Configuration getConf()
- Specified by:
getConf
in interface Configurable
- Overrides:
getConf
in class Configured
- Returns:
- the Configuration in use
filter
public NutchDocument filter(NutchDocument doc,
Parse parse,
Text url,
CrawlDatum datum,
Inlinks inlinks)
throws IndexingException
- Description copied from interface:
IndexingFilter
- Adds fields or otherwise modifies the document that will be indexed for a
parse. Unwanted documents can be removed from indexing by returning a null value.
- Specified by:
filter
in interface IndexingFilter
- Parameters:
doc
- document instance for collecting fields
parse
- parse data instance
url
- page URL
datum
- crawl datum for the page
inlinks
- page inlinks
- Returns:
- modified (or a new) document instance, or null (meaning the document
should be discarded)
- Throws:
IndexingException
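The return contract described for filter (return the modified document, or null to discard it) can be illustrated with a minimal, self-contained sketch. Everything in it is a hypothetical stand-in rather than Nutch API: SketchFilter, FilterContractSketch, tagByPrefix, dropBySuffix, runChain, the "label" field, and the example URLs are all assumptions made for illustration, and a plain Map stands in for NutchDocument. It does not show how SubcollectionIndexingFilter itself decides what to add.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for an indexing filter; NOT the Nutch IndexingFilter interface.
interface SketchFilter {
    // Returns the (possibly modified) document, or null to discard it.
    Map<String, List<String>> filter(Map<String, List<String>> doc, String url);
}

class FilterContractSketch {

    // Runs each filter in order and stops as soon as one returns null,
    // which mirrors "null means the document should be discarded".
    static Map<String, List<String>> runChain(List<SketchFilter> filters,
                                              Map<String, List<String>> doc,
                                              String url) {
        for (SketchFilter f : filters) {
            doc = f.filter(doc, url);
            if (doc == null) {
                return null;
            }
        }
        return doc;
    }

    // Hypothetical filter: adds a field value when the URL starts with a prefix.
    static SketchFilter tagByPrefix(String prefix, String field, String value) {
        return (doc, url) -> {
            if (url.startsWith(prefix)) {
                doc.computeIfAbsent(field, k -> new ArrayList<>()).add(value);
            }
            return doc;
        };
    }

    // Hypothetical filter: discards documents whose URL ends with a suffix.
    static SketchFilter dropBySuffix(String suffix) {
        return (doc, url) -> url.endsWith(suffix) ? null : doc;
    }

    public static void main(String[] args) {
        List<SketchFilter> chain = List.of(
                dropBySuffix(".tmp"),
                tagByPrefix("http://example.org/docs/", "label", "docs"));

        // A document that passes every filter and gains a field.
        System.out.println(
                runChain(chain, new LinkedHashMap<>(), "http://example.org/docs/a.html"));

        // A document that one filter rejects, so the chain yields null.
        System.out.println(
                runChain(chain, new LinkedHashMap<>(), "http://example.org/docs/b.tmp"));
    }
}
```

A caller that receives null from the chain simply skips indexing that document; any non-null result is passed on with whatever fields the filters added.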
Copyright © 2012 The Apache Software Foundation