org.apache.nutch.indexer
Interface IndexingFilter
- All Superinterfaces:
- org.apache.hadoop.conf.Configurable, FieldPluggable, Pluggable
- All Known Implementing Classes:
- AnchorIndexingFilter, BasicIndexingFilter, CCIndexingFilter, FeedIndexingFilter, LanguageIndexingFilter, MoreIndexingFilter, RelTagIndexingFilter, SubcollectionIndexingFilter, TLDIndexingFilter
public interface IndexingFilter
- extends FieldPluggable, org.apache.hadoop.conf.Configurable
Extension point for indexing. Lets plugins add metadata fields to the
document being indexed. Every plugin found that implements this extension
point is run sequentially on the parse.
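The sequential behaviour described above can be pictured with a small sketch. This is not Nutch code: `SketchDocument`, `SketchFilter`, and `SketchFilterChain` are hypothetical stand-ins, and the simplified `filter` signature omits the `WebPage` argument and the `Configurable` methods. It assumes that a null result from one filter stops the chain, which is consistent with the null-return contract documented for `filter`.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for NutchDocument: a plain multi-valued field map.
class SketchDocument {
    final Map<String, List<String>> fields = new LinkedHashMap<>();

    void add(String name, String value) {
        fields.computeIfAbsent(name, k -> new ArrayList<>()).add(value);
    }
}

// Hypothetical, simplified stand-in for IndexingFilter.
interface SketchFilter {
    // Returns the (possibly modified) document, or null to discard it.
    SketchDocument filter(SketchDocument doc, String url);
}

// Runs every filter in order; each filter sees the output of the previous one.
class SketchFilterChain {
    private final List<SketchFilter> filters;

    SketchFilterChain(List<SketchFilter> filters) {
        this.filters = new ArrayList<>(filters);
    }

    SketchDocument run(SketchDocument doc, String url) {
        for (SketchFilter f : filters) {
            doc = f.filter(doc, url);
            if (doc == null) {
                // Assumption: a discarded document is not passed to later filters.
                return null;
            }
        }
        return doc;
    }
}
```

Because each filter receives the document returned by its predecessor, the order in which filters run can affect the final set of fields.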
Methods inherited from interface org.apache.hadoop.conf.Configurable
getConf, setConf
X_POINT_ID
static final String X_POINT_ID
- The name of the extension point.
filter
NutchDocument filter(NutchDocument doc,
String url,
WebPage page)
throws IndexingException
- Adds fields or otherwise modifies the document that will be indexed for a
parse. Unwanted documents can be removed from indexing by returning a null value.
- Parameters:
doc - document instance for collecting fields
url - page url
page -
- Returns:
- modified (or a new) document instance, or null (meaning the document
should be discarded)
- Throws:
IndexingException
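As an illustration of the `filter` contract, here is a minimal sketch of one implementation. It does not compile against Nutch: `StubDocument`, `StubPage`, `StubIndexingException`, and `StubIndexingFilter` are hypothetical stand-ins for `NutchDocument`, `WebPage`, `IndexingException`, and `IndexingFilter`, and the `Configurable` methods are left out. The `scheme` field name and the `/private/` rule are invented for the example.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for NutchDocument.
class StubDocument {
    private final Map<String, List<String>> fields = new LinkedHashMap<>();

    void add(String name, String value) {
        fields.computeIfAbsent(name, k -> new ArrayList<>()).add(value);
    }

    List<String> get(String name) {
        return fields.get(name);
    }
}

// Hypothetical stand-in for WebPage; unused by this example.
class StubPage {
}

// Hypothetical stand-in for IndexingException.
class StubIndexingException extends Exception {
    StubIndexingException(String message, Throwable cause) {
        super(message, cause);
    }
}

// Mirrors the shape of the documented filter(doc, url, page) method.
interface StubIndexingFilter {
    StubDocument filter(StubDocument doc, String url, StubPage page)
            throws StubIndexingException;
}

// Adds a "scheme" field, and discards URLs under /private/ by returning null.
class SchemeIndexingFilter implements StubIndexingFilter {
    @Override
    public StubDocument filter(StubDocument doc, String url, StubPage page)
            throws StubIndexingException {
        if (url.contains("/private/")) {
            return null; // null means: do not index this document
        }
        try {
            String scheme = new URI(url).getScheme();
            if (scheme == null) {
                throw new StubIndexingException("no scheme in " + url, null);
            }
            doc.add("scheme", scheme);
            return doc;
        } catch (URISyntaxException e) {
            // Report malformed input through the declared exception type.
            throw new StubIndexingException("bad url " + url, e);
        }
    }
}
```

The sketch returns the same document instance it received; the contract also allows returning a new instance.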
Copyright © 2013 The Apache Software Foundation