org.apache.nutch.indexer
Interface IndexingFilter
- All Superinterfaces:
- Configurable, Pluggable
- All Known Implementing Classes:
- AnchorIndexingFilter, BasicIndexingFilter, CCIndexingFilter, FeedIndexingFilter, LanguageIndexingFilter, MetadataIndexer, MoreIndexingFilter, RelTagIndexingFilter, StaticFieldIndexer, SubcollectionIndexingFilter, TLDIndexingFilter, URLMetaIndexingFilter
public interface IndexingFilter
- extends Pluggable, Configurable
Extension point for indexing. Allows plugins to add metadata to the indexed
fields. All plugins that implement this extension point are run
sequentially on the parse.
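The sequential run described above can be pictured with a minimal, self-contained sketch. It does not use the Nutch classes: `SketchDoc`, `SketchFilter`, and `SketchFilterChain` are hypothetical stand-ins, and the filter method is reduced to a single argument. Stopping the chain as soon as a filter returns null is an assumption made for this illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for NutchDocument: just an ordered field map.
class SketchDoc {
    final Map<String, String> fields = new LinkedHashMap<>();
}

// Hypothetical stand-in for IndexingFilter, reduced to one argument.
interface SketchFilter {
    SketchDoc filter(SketchDoc doc);
}

// Hypothetical runner that applies filters one after another.
class SketchFilterChain {
    private final List<SketchFilter> filters = new ArrayList<>();

    SketchFilterChain add(SketchFilter f) {
        filters.add(f);
        return this;
    }

    // Each filter receives the document produced by the previous one.
    // Assumption: a null result discards the document and ends the run.
    SketchDoc run(SketchDoc doc) {
        for (SketchFilter f : filters) {
            doc = f.filter(doc);
            if (doc == null) {
                return null;
            }
        }
        return doc;
    }
}
```

With two filters registered, the second sees the fields the first one added; if any filter returns null, `run` returns null and the later filters are skipped.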
X_POINT_ID
static final String X_POINT_ID
- The name of the extension point.
filter
NutchDocument filter(NutchDocument doc,
Parse parse,
Text url,
CrawlDatum datum,
Inlinks inlinks)
throws IndexingException
- Adds fields to, or otherwise modifies, the document that will be indexed for a
parse. To exclude an unwanted document from indexing, return null.
- Parameters:
doc
- document instance for collecting fields
parse
- parse data instance
url
- page URL
datum
- crawl datum for the page
inlinks
- page inlinks
- Returns:
- modified (or a new) document instance, or null (meaning the document
should be discarded)
- Throws:
IndexingException
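As an illustration of the contract above, here is a minimal sketch of a filter that either enriches the document or rejects it. It is written against simplified, hypothetical stand-ins (`DocStub`, `ParseStub`, `FilterStub`) rather than the real Nutch and Hadoop types, and it omits the `CrawlDatum` and `Inlinks` arguments. The field names and the "empty text" rule are assumptions chosen for the example.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified stand-in for NutchDocument.
class DocStub {
    private final Map<String, String> fields = new HashMap<>();

    void add(String name, String value) {
        fields.put(name, value);
    }

    String get(String name) {
        return fields.get(name);
    }
}

// Hypothetical, simplified stand-in for Parse.
class ParseStub {
    final String title;
    final String text;

    ParseStub(String title, String text) {
        this.title = title;
        this.text = text;
    }
}

// Mirrors the shape of IndexingFilter.filter, minus CrawlDatum and Inlinks.
interface FilterStub {
    DocStub filter(DocStub doc, ParseStub parse, String url) throws Exception;
}

// Adds "title" and "url" fields, and discards pages with no parsed text.
class TitleFilterSketch implements FilterStub {
    @Override
    public DocStub filter(DocStub doc, ParseStub parse, String url) {
        if (parse.text == null || parse.text.trim().isEmpty()) {
            return null; // null means: do not index this document
        }
        doc.add("title", parse.title);
        doc.add("url", url);
        return doc;
    }
}
```

A real implementation would implement `IndexingFilter` itself, take all five parameters, and would also need to satisfy the `Configurable` superinterface.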
Copyright © 2012 The Apache Software Foundation