org.apache.nutch.indexer
Interface IndexingFilter

All Superinterfaces:
Configurable, Pluggable
All Known Implementing Classes:
BasicIndexingFilter, CCIndexingFilter, LanguageIndexingFilter, MoreIndexingFilter, RelTagIndexingFilter

public interface IndexingFilter
extends Pluggable, Configurable

Extension point for indexing. Lets plugins add metadata to the indexed fields. Every plugin found that implements this extension point is run sequentially on the parse.
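
The sequential behaviour described above can be pictured with a minimal sketch. It does not use the Nutch classes: `SketchFilter`, `FilterChainSketch` and `runAll` are hypothetical stand-ins, and the document is reduced to a plain map. It assumes that each filter receives the document returned by the previous one and that a null result stops the chain.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for IndexingFilter; the real filter method also
// receives Parse, Text, CrawlDatum and Inlinks arguments.
interface SketchFilter {
    Map<String, String> filter(Map<String, String> doc);
}

class FilterChainSketch {
    // Runs each filter in order, handing each one the previous result.
    // A null result ends the chain and discards the document (assumed behaviour).
    static Map<String, String> runAll(List<SketchFilter> filters, Map<String, String> doc) {
        for (SketchFilter f : filters) {
            doc = f.filter(doc);
            if (doc == null) {
                return null;
            }
        }
        return doc;
    }

    public static void main(String[] args) {
        List<SketchFilter> filters = new ArrayList<>();
        filters.add(d -> { d.put("title", "Example page"); return d; });
        filters.add(d -> { d.put("lang", "en"); return d; });

        Map<String, String> result = runAll(filters, new LinkedHashMap<>());
        System.out.println(result);
    }
}
```

In this sketch the order of the list decides which filter sees which fields, so a later filter can rely on fields added by an earlier one.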


Field Summary
static String X_POINT_ID
          The name of the extension point.
 
Method Summary
 void addIndexBackendOptions(Configuration conf)
          Adds index-level configuration options.
 NutchDocument filter(NutchDocument doc, Parse parse, Text url, CrawlDatum datum, Inlinks inlinks)
          Adds fields or otherwise modifies the document that will be indexed for a parse.
 
Methods inherited from interface org.apache.hadoop.conf.Configurable
getConf, setConf
 

Field Detail

X_POINT_ID

static final String X_POINT_ID
The name of the extension point.

Method Detail

filter

NutchDocument filter(NutchDocument doc,
                     Parse parse,
                     Text url,
                     CrawlDatum datum,
                     Inlinks inlinks)
                     throws IndexingException
Adds fields or otherwise modifies the document that will be indexed for a parse. To exclude an unwanted document from indexing, return null.

Parameters:
doc - document instance for collecting fields
parse - parse data instance
url - page url
datum - crawl datum for the page
inlinks - page inlinks
Returns:
modified (or a new) document instance, or null (meaning the document should be discarded)
Throws:
IndexingException
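
As an illustration of the contract above, here is a minimal sketch of a filter that either adds fields or discards the document by returning null. It is not written against the Nutch types: `DiscardingFilterSketch` is a hypothetical class, the document is a plain map, the parse is reduced to its text, and the field names `url` and `contentLength` are made up for the example.

```java
import java.util.HashMap;
import java.util.Map;

class DiscardingFilterSketch {
    // Stand-in for filter(doc, parse, url, datum, inlinks): the document is a
    // plain map and the parse is reduced to its text. Both are assumptions.
    static Map<String, String> filter(Map<String, String> doc, String parseText, String url) {
        if (parseText == null || parseText.trim().isEmpty()) {
            return null; // discard: there is no text worth indexing
        }
        doc.put("url", url);
        doc.put("contentLength", String.valueOf(parseText.length()));
        return doc; // the modified instance continues through indexing
    }

    public static void main(String[] args) {
        System.out.println(filter(new HashMap<>(), "Hello Nutch", "http://example.com/"));
        System.out.println(filter(new HashMap<>(), "   ", "http://example.com/empty"));
    }
}
```

The second call returns null, which in this sketch stands for a page that should not reach the index.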

addIndexBackendOptions

void addIndexBackendOptions(Configuration conf)
Adds index-level configuration options. Implementations can update the given configuration to pass document-independent information to indexing backends. As a rule of thumb, prefix meta keys with the name of the intended backend. For example, when passing information to the Lucene backend, prefix keys with "lucene.".

Parameters:
conf - Configuration instance.
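
The prefixing convention can be sketched as follows. This is only an illustration: `java.util.Properties` stands in for the Hadoop `Configuration` object, `BackendOptionsSketch` is a hypothetical class, and the key names are invented; only the "lucene." prefix comes from the description above.

```java
import java.util.Properties;

class BackendOptionsSketch {
    // Properties stands in for the Hadoop Configuration passed to
    // addIndexBackendOptions; the key names below are hypothetical.
    static void addIndexBackendOptions(Properties conf) {
        conf.setProperty("lucene.example.title.stored", "true");
        conf.setProperty("lucene.example.content.stored", "false");
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        addIndexBackendOptions(conf);
        // A backend could then read only the keys under its own prefix.
        for (String key : conf.stringPropertyNames()) {
            if (key.startsWith("lucene.")) {
                System.out.println(key + " = " + conf.getProperty(key));
            }
        }
    }
}
```

Keeping the backend name at the front of each key means options meant for different backends can share one configuration without colliding.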


Copyright © 2006 The Apache Software Foundation