org.apache.nutch.indexer.field
Interface FieldFilter

All Superinterfaces:
Configurable, Pluggable

public interface FieldFilter
extends Pluggable, Configurable

Filter that manipulates FieldWritable objects for a given URL during indexing. Field filters are responsible for converting FieldWritable objects into Lucene fields and adding those fields to the Lucene document.


Field Summary
static String X_POINT_ID
           
 
Method Summary
 Document filter(String url, Document doc, List<FieldWritable> fields)
          Returns the document to which fields are being added, or null to stop processing this URL and add nothing to the index.
 
Methods inherited from interface org.apache.hadoop.conf.Configurable
getConf, setConf
 

Field Detail

X_POINT_ID

static final String X_POINT_ID

Method Detail

filter

Document filter(String url,
                Document doc,
                List<FieldWritable> fields)
                throws IndexingException
Returns the document to which fields are being added, or null to stop processing this URL and add nothing to the index. All FieldWritable objects for a URL are aggregated from the databases passed into the FieldIndexer and are then passed to the field filters. Filters can therefore add, remove, or change fields before they are indexed.

Parameters:
url - The URL to index.
doc - The Lucene document.
fields - The list of FieldWritable objects representing fields for the index.
Returns:
The Lucene Document, or null to stop processing and not index any content for this URL.
Throws:
IndexingException - If an error occurs during indexing.
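
Example:
The return contract of filter (the document to continue, null to stop) can be illustrated with a minimal, self-contained sketch. It does not use the real Nutch or Lucene classes: SimpleField, SimpleDocument, SimpleFieldFilter, and runFilters are hypothetical stand-ins for FieldWritable, Document, FieldFilter, and the caller that invokes the filters, and IndexingException is omitted. How FieldIndexer actually invokes its filters is not described on this page, so the loop below is an assumption.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class FieldFilterSketch {

    /** Hypothetical stand-in for FieldWritable: just a name and a value. */
    static final class SimpleField {
        final String name;
        final String value;
        SimpleField(String name, String value) { this.name = name; this.value = value; }
    }

    /** Hypothetical stand-in for the Lucene Document: field name -> values. */
    static final class SimpleDocument {
        final Map<String, List<String>> fields = new LinkedHashMap<>();
        void add(String name, String value) {
            fields.computeIfAbsent(name, k -> new ArrayList<>()).add(value);
        }
    }

    /** Same shape as FieldFilter.filter, without the Nutch and Lucene types. */
    interface SimpleFieldFilter {
        SimpleDocument filter(String url, SimpleDocument doc, List<SimpleField> fields);
    }

    /** Converts each non-empty field into a document field. */
    static final SimpleFieldFilter COPY_FIELDS = (url, doc, fields) -> {
        for (SimpleField f : fields) {
            if (f.value != null && !f.value.isEmpty()) {
                doc.add(f.name, f.value);
            }
        }
        return doc;
    };

    /** Returns null, stopping processing, when a url has no fields at all. */
    static final SimpleFieldFilter REQUIRE_FIELDS =
        (url, doc, fields) -> fields.isEmpty() ? null : doc;

    /** Assumed caller: runs filters in order; null means nothing is indexed. */
    static SimpleDocument runFilters(String url, List<SimpleField> fields,
                                     List<SimpleFieldFilter> filters) {
        SimpleDocument doc = new SimpleDocument();
        for (SimpleFieldFilter filter : filters) {
            doc = filter.filter(url, doc, fields);
            if (doc == null) {
                return null; // a filter vetoed this url
            }
        }
        return doc;
    }

    public static void main(String[] args) {
        List<SimpleFieldFilter> filters = List.of(REQUIRE_FIELDS, COPY_FIELDS);
        List<SimpleField> fields = List.of(
            new SimpleField("title", "Example"),
            new SimpleField("anchor", ""));
        SimpleDocument indexed = runFilters("http://example.org/", fields, filters);
        System.out.println(indexed.fields);
        SimpleDocument skipped = runFilters("http://example.org/empty", List.of(), filters);
        System.out.println(skipped == null);
    }
}
```

In this sketch a filter that returns null short-circuits the remaining filters, which matches the documented meaning of a null return. A real implementation would also implement getConf and setConf from Configurable.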


Copyright © 2006 The Apache Software Foundation