org.apache.nutch.analysis.lang
Class LanguageIndexingFilter

java.lang.Object
  extended by org.apache.nutch.analysis.lang.LanguageIndexingFilter
All Implemented Interfaces:
Configurable, IndexingFilter, Pluggable

public class LanguageIndexingFilter
extends Object
implements IndexingFilter

An IndexingFilter that adds a lang (language) field to the document. It tries to find the language of the document by:

Author:
Sami Siren, Jerome Charron

Field Summary
 
Fields inherited from interface org.apache.nutch.indexer.IndexingFilter
X_POINT_ID
 
Constructor Summary
LanguageIndexingFilter()
          Constructs a new Language Indexing Filter.
 
Method Summary
 void addIndexBackendOptions(Configuration conf)
          Adds index-level configuration options.
 NutchDocument filter(NutchDocument doc, Parse parse, Text url, CrawlDatum datum, Inlinks inlinks)
          Adds fields or otherwise modifies the document that will be indexed for a parse.
 Configuration getConf()
           
 void setConf(Configuration conf)
           
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

LanguageIndexingFilter

public LanguageIndexingFilter()
Constructs a new Language Indexing Filter.

Method Detail

filter

public NutchDocument filter(NutchDocument doc,
                            Parse parse,
                            Text url,
                            CrawlDatum datum,
                            Inlinks inlinks)
                     throws IndexingException
Description copied from interface: IndexingFilter
Adds fields or otherwise modifies the document that will be indexed for a parse. Unwanted documents can be removed from indexing by returning a null value.

Specified by:
filter in interface IndexingFilter
Parameters:
doc - document instance for collecting fields
parse - parse data instance
url - page url
datum - crawl datum for the page
inlinks - page inlinks
Returns:
modified (or a new) document instance, or null (meaning the document should be discarded)
Throws:
IndexingException
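
The contract described above (return the document, possibly modified, or return null to discard it) can be illustrated with a minimal, self-contained sketch. It does not use the Nutch classes: Doc, Filter, addLang, dropSuffix and runChain are hypothetical stand-ins invented for this example, and the language value is supplied by the caller rather than detected.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class FilterContractSketch {

    /** Hypothetical stand-in for NutchDocument: a plain field map. */
    static final class Doc {
        final Map<String, String> fields = new HashMap<>();
    }

    /** Hypothetical, reduced form of the filter contract (document and URL only). */
    interface Filter {
        Doc filter(Doc doc, String url);
    }

    /** Adds a "lang" field with a caller-supplied value; no real detection happens here. */
    static Filter addLang(String lang) {
        return (doc, url) -> {
            doc.fields.put("lang", lang);
            return doc;
        };
    }

    /** Discards documents whose URL ends with the given suffix by returning null. */
    static Filter dropSuffix(String suffix) {
        return (doc, url) -> url.endsWith(suffix) ? null : doc;
    }

    /** Runs filters in order and stops as soon as one of them discards the document. */
    static Doc runChain(List<Filter> filters, Doc doc, String url) {
        for (Filter f : filters) {
            doc = f.filter(doc, url);
            if (doc == null) {
                return null;
            }
        }
        return doc;
    }

    public static void main(String[] args) {
        List<Filter> chain = Arrays.asList(dropSuffix(".tmp"), addLang("en"));
        Doc kept = runChain(chain, new Doc(), "http://example.org/page.html");
        Doc dropped = runChain(chain, new Doc(), "http://example.org/scratch.tmp");
        System.out.println(kept.fields.get("lang")); // en
        System.out.println(dropped == null);          // true
    }
}
```

A caller of a real filter would likewise need to check the returned value for null before passing the document on to the indexing backend.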

addIndexBackendOptions

public void addIndexBackendOptions(Configuration conf)
Description copied from interface: IndexingFilter
Adds index-level configuration options. Implementations can update the given configuration to pass document-independent information to indexing backends. As a rule of thumb, prefix meta keys with the name of the intended backend. For example, when passing information to the Lucene backend, prefix keys with "lucene.".

Specified by:
addIndexBackendOptions in interface IndexingFilter
Parameters:
conf - Configuration instance.
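
The key-prefix rule of thumb can be sketched as follows. To keep the example self-contained, java.util.Properties stands in for the Configuration class; that substitution, the helper backendKey, and the key "field.store.lang" with value "YES" are all assumptions made for illustration, not options documented on this page.

```java
import java.util.Properties;

public class BackendOptionsSketch {

    /** Prefixes a key with the backend name, following the "lucene." rule of thumb. */
    static String backendKey(String backend, String key) {
        return backend + "." + key;
    }

    /**
     * Mirrors the shape of addIndexBackendOptions, with java.util.Properties
     * standing in for the Configuration class (an assumption of this sketch).
     */
    static void addIndexBackendOptions(Properties conf) {
        // "field.store.lang" and "YES" are hypothetical, chosen for illustration only.
        conf.setProperty(backendKey("lucene", "field.store.lang"), "YES");
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        addIndexBackendOptions(conf);
        System.out.println(conf.getProperty("lucene.field.store.lang")); // YES
    }
}
```

Keeping the backend name as the first key segment lets each backend read only its own options and ignore the rest.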

setConf

public void setConf(Configuration conf)
Specified by:
setConf in interface Configurable

getConf

public Configuration getConf()
Specified by:
getConf in interface Configurable


Copyright © 2006 The Apache Software Foundation