Package org.apache.nutch.analysis.lang

Text document language identifier.


Class Summary
HTMLLanguageParser Adds metadata identifying the language of a document, if found. Statistical analysis could also be run here, but it would miss all other formats.
LanguageIdentifier Identifies the language of a piece of content, based on statistical analysis.
LanguageIndexingFilter An IndexingFilter that adds a lang (language) field to the document.
LanguageQueryFilter Handles "lang:" query clauses, causing them to search the "lang" field indexed by LanguageIdentifier.
NGramProfile Runs an n-gram analysis over submitted text; the results can be used for automatic language identification.
 

Package org.apache.nutch.analysis.lang Description

Text document language identifier.

Language profiles are based on material from http://www.isi.edu/~koehn/europarl/.
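
To illustrate the general idea of n-gram based language identification, here is a minimal, self-contained sketch. It is not the code in this package: the class NGramSketch, its methods, the '_' word padding, the n-gram length parameters, and the use of cosine similarity are all assumptions chosen for illustration. The profiles and scoring actually used by NGramProfile and LanguageIdentifier may differ.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical sketch; NOT the Nutch NGramProfile or LanguageIdentifier API. */
class NGramSketch {

    /** Counts character n-grams of length minLen..maxLen in each word, padded with '_'. */
    static Map<String, Integer> profile(String text, int minLen, int maxLen) {
        Map<String, Integer> counts = new HashMap<>();
        // Split on anything that is not a letter; empty tokens are skipped.
        for (String word : text.toLowerCase(Locale.ROOT).split("[^\\p{L}]+")) {
            if (word.isEmpty()) {
                continue;
            }
            String padded = "_" + word + "_";
            for (int n = minLen; n <= maxLen; n++) {
                for (int i = 0; i + n <= padded.length(); i++) {
                    counts.merge(padded.substring(i, i + n), 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    /** Cosine similarity between two n-gram count vectors (assumed scoring, for illustration). */
    static double cosine(Map<String, Integer> a, Map<String, Integer> b) {
        double dot = 0.0, normA = 0.0, normB = 0.0;
        for (Map.Entry<String, Integer> e : a.entrySet()) {
            normA += (double) e.getValue() * e.getValue();
            Integer other = b.get(e.getKey());
            if (other != null) {
                dot += (double) e.getValue() * other;
            }
        }
        for (int v : b.values()) {
            normB += (double) v * v;
        }
        if (normA == 0.0 || normB == 0.0) {
            return 0.0;
        }
        return dot / Math.sqrt(normA * normB);
    }

    /** Returns the key of the most similar profile, or null if no profiles are given. */
    static String identify(String text, Map<String, Map<String, Integer>> profiles,
                           int minLen, int maxLen) {
        Map<String, Integer> candidate = profile(text, minLen, maxLen);
        String best = null;
        double bestScore = -1.0;
        for (Map.Entry<String, Map<String, Integer>> e : profiles.entrySet()) {
            double score = cosine(candidate, e.getValue());
            if (score > bestScore) {
                bestScore = score;
                best = e.getKey();
            }
        }
        return best;
    }
}
```

In this sketch, a profile would be built once per language from training text (for example, a Europarl excerpt) and stored in a map keyed by language code; identify then compares the n-gram counts of a new text against each stored profile and returns the closest one.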



Copyright © 2006 The Apache Software Foundation