org.apache.nutch.analysis
Class CommonGrams

java.lang.Object
  extended by org.apache.nutch.analysis.CommonGrams

public class CommonGrams
extends Object

Constructs n-grams for frequently occurring terms and phrases while indexing, and optimizes phrase queries to use those n-grams. Single terms are still indexed; the n-grams are overlaid on them at the same token positions, which is achieved through Token.setPositionIncrement(int).
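The overlay idea can be illustrated with a minimal, self-contained sketch. It is not the Nutch implementation and does not use the Lucene Token or TokenFilter classes: the class GramOverlaySketch, the Tok type, the "-" separator, and the rule "form a bigram when either neighbour is a common term" are all assumptions made for illustration. Each single term advances the position by 1, while a bigram is emitted with an increment of 0 so that it shares the position of its first word.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Illustrative only: a hypothetical stand-in, not the Nutch CommonGrams code. */
class GramOverlaySketch {

    /** A term plus its position increment relative to the previous token. */
    static final class Tok {
        final String term;
        final int positionIncrement;

        Tok(String term, int positionIncrement) {
            this.term = term;
            this.positionIncrement = positionIncrement;
        }

        @Override
        public String toString() {
            return term + "(+" + positionIncrement + ")";
        }
    }

    /**
     * Emits every single term with increment 1. When a term or its successor
     * is in the common set, also emits their bigram with increment 0, so the
     * bigram is overlaid on the position of its first word.
     */
    static List<Tok> overlay(List<String> terms, Set<String> common) {
        List<Tok> out = new ArrayList<>();
        for (int i = 0; i < terms.size(); i++) {
            String term = terms.get(i);
            out.add(new Tok(term, 1));
            if (i + 1 < terms.size()) {
                String next = terms.get(i + 1);
                if (common.contains(term) || common.contains(next)) {
                    // Assumed separator; chosen here only for readability.
                    out.add(new Tok(term + "-" + next, 0));
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Set<String> common = new HashSet<>(Arrays.asList("the", "of"));
        // Prints: [the(+1), the-quick(+0), quick(+1), fox(+1)]
        System.out.println(overlay(Arrays.asList("the", "quick", "fox"), common));
    }
}
```

Because the bigram occupies the same position as "the", queries on the single terms still match as before, while a phrase query can match the rarer bigram instead of the very frequent single term.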


Constructor Summary
CommonGrams(Configuration conf)
          Constructs a CommonGrams instance from the given configuration.
 
Method Summary
 TokenFilter getFilter(TokenStream ts, String field)
          Constructs a token filter that inserts n-grams for common terms.
static void main(String[] args)
          Command-line entry point, intended for debugging.
 String[] optimizePhrase(Query.Phrase phrase, String field)
          Optimizes phrase queries to use n-grams when possible.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

CommonGrams

public CommonGrams(Configuration conf)
Constructs a CommonGrams instance from the given configuration.

Parameters:
conf - the configuration to use
Method Detail

getFilter

public TokenFilter getFilter(TokenStream ts,
                             String field)
Constructs a token filter that inserts n-grams for common terms. Intended for use while indexing documents.


optimizePhrase

public String[] optimizePhrase(Query.Phrase phrase,
                               String field)
Optimizes phrase queries to use n-grams when possible.
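As a rough illustration of what such an optimization might produce, the sketch below rewrites the terms of a phrase so that, at each position, a bigram is preferred over the single term whenever one of the two words is common. This is a hypothetical, simplified stand-in, not the Nutch algorithm: it works on plain strings rather than Query.Phrase, and the class name PhraseOptimizeSketch, the "-" separator, and the bigram rule are assumptions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Illustrative only: a hypothetical stand-in, not the Nutch CommonGrams code. */
class PhraseOptimizeSketch {

    /**
     * Returns one entry per phrase position, using the bigram that starts at
     * that position when either of its words is common, and the single term
     * otherwise. Stops early once a bigram covers the last term.
     */
    static String[] optimize(List<String> terms, Set<String> common) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < terms.size(); i++) {
            String term = terms.get(i);
            if (i + 1 < terms.size()) {
                String next = terms.get(i + 1);
                if (common.contains(term) || common.contains(next)) {
                    out.add(term + "-" + next); // assumed separator
                    if (i + 2 == terms.size()) {
                        break; // the bigram already reaches the final term
                    }
                    continue;
                }
            }
            out.add(term);
        }
        return out.toArray(new String[0]);
    }

    public static void main(String[] args) {
        Set<String> common = new HashSet<>(Arrays.asList("the", "of"));
        // Prints: [state-of, of-the]
        System.out.println(Arrays.toString(
            optimize(Arrays.asList("state", "of", "the"), common)));
    }
}
```

In this sketch a phrase with no common terms is returned unchanged, so the rewrite only affects queries that would otherwise have to scan the postings of very frequent terms.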


main

public static void main(String[] args)
                 throws Exception
Command-line entry point, intended for debugging.

Throws:
Exception


Copyright © 2006 The Apache Software Foundation