org.apache.nutch.analysis
Class NutchDocumentAnalyzer

java.lang.Object
  extended by org.apache.lucene.analysis.Analyzer
      extended by org.apache.nutch.analysis.NutchAnalyzer
          extended by org.apache.nutch.analysis.NutchDocumentAnalyzer
All Implemented Interfaces:
Closeable, Configurable, Pluggable

public class NutchDocumentAnalyzer
extends NutchAnalyzer

The analyzer used for Nutch documents. Uses the JavaCC-defined lexical analyzer NutchDocumentTokenizer, with no stop list. This keeps it consistent with query parsing.
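The reason a missing stop list keeps indexing consistent with query parsing can be illustrated with a small, self-contained sketch. It does not use NutchDocumentTokenizer; the `StopListSketch` class, its regex-based `tokenize` method, and the sample stop words are all hypothetical stand-ins chosen for illustration. The point is only that when the query side keeps every term, the document side must keep every term too, or common-word queries stop matching.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical stand-in for an analyzer; not the Nutch or Lucene API.
class StopListSketch {

    /** Lowercases the text, splits on non-alphanumeric runs, and drops terms found in stopWords. */
    static List<String> tokenize(String text, Set<String> stopWords) {
        List<String> terms = new ArrayList<>();
        for (String term : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!term.isEmpty() && !stopWords.contains(term)) {
                terms.add(term);
            }
        }
        return terms;
    }

    /** Returns true if every query term also occurs among the document terms. */
    static boolean matches(List<String> docTerms, List<String> queryTerms) {
        return docTerms.containsAll(queryTerms);
    }

    public static void main(String[] args) {
        Set<String> noStopWords = new HashSet<>();
        // Assumed stop list, used only to show the mismatch.
        Set<String> stopWords = new HashSet<>(Arrays.asList("the", "to", "be", "or", "not"));

        String document = "To be, or not to be";
        List<String> query = tokenize("to be", noStopWords); // the query side keeps every term

        // Same analysis on both sides: the query terms are found.
        System.out.println(matches(tokenize(document, noStopWords), query)); // prints true
        // Stop list applied only at index time: the query terms were never indexed.
        System.out.println(matches(tokenize(document, stopWords), query));   // prints false
    }
}
```

In this sketch the second call fails because the stop list removed every term the query asks for, which is the kind of mismatch that analyzing documents without a stop list avoids.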


Field Summary
static int INTER_ANCHOR_GAP
          The number of unused term positions between anchors in the anchor field.
 
Fields inherited from class org.apache.nutch.analysis.NutchAnalyzer
conf
 
Fields inherited from class org.apache.lucene.analysis.Analyzer
overridesTokenStreamMethod
 
Constructor Summary
NutchDocumentAnalyzer(Configuration conf)
          Constructs an analyzer that uses the given configuration.
 
Method Summary
 TokenStream tokenStream(String fieldName, Reader reader)
          Returns a new token stream for text from the named field.
 
Methods inherited from class org.apache.nutch.analysis.NutchAnalyzer
getConf, setConf
 
Methods inherited from class org.apache.lucene.analysis.Analyzer
close, getOffsetGap, getPositionIncrementGap, getPreviousTokenStream, reusableTokenStream, setOverridesTokenStreamMethod, setPreviousTokenStream
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

INTER_ANCHOR_GAP

public static final int INTER_ANCHOR_GAP
The number of unused term positions between anchors in the anchor field.

See Also:
Constant Field Values
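
The effect of leaving unused term positions between anchors can be sketched in plain Java. This is an illustration only, not the Nutch implementation: the `AnchorGapSketch` class, its whitespace splitting, and the gap value of 4 are assumptions (the actual value is the one listed under Constant Field Values). The gap separates the last term of one anchor from the first term of the next, so a phrase match that requires adjacent positions cannot span two different anchors.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of position gaps; not the Nutch or Lucene API.
class AnchorGapSketch {

    // Assumed gap for this sketch; see Constant Field Values for the real constant.
    static final int GAP = 4;

    /** Assigns a term position to every word, leaving GAP unused positions between anchors. */
    static List<Integer> positions(List<String> anchors) {
        List<Integer> result = new ArrayList<>();
        int position = 0;
        for (int i = 0; i < anchors.size(); i++) {
            if (i > 0) {
                position += GAP; // skip GAP positions before the next anchor starts
            }
            for (String word : anchors.get(i).trim().split("\\s+")) {
                if (word.isEmpty()) {
                    continue; // ignore empty anchors
                }
                result.add(position++);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // "apache nutch" takes positions 0 and 1; positions 2 to 5 stay unused;
        // "search engine" takes positions 6 and 7.
        System.out.println(positions(Arrays.asList("apache nutch", "search engine"))); // prints [0, 1, 6, 7]
    }
}
```

Here "nutch" (position 1) and "search" (position 6) are not adjacent, so a phrase such as "nutch search" would not match across the two anchors in this sketch.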
Constructor Detail

NutchDocumentAnalyzer

public NutchDocumentAnalyzer(Configuration conf)
Parameters:
conf - the configuration used by this analyzer
Method Detail

tokenStream

public TokenStream tokenStream(String fieldName,
                               Reader reader)
Returns a new token stream for text from the named field.

Specified by:
tokenStream in class NutchAnalyzer


Copyright © 2006 The Apache Software Foundation