org.apache.nutch.analysis
Class NutchDocumentAnalyzer
java.lang.Object
org.apache.lucene.analysis.Analyzer
org.apache.nutch.analysis.NutchAnalyzer
org.apache.nutch.analysis.NutchDocumentAnalyzer
- All Implemented Interfaces:
- Closeable, Configurable, Pluggable
public class NutchDocumentAnalyzer
- extends NutchAnalyzer
The analyzer used for Nutch documents. Uses the JavaCC-defined lexical
analyzer NutchDocumentTokenizer, with no stop list. This keeps it
consistent with query parsing.
Field Summary
static int INTER_ANCHOR_GAP
- The number of unused term positions between anchors in the anchor field.
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
INTER_ANCHOR_GAP
public static final int INTER_ANCHOR_GAP
- The number of unused term positions between anchors in the anchor field.
- See Also:
- Constant Field Values
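The effect of leaving unused positions between anchors can be sketched without Lucene or Nutch on the classpath. The example below is a simplified model and not the Nutch implementation: the class AnchorGapSketch, its methods, and the gap value of 4 are hypothetical, and it assumes the gap is applied as an extra position increment on the first term of every anchor after the first. Under that assumption, the last term of one anchor and the first term of the next are never at adjacent positions, so an exact phrase match cannot span two anchors.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical, simplified model of term positions in an anchor field. */
final class AnchorGapSketch {

    /** Illustrative gap only; the real value is whatever INTER_ANCHOR_GAP holds. */
    static final int ILLUSTRATIVE_GAP = 4;

    /**
     * Assigns a position to each whitespace-separated term. The first term of
     * every anchor after the first skips `gap` positions (assumed behaviour).
     */
    static List<Integer> positions(List<String> anchors, int gap) {
        List<Integer> out = new ArrayList<>();
        int pos = -1;
        for (String anchor : anchors) {
            String trimmed = anchor.trim();
            if (trimmed.isEmpty()) {
                continue; // an empty anchor contributes no terms
            }
            String[] terms = trimmed.split("\\s+");
            for (int i = 0; i < terms.length; i++) {
                int increment = 1;
                if (i == 0 && !out.isEmpty()) {
                    increment += gap; // leave `gap` unused positions before this anchor
                }
                pos += increment;
                out.add(pos);
            }
        }
        return out;
    }

    /** True if the terms at indexes i and j could match as an exact two-word phrase. */
    static boolean adjacent(List<Integer> positions, int i, int j) {
        return positions.get(j) - positions.get(i) == 1;
    }

    public static void main(String[] args) {
        List<String> anchors = Arrays.asList("apache nutch", "search engine");
        List<Integer> withGap = positions(anchors, ILLUSTRATIVE_GAP);
        System.out.println(withGap);                  // prints [0, 1, 6, 7]
        System.out.println(adjacent(withGap, 1, 2));  // prints false: "nutch search" spans two anchors
        System.out.println(adjacent(withGap, 0, 1));  // prints true: "apache nutch" is inside one anchor
    }
}
```

With a gap of 0 the same input yields positions 0, 1, 2, 3, and "nutch search" would look like a phrase even though the two words come from different anchors.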
NutchDocumentAnalyzer
public NutchDocumentAnalyzer(Configuration conf)
- Parameters:
- conf
tokenStream
public TokenStream tokenStream(String fieldName, Reader reader)
- Returns a new token stream for text from the named field.
- Specified by:
- tokenStream in class NutchAnalyzer
Copyright © 2006 The Apache Software Foundation