org.apache.nutch.analysis
Class NutchAnalysis

java.lang.Object
  extended by org.apache.nutch.analysis.NutchAnalysis
All Implemented Interfaces:
NutchAnalysisConstants

public class NutchAnalysis
extends Object
implements NutchAnalysisConstants

The JavaCC-generated Nutch lexical analyzer and query parser.


Field Summary
 org.apache.nutch.analysis.Token jj_nt
          Next token.
 org.apache.nutch.analysis.Token token
          Current token.
 NutchAnalysisTokenManager token_source
          Generated Token Manager.
 
Fields inherited from interface org.apache.nutch.analysis.NutchAnalysisConstants
ACRONYM, APOSTROPHE, ATSIGN, C_PLUS_PLUS, C_SHARP, CJK, COLON, DEFAULT, DIGIT, DOT, EOF, IRREGULAR_WORD, LETTER, MINUS, PLUS, QUOTE, SIGRAM, SLASH, tokenImage, WHITE, WORD, WORD_PUNCT
 
Constructor Summary
NutchAnalysis(org.apache.nutch.analysis.CharStream stream)
          Constructor with user supplied CharStream.
NutchAnalysis(NutchAnalysisTokenManager tm)
          Constructor with generated Token Manager.
NutchAnalysis(String query, Analyzer analyzer)
          Constructs a Nutch analysis of the given query string using the given analyzer.
 
Method Summary
 ArrayList compound(String field)
          Parse a compound term that is interpreted as an implicit phrase query.
 void disable_tracing()
          Disable tracing.
 void enable_tracing()
          Enable tracing.
 ParseException generateParseException()
          Generate ParseException.
 org.apache.nutch.analysis.Token getNextToken()
          Get the next Token.
 org.apache.nutch.analysis.Token getToken(int index)
          Get the specific Token.
 void infix()
          Parse characters which can be used to form compound terms.
static boolean isStopWord(String word)
          True iff word is a stop word.
static void main(String[] args)
          For debugging.
 void nonOpInfix()
          Parse infix characters except plus and minus.
 void nonOpOrTerm()
          Parse anything but a term or an operator (plus, minus, or quote).
 void nonTerm()
          Parse anything but a term or a quote.
 void nonTermOrEOF()
           
 Query parse(Configuration conf)
          Parse a query.
static Query parseQuery(String queryString, Analyzer analyzer, Configuration conf)
          Parse a query string into a Query, using the given analyzer.
static Query parseQuery(String queryString, Configuration conf)
          Parse a query string into a Query.
 ArrayList phrase(String field)
          Parse an explicitly quoted phrase query.
 void ReInit(org.apache.nutch.analysis.CharStream stream)
          Reinitialise.
 void ReInit(NutchAnalysisTokenManager tm)
          Reinitialise.
 String term()
          Parse a single term.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

token_source

public NutchAnalysisTokenManager token_source
Generated Token Manager.


token

public org.apache.nutch.analysis.Token token
Current token.


jj_nt

public org.apache.nutch.analysis.Token jj_nt
Next token.

Constructor Detail

NutchAnalysis

public NutchAnalysis(String query,
                     Analyzer analyzer)
Constructs a Nutch analysis of the given query string using the given analyzer.


NutchAnalysis

public NutchAnalysis(org.apache.nutch.analysis.CharStream stream)
Constructor with user supplied CharStream.


NutchAnalysis

public NutchAnalysis(NutchAnalysisTokenManager tm)
Constructor with generated Token Manager.

Method Detail

isStopWord

public static boolean isStopWord(String word)
True iff word is a stop word. Stop words are only removed from queries. Every word is indexed.
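To illustrate the query-time-only behaviour, here is a minimal sketch of a stop-word check and a query-side filter. The stop set is a hypothetical, deliberately tiny example, not the list Nutch actually uses, and `StopWordSketch`/`dropStopWords` are illustrative names, not part of the Nutch API. Terms are assumed to be lowercased before the check.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch of query-time stop-word handling; not Nutch's implementation. */
class StopWordSketch {
  // Assumed, deliberately small stop list for illustration only.
  private static final Set<String> STOP_WORDS =
      new HashSet<>(Arrays.asList("a", "an", "and", "of", "the", "to"));

  /** True iff word is in the assumed stop list (exact match on lowercased terms). */
  static boolean isStopWord(String word) {
    return STOP_WORDS.contains(word);
  }

  /** Removes stop words from query terms only; indexed text is never filtered this way. */
  static List<String> dropStopWords(List<String> queryTerms) {
    List<String> kept = new ArrayList<>();
    for (String term : queryTerms) {
      if (!isStopWord(term)) {
        kept.add(term);
      }
    }
    return kept;
  }
}
```

Because filtering happens only on the query side, the index still holds every word; the sketch only shrinks what is searched for.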


parseQuery

public static Query parseQuery(String queryString,
                               Configuration conf)
                        throws IOException
Parse a query string into a Query.

Throws:
IOException

parseQuery

public static Query parseQuery(String queryString,
                               Analyzer analyzer,
                               Configuration conf)
                        throws IOException
Parse a query string into a Query, using the given analyzer.

Throws:
IOException

main

public static void main(String[] args)
                 throws Exception
For debugging.

Throws:
Exception

parse

public final Query parse(Configuration conf)
                  throws ParseException
Parse a query.

Throws:
ParseException
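The JavaCC grammar behind parse is not reproduced on this page. As a rough illustration of the clause structure of a query, here is a simplified, hand-written splitter. It assumes a leading `+` marks a required clause, a leading `-` a prohibited one, and double quotes delimit a phrase, in line with the PLUS, MINUS and QUOTE constants; it ignores fields, compounds, CJK text and stop words. `QueryClauseSketch` and its `Clause` type are hypothetical and unrelated to Nutch's own Query class.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical, simplified clause splitter; not the grammar JavaCC generates for Nutch. */
class QueryClauseSketch {
  enum Sign { DEFAULT, REQUIRED, PROHIBITED }

  /** One clause: an optional sign plus the text of a bare term or quoted phrase. */
  static final class Clause {
    final Sign sign;
    final String text;
    final boolean phrase;

    Clause(Sign sign, String text, boolean phrase) {
      this.sign = sign;
      this.text = text;
      this.phrase = phrase;
    }

    @Override
    public String toString() {
      String prefix = sign == Sign.REQUIRED ? "+" : sign == Sign.PROHIBITED ? "-" : "";
      return phrase ? prefix + "\"" + text + "\"" : prefix + text;
    }
  }

  static List<Clause> parse(String query) {
    List<Clause> clauses = new ArrayList<>();
    int i = 0;
    int n = query.length();
    while (i < n) {
      // Skip whitespace between clauses.
      while (i < n && Character.isWhitespace(query.charAt(i))) i++;
      if (i >= n) break;
      // An optional leading '+' or '-' (assumed operators) sets the clause's sign.
      Sign sign = Sign.DEFAULT;
      char c = query.charAt(i);
      if (c == '+') { sign = Sign.REQUIRED; i++; }
      else if (c == '-') { sign = Sign.PROHIBITED; i++; }
      if (i < n && query.charAt(i) == '"') {
        // Quoted phrase: read up to the closing quote, or to end of input if unclosed.
        int end = query.indexOf('"', i + 1);
        if (end < 0) end = n;
        String text = query.substring(i + 1, end).trim();
        if (!text.isEmpty()) clauses.add(new Clause(sign, text, true));
        i = Math.min(end + 1, n);
      } else {
        // Bare term: read up to the next whitespace character.
        int start = i;
        while (i < n && !Character.isWhitespace(query.charAt(i))) i++;
        if (i > start) clauses.add(new Clause(sign, query.substring(start, i), false));
      }
    }
    return clauses;
  }
}
```

For example, `QueryClauseSketch.parse("+apache -lucene \"web search\"")` yields three clauses that print as `[+apache, -lucene, "web search"]`.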

phrase

public final ArrayList phrase(String field)
                       throws ParseException
Parse an explicitly quoted phrase query. Note that this may return a single term, a trivial phrase.

Throws:
ParseException
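The following sketch shows how a quoted phrase can reduce to a list of terms, including the trivial one-term case. It works on a raw string, whereas the generated parser consumes tokens from its token manager and analyzer; the rule that a term is a maximal run of letters or digits, and the `PhraseSketch` name, are assumptions for illustration.

```java
import java.util.ArrayList;

/** Hypothetical sketch: turn the body of a quoted phrase into its list of terms. */
class PhraseSketch {
  /**
   * Splits a quoted string such as "\"Apache Nutch\"" into lowercase terms.
   * Assumes terms are maximal runs of letters or digits; a one-word phrase
   * yields a one-element list, i.e. a trivial phrase.
   */
  static ArrayList<String> phrase(String quoted) {
    String body = quoted;
    if (body.startsWith("\"")) body = body.substring(1);
    if (body.endsWith("\"")) body = body.substring(0, body.length() - 1);
    ArrayList<String> terms = new ArrayList<>();
    StringBuilder current = new StringBuilder();
    for (int i = 0; i < body.length(); i++) {
      char c = body.charAt(i);
      if (Character.isLetterOrDigit(c)) {
        current.append(Character.toLowerCase(c));
      } else if (current.length() > 0) {
        // Any other character ends the current term.
        terms.add(current.toString());
        current.setLength(0);
      }
    }
    if (current.length() > 0) terms.add(current.toString());
    return terms;
  }
}
```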

compound

public final ArrayList compound(String field)
                         throws ParseException
Parse a compound term that is interpreted as an implicit phrase query. Compounds are a sequence of terms separated by infix characters. Note that this may return a single term, a trivial compound.

Throws:
ParseException
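A compound such as a URL can be pictured as terms separated by runs of infix characters, later searched as an implicit phrase. This sketch assumes the infix set is the characters named by the COLON, SLASH, DOT, ATSIGN, APOSTROPHE, PLUS and MINUS constants; the generated grammar may differ in detail, and `CompoundSketch` is a hypothetical name.

```java
import java.util.ArrayList;

/** Hypothetical sketch of splitting a compound like "www.apache.org" into phrase terms. */
class CompoundSketch {
  // Assumed infix set, modelled on the COLON, SLASH, DOT, ATSIGN, APOSTROPHE,
  // PLUS and MINUS constants.
  private static final String INFIX = ":/.@'+-";

  static boolean isInfix(char c) {
    return INFIX.indexOf(c) >= 0;
  }

  /**
   * Splits one whitespace-free compound on runs of infix characters.
   * A compound with no infix characters yields a single term (a trivial compound).
   */
  static ArrayList<String> compound(String text) {
    ArrayList<String> terms = new ArrayList<>();
    StringBuilder current = new StringBuilder();
    for (char c : text.toCharArray()) {
      if (isInfix(c)) {
        // Consecutive infix characters (e.g. "://") produce no empty terms.
        if (current.length() > 0) {
          terms.add(current.toString());
          current.setLength(0);
        }
      } else {
        current.append(c);
      }
    }
    if (current.length() > 0) terms.add(current.toString());
    return terms;
  }
}
```

Under these assumptions, `compound("http://www.apache.org")` gives `[http, www, apache, org]`, which a searcher would then match as a phrase.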

term

public final String term()
                  throws ParseException
Parse a single term.

Throws:
ParseException

nonTerm

public final void nonTerm()
                   throws ParseException
Parse anything but a term or a quote.

Throws:
ParseException

nonTermOrEOF

public final void nonTermOrEOF()
                        throws ParseException
Throws:
ParseException

nonOpOrTerm

public final void nonOpOrTerm()
                       throws ParseException
Parse anything but a term or an operator (plus, minus, or quote).

Throws:
ParseException

infix

public final void infix()
                 throws ParseException
Parse characters which can be used to form compound terms.

Throws:
ParseException

nonOpInfix

public final void nonOpInfix()
                      throws ParseException
Parse infix characters except plus and minus.

Throws:
ParseException
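The split between infix and nonOpInfix exists because plus and minus double as query operators. The sketch below contrasts the two character classes and shows one positional rule that disambiguates them: a `+` or `-` at the start of the query or after whitespace is treated as an operator, otherwise as infix. Both the character sets and that rule are assumptions for illustration, not the generated grammar, and `InfixSketch` is a hypothetical name.

```java
/** Hypothetical sketch contrasting infix() and nonOpInfix() character classes. */
class InfixSketch {
  /** Assumed infix characters other than the plus and minus operators. */
  static boolean isNonOpInfix(char c) {
    return ":/.@'".indexOf(c) >= 0;
  }

  /** All assumed infix characters: the non-operator ones plus '+' and '-'. */
  static boolean isInfix(char c) {
    return isNonOpInfix(c) || c == '+' || c == '-';
  }

  /**
   * Assumed rule: the '+' or '-' at position i is an operator when it starts the
   * query or follows whitespace, so "foo -bar" excludes bar while "foo-bar" is
   * a single compound.
   */
  static boolean isOperatorAt(String query, int i) {
    char c = query.charAt(i);
    if (c != '+' && c != '-') return false;
    return i == 0 || Character.isWhitespace(query.charAt(i - 1));
  }
}
```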

ReInit

public void ReInit(org.apache.nutch.analysis.CharStream stream)
Reinitialise.


ReInit

public void ReInit(NutchAnalysisTokenManager tm)
Reinitialise.


getNextToken

public final org.apache.nutch.analysis.Token getNextToken()
Get the next Token.


getToken

public final org.apache.nutch.analysis.Token getToken(int index)
Get the specific Token.
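In JavaCC-generated parsers, `token` is the current token and tokens are linked through their `next` field, fetched lazily from the token manager. getToken(0) returns the current token and getToken(k) looks k tokens ahead without consuming them, while getNextToken advances `token`. The sketch below mimics that pattern with a minimal stand-in `Token` class and a list iterator in place of NutchAnalysisTokenManager; `LookaheadSketch` and the `"<EOF>"` image are hypothetical.

```java
import java.util.Iterator;
import java.util.List;

/** Minimal stand-in for JavaCC-style lookahead over a linked list of tokens. */
class LookaheadSketch {
  /** Simplified token: its image and a link to the following token. */
  static final class Token {
    final String image;
    Token next;
    Token(String image) { this.image = image; }
  }

  private final Iterator<String> source; // stands in for the token manager
  Token token = new Token("<start>");   // current token, like the 'token' field

  LookaheadSketch(List<String> images) {
    this.source = images.iterator();
  }

  /** Pulls a fresh token from the source; "<EOF>" once input is exhausted. */
  private Token fetch() {
    return new Token(source.hasNext() ? source.next() : "<EOF>");
  }

  /** Advances 'token' by one, reusing a token already fetched by lookahead. */
  Token getNextToken() {
    if (token.next == null) token.next = fetch();
    token = token.next;
    return token;
  }

  /** Peeks without consuming: index 0 is the current token, 1 the next, and so on. */
  Token getToken(int index) {
    Token t = token;
    for (int i = 0; i < index; i++) {
      if (t.next == null) t.next = fetch();
      t = t.next;
    }
    return t;
  }
}
```

Peeking with getToken(2) fills in the linked list, so later getNextToken calls reuse those tokens instead of reading the source again.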


generateParseException

public ParseException generateParseException()
Generate ParseException.


enable_tracing

public final void enable_tracing()
Enable tracing.


disable_tracing

public final void disable_tracing()
Disable tracing.



Copyright © 2006 The Apache Software Foundation