Lucene.Net 1.4.3 Class Library

Lucene.Net.Analysis Namespace

Classes

Class Description
Analyzer An Analyzer builds TokenStreams, which analyze text. It thus represents a policy for extracting index terms from text.

Typical implementations first build a Tokenizer, which breaks the stream of characters from the Reader into raw Tokens. One or more TokenFilters may then be applied to the output of the Tokenizer.

WARNING: You must override one of the methods defined by this class in your subclass or the Analyzer will enter an infinite loop.

CharTokenizer An abstract base class for simple, character-oriented tokenizers.
LetterTokenizer A LetterTokenizer is a tokenizer that divides text at non-letters; that is, it defines tokens as maximal strings of adjacent letters, as determined by the java.lang.Character.isLetter() predicate. Note: this does a decent job for most European languages, but a terrible job for some Asian languages, where words are not separated by spaces.
LowerCaseFilter Normalizes token text to lower case.
LowerCaseTokenizer A LowerCaseTokenizer divides text at non-letters and normalizes token text to lower case; it performs the function of LetterTokenizer and LowerCaseFilter together.
PerFieldAnalyzerWrapper This analyzer is used in scenarios where different fields require different analysis techniques. Use the AddAnalyzer method to add a non-default analyzer for a particular field name. See TestPerFieldAnalyzerWrapper.java for example usage.
PorterStemFilter Transforms the token stream as per the Porter stemming algorithm. Note: the input to this filter must already be in lower case, so a LowerCaseFilter or LowerCaseTokenizer should precede it in the analysis pipeline.
SimpleAnalyzer An Analyzer that filters LetterTokenizer with LowerCaseFilter.
StopAnalyzer Filters LetterTokenizer with LowerCaseFilter and StopFilter.
StopFilter Removes stop words from a token stream.
Token A Token is an occurrence of a term from the text of a Field. It consists of a term's text, the start and end offset of the term in the text of the Field, and a type string. The start and end offsets permit applications to re-associate a token with its source text, e.g., to display highlighted query terms in a document browser, or to show matching text fragments in a KWIC (KeyWord In Context) display. The type is an interned string, assigned by a lexical analyzer (a.k.a. tokenizer), naming the lexical or syntactic class that the token belongs to. For example, an end-of-sentence marker token might be implemented with type "eos". The default token type is "word".
TokenFilter A TokenFilter is a TokenStream whose input is another TokenStream; it is the abstract base class for filters that modify a stream of tokens.
Tokenizer A Tokenizer is a TokenStream whose input is a Reader; it is the abstract base class for classes that break character input into raw tokens.
TokenStream A TokenStream enumerates the sequence of tokens, either from the fields of a document or from query text; it is the abstract base class for Tokenizers and TokenFilters.
WhitespaceAnalyzer An Analyzer that uses WhitespaceTokenizer.
WhitespaceTokenizer A WhitespaceTokenizer is a tokenizer that divides text at whitespace; adjacent sequences of non-whitespace characters form tokens.
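Taken together, these classes form a pipeline: a Tokenizer reads characters from a Reader and emits Tokens, and one or more TokenFilters transform that stream. The sketch below illustrates the idea in plain Java (the language these docs themselves reference, and the API from which Lucene.Net is ported). The class names echo the library's, but these are simplified stand-ins written for illustration, not the real Lucene.Net implementations.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Simplified stand-in for Token: term text, offsets into the source, and a type.
class Token {
    final String termText;
    final int startOffset, endOffset;
    final String type = "word"; // the default token type
    Token(String termText, int startOffset, int endOffset) {
        this.termText = termText;
        this.startOffset = startOffset;
        this.endOffset = endOffset;
    }
}

// A TokenStream enumerates tokens; next() returns null at end of stream.
abstract class TokenStream {
    abstract Token next() throws IOException;
}

// A Tokenizer is a TokenStream whose source is a Reader. This one mimics
// WhitespaceTokenizer: tokens are maximal runs of non-whitespace characters.
class WhitespaceTokenizer extends TokenStream {
    private final String text;
    private int pos = 0;
    WhitespaceTokenizer(Reader input) throws IOException {
        StringBuilder sb = new StringBuilder();
        for (int c = input.read(); c != -1; c = input.read()) sb.append((char) c);
        text = sb.toString();
    }
    Token next() {
        while (pos < text.length() && Character.isWhitespace(text.charAt(pos))) pos++;
        if (pos == text.length()) return null;
        int start = pos;
        while (pos < text.length() && !Character.isWhitespace(text.charAt(pos))) pos++;
        return new Token(text.substring(start, pos), start, pos);
    }
}

// A TokenFilter is a TokenStream whose source is another TokenStream.
class LowerCaseFilter extends TokenStream {
    private final TokenStream input;
    LowerCaseFilter(TokenStream input) { this.input = input; }
    Token next() throws IOException {
        Token t = input.next();
        return t == null ? null : new Token(t.termText.toLowerCase(), t.startOffset, t.endOffset);
    }
}

// Drops tokens whose (already lower-cased) text is a stop word.
class StopFilter extends TokenStream {
    private final TokenStream input;
    private final Set<String> stopWords;
    StopFilter(TokenStream input, Set<String> stopWords) {
        this.input = input;
        this.stopWords = stopWords;
    }
    Token next() throws IOException {
        for (Token t = input.next(); t != null; t = input.next())
            if (!stopWords.contains(t.termText)) return t;
        return null;
    }
}

public class AnalysisSketch {
    // Run a text through the whole pipeline and collect the surviving terms.
    static List<String> analyze(String text, Set<String> stopWords) throws IOException {
        TokenStream ts = new StopFilter(
            new LowerCaseFilter(new WhitespaceTokenizer(new StringReader(text))), stopWords);
        List<String> terms = new ArrayList<>();
        for (Token t = ts.next(); t != null; t = ts.next()) terms.add(t.termText);
        return terms;
    }

    public static void main(String[] args) throws IOException {
        Set<String> stops = new HashSet<>(Arrays.asList("the", "of"));
        System.out.println(analyze("The Analysis of Text", stops)); // prints [analysis, text]
    }
}
```

Note the ordering: stop words are matched against lower-cased text, so the StopFilter is applied after the LowerCaseFilter, just as StopAnalyzer composes its pipeline.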
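PerFieldAnalyzerWrapper's behaviour amounts to a lookup table with a default: a wrapper holds one default analyzer plus per-field overrides registered via AddAnalyzer. A hypothetical sketch of that dispatch logic follows; the class name PerFieldWrapper is invented for illustration, and plain strings stand in for Analyzer instances.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch of per-field analyzer dispatch, in the spirit of
// PerFieldAnalyzerWrapper. Strings stand in for Analyzer instances.
public class PerFieldWrapper {
    private final String defaultAnalyzer;
    private final Map<String, String> fieldAnalyzers = new HashMap<>();

    PerFieldWrapper(String defaultAnalyzer) {
        this.defaultAnalyzer = defaultAnalyzer;
    }

    // Register a non-default analyzer for a particular field name.
    void addAnalyzer(String fieldName, String analyzer) {
        fieldAnalyzers.put(fieldName, analyzer);
    }

    // Pick the analyzer for a field, falling back to the default.
    String analyzerFor(String fieldName) {
        return fieldAnalyzers.getOrDefault(fieldName, defaultAnalyzer);
    }

    public static void main(String[] args) {
        PerFieldWrapper w = new PerFieldWrapper("lowercase+stop");
        w.addAnalyzer("id", "whitespace-only"); // identifiers should not be lower-cased or stopped
        System.out.println(w.analyzerFor("id"));   // prints whitespace-only
        System.out.println(w.analyzerFor("body")); // prints lowercase+stop
    }
}
```

This pattern is useful when, for example, a document's body should be stemmed and stop-filtered while an identifier field must be indexed verbatim.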