Lucene.Net 1.4.3 Class Library

Lucene.Net.Analysis Namespace

Classes

Class Description
Analyzer An Analyzer builds TokenStreams, which analyze text. It thus represents a policy for extracting index terms from text.

Typical implementations first build a Tokenizer, which breaks the stream of characters from the Reader into raw Tokens. One or more TokenFilters may then be applied to the output of the Tokenizer.

WARNING: You must override one of the methods defined by this class in your subclass or the Analyzer will enter an infinite loop.

CharTokenizer An abstract base class for simple, character-oriented tokenizers.
LetterTokenizer A LetterTokenizer is a tokenizer that divides text at non-letters; that is, it defines tokens as maximal strings of adjacent letters, as determined by the java.lang.Character.isLetter() predicate. Note: this does a decent job for most European languages, but a terrible job for some Asian languages, where words are not separated by spaces.
LowerCaseFilter Normalizes token text to lower case.
LowerCaseTokenizer A LowerCaseTokenizer divides text at non-letters and normalizes token text to lower case; it performs the function of LetterTokenizer and LowerCaseFilter together.
PerFieldAnalyzerWrapper This analyzer is used in scenarios where different fields require different analysis techniques. Use the AddAnalyzer method to add a non-default analyzer for a particular field name. See TestPerFieldAnalyzerWrapper.java for example usage.
PorterStemFilter Transforms the token stream as per the Porter stemming algorithm. Note: the input to this filter must already be in lower case, so a LowerCaseFilter or LowerCaseTokenizer should precede it in the analysis pipeline.
SimpleAnalyzer An Analyzer that filters LetterTokenizer with LowerCaseFilter.
StopAnalyzer Filters LetterTokenizer with LowerCaseFilter and StopFilter.
StopFilter Removes stop words from a token stream.
Token A Token is an occurrence of a term from the text of a Field. It consists of a term's text, the start and end offset of the term in the text of the Field, and a type string. The start and end offsets permit applications to re-associate a token with its source text, e.g., to display highlighted query terms in a document browser, or to show matching text fragments in a KWIC (KeyWord In Context) display. The type is an interned string, assigned by a lexical analyzer (a.k.a. tokenizer), naming the lexical or syntactic class that the token belongs to. For example, an end-of-sentence marker token might be implemented with type "eos". The default token type is "word".
TokenFilter A TokenFilter is a TokenStream whose input is another TokenStream; it is the abstract base class for filters that modify a stream of tokens.
Tokenizer A Tokenizer is a TokenStream whose input is a Reader; it is the abstract base class for classes that break character input into raw tokens.
TokenStream A TokenStream enumerates the sequence of tokens, either from the fields of a document or from query text; it is the abstract base class for Tokenizers and TokenFilters.
WhitespaceAnalyzer An Analyzer that uses WhitespaceTokenizer.
WhitespaceTokenizer A WhitespaceTokenizer is a tokenizer that divides text at whitespace; adjacent sequences of non-whitespace characters form tokens.
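Taken together, these classes form a pipeline: a Tokenizer reads characters from a Reader and emits Tokens, and one or more TokenFilters transform that stream. The sketch below illustrates the idea in plain Java (the language these docs themselves reference, and the API from which Lucene.Net is ported). The class names echo the library's, but these are simplified stand-ins written for illustration, not the real Lucene.Net implementations.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Simplified stand-in for Token: term text, offsets into the source, and a type.
class Token {
    final String termText;
    final int startOffset, endOffset;
    final String type = "word"; // the default token type
    Token(String termText, int startOffset, int endOffset) {
        this.termText = termText;
        this.startOffset = startOffset;
        this.endOffset = endOffset;
    }
}

// A TokenStream enumerates tokens; next() returns null at end of stream.
abstract class TokenStream {
    abstract Token next() throws IOException;
}

// A Tokenizer is a TokenStream whose source is a Reader. This one mimics
// WhitespaceTokenizer: tokens are maximal runs of non-whitespace characters.
class WhitespaceTokenizer extends TokenStream {
    private final String text;
    private int pos = 0;
    WhitespaceTokenizer(Reader input) throws IOException {
        StringBuilder sb = new StringBuilder();
        for (int c = input.read(); c != -1; c = input.read()) sb.append((char) c);
        text = sb.toString();
    }
    Token next() {
        while (pos < text.length() && Character.isWhitespace(text.charAt(pos))) pos++;
        if (pos == text.length()) return null;
        int start = pos;
        while (pos < text.length() && !Character.isWhitespace(text.charAt(pos))) pos++;
        return new Token(text.substring(start, pos), start, pos);
    }
}

// A TokenFilter is a TokenStream whose source is another TokenStream.
class LowerCaseFilter extends TokenStream {
    private final TokenStream input;
    LowerCaseFilter(TokenStream input) { this.input = input; }
    Token next() throws IOException {
        Token t = input.next();
        return t == null ? null : new Token(t.termText.toLowerCase(), t.startOffset, t.endOffset);
    }
}

// Drops tokens whose (already lower-cased) text is a stop word.
class StopFilter extends TokenStream {
    private final TokenStream input;
    private final Set<String> stopWords;
    StopFilter(TokenStream input, Set<String> stopWords) {
        this.input = input;
        this.stopWords = stopWords;
    }
    Token next() throws IOException {
        for (Token t = input.next(); t != null; t = input.next())
            if (!stopWords.contains(t.termText)) return t;
        return null;
    }
}

public class AnalysisSketch {
    // Run a text through the whole pipeline and collect the surviving terms.
    static List<String> analyze(String text, Set<String> stopWords) throws IOException {
        TokenStream ts = new StopFilter(
            new LowerCaseFilter(new WhitespaceTokenizer(new StringReader(text))), stopWords);
        List<String> terms = new ArrayList<>();
        for (Token t = ts.next(); t != null; t = ts.next()) terms.add(t.termText);
        return terms;
    }

    public static void main(String[] args) throws IOException {
        Set<String> stops = new HashSet<>(Arrays.asList("the", "of"));
        System.out.println(analyze("The Analysis of Text", stops)); // prints [analysis, text]
    }
}
```

Note the ordering: stop words are matched against lower-cased text, so the StopFilter is applied after the LowerCaseFilter, just as StopAnalyzer composes its pipeline.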
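PerFieldAnalyzerWrapper's behaviour amounts to a lookup table with a default: a wrapper holds one default analyzer plus per-field overrides registered via AddAnalyzer. A hypothetical sketch of that dispatch logic follows; the class name PerFieldWrapper is invented for illustration, and plain strings stand in for Analyzer instances.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch of per-field analyzer dispatch, in the spirit of
// PerFieldAnalyzerWrapper. Strings stand in for Analyzer instances.
public class PerFieldWrapper {
    private final String defaultAnalyzer;
    private final Map<String, String> fieldAnalyzers = new HashMap<>();

    PerFieldWrapper(String defaultAnalyzer) {
        this.defaultAnalyzer = defaultAnalyzer;
    }

    // Register a non-default analyzer for a particular field name.
    void addAnalyzer(String fieldName, String analyzer) {
        fieldAnalyzers.put(fieldName, analyzer);
    }

    // Pick the analyzer for a field, falling back to the default.
    String analyzerFor(String fieldName) {
        return fieldAnalyzers.getOrDefault(fieldName, defaultAnalyzer);
    }

    public static void main(String[] args) {
        PerFieldWrapper w = new PerFieldWrapper("lowercase+stop");
        w.addAnalyzer("id", "whitespace-only"); // identifiers should not be lower-cased or stopped
        System.out.println(w.analyzerFor("id"));   // prints whitespace-only
        System.out.println(w.analyzerFor("body")); // prints lowercase+stop
    }
}
```

This pattern is useful when, for example, a document's body should be stemmed and stop-filtered while an identifier field must be indexed verbatim.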