Lucene.Net 1.9.1 Class Library

Lucene.Net.Analysis Namespace

Classes

Class Description
Analyzer  
CharTokenizer An abstract base class for simple, character-oriented tokenizers.
ISOLatin1AccentFilter  
KeywordAnalyzer "Tokenizes" the entire stream as a single token. This is useful for data like zip codes, IDs, and some product names.
KeywordTokenizer Emits the entire input as a single token.
LengthFilter Removes words that are too long or too short from the stream.
LetterTokenizer A LetterTokenizer is a tokenizer that divides text at non-letters. That is, it defines tokens as maximal strings of adjacent letters, as defined by the java.lang.Character.isLetter() predicate (System.Char.IsLetter in this .NET port). Note: this does a decent job for most European languages, but does a terrible job for some Asian languages, where words are not separated by spaces.
LowerCaseFilter Normalizes token text to lower case.
LowerCaseTokenizer  
PerFieldAnalyzerWrapper  
PorterStemFilter  
SimpleAnalyzer An Analyzer that filters LetterTokenizer with LowerCaseFilter.
StopAnalyzer Filters LetterTokenizer with LowerCaseFilter and StopFilter.
StopFilter Removes stop words from a token stream.
Token A Token is an occurrence of a term from the text of a field. It consists of a term's text, the start and end offset of the term in the text of the field, and a type string. The start and end offsets permit applications to re-associate a token with its source text, e.g., to display highlighted query terms in a document browser, or to show matching text fragments in a KWIC (KeyWord In Context) display. The type is an interned string, assigned by a lexical analyzer (a.k.a. tokenizer), naming the lexical or syntactic class that the token belongs to. For example, an end-of-sentence marker token might be implemented with type "eos". The default token type is "word".
TokenFilter  
Tokenizer  
TokenStream  
WhitespaceAnalyzer An Analyzer that uses WhitespaceTokenizer.
WhitespaceTokenizer A WhitespaceTokenizer is a tokenizer that divides text at whitespace. Adjacent sequences of non-whitespace characters form tokens.
WordlistLoader Loader for text files that represent a list of stopwords.
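
The tokenizer and filter descriptions above can be illustrated with a small standalone sketch. It is written in Java because the LetterTokenizer entry defines letters via java.lang.Character.isLetter(), and it does not call the Lucene.Net API. The names AnalysisSketch, Tok, split, letterTokens, whitespaceTokens, lowerCase and stop are hypothetical, and the sketch assumes the end offset points one past the last character of the token.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.function.IntPredicate;

// Standalone illustration only; none of these types belong to Lucene.Net.
final class AnalysisSketch {

    // Hypothetical stand-in for Token: term text, start/end offsets, type string.
    static final class Tok {
        final String text;
        final int start; // offset of the first character in the source text
        final int end;   // assumed: offset one past the last character
        final String type;

        Tok(String text, int start, int end, String type) {
            this.text = text;
            this.start = start;
            this.end = end;
            this.type = type;
        }

        @Override
        public String toString() {
            return text + "[" + start + "," + end + "," + type + "]";
        }
    }

    // Character-oriented tokenizing: a token is a maximal run of
    // characters accepted by the predicate.
    static List<Tok> split(String s, IntPredicate isTokenChar) {
        List<Tok> out = new ArrayList<>();
        int i = 0;
        while (i < s.length()) {
            if (!isTokenChar.test(s.charAt(i))) {
                i++; // skip separator characters
                continue;
            }
            int start = i;
            while (i < s.length() && isTokenChar.test(s.charAt(i))) {
                i++;
            }
            out.add(new Tok(s.substring(start, i), start, i, "word"));
        }
        return out;
    }

    // Divides text at non-letters.
    static List<Tok> letterTokens(String s) {
        return split(s, c -> Character.isLetter(c));
    }

    // Divides text at whitespace.
    static List<Tok> whitespaceTokens(String s) {
        return split(s, c -> !Character.isWhitespace(c));
    }

    // Normalizes token text to lower case; offsets and type are unchanged.
    static List<Tok> lowerCase(List<Tok> in) {
        List<Tok> out = new ArrayList<>();
        for (Tok t : in) {
            out.add(new Tok(t.text.toLowerCase(Locale.ROOT), t.start, t.end, t.type));
        }
        return out;
    }

    // Removes tokens whose text is in the stop-word set.
    static List<Tok> stop(List<Tok> in, Set<String> stopWords) {
        List<Tok> out = new ArrayList<>();
        for (Tok t : in) {
            if (!stopWords.contains(t.text)) {
                out.add(t);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String text = "The Quick-Brown fox";
        Set<String> stopWords = new HashSet<>(Arrays.asList("the"));
        System.out.println(whitespaceTokens(text));
        System.out.println(letterTokens(text));
        // Same chain as the StopAnalyzer description: letters, lower case, stop words.
        System.out.println(stop(lowerCase(letterTokens(text)), stopWords));
    }
}
```

For the input "The Quick-Brown fox", splitting at whitespace keeps "Quick-Brown" as one token, while splitting at non-letters yields "Quick" and "Brown" separately. The offsets stored with each token still point into the original text after lower-casing and stop-word removal, which is what makes highlighting possible.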