Lucene.Net 1.9.1 Class Library
Lucene.Net.Analysis Namespace
Classes
Class | Description
Analyzer | An Analyzer builds TokenStreams, which analyze text. It thus represents a policy for extracting index terms from text.
CharTokenizer | An abstract base class for simple, character-oriented tokenizers.
ISOLatin1AccentFilter | A filter that replaces accented characters in the ISO Latin 1 character set by their unaccented equivalents.
KeywordAnalyzer | "Tokenizes" the entire stream as a single token. This is useful for data such as zip codes, ids, and some product names.
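The behavior can be illustrated with a minimal Python sketch. This is not the Lucene.Net API; the function name is hypothetical and stands in for what KeywordTokenizer does internally:

```python
# Behavioral sketch (not the Lucene.Net API): a keyword-style tokenizer
# emits the whole input as a single token, so values such as zip codes
# or product ids are never split at punctuation or whitespace.
def keyword_tokenize(text):
    return [text] if text else []

# The hyphenated id stays intact as one token.
assert keyword_tokenize("AB-1234-XYZ") == ["AB-1234-XYZ"]
```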
KeywordTokenizer | Emits the entire input as a single token.
LengthFilter | Removes words that are too long or too short from the stream.
LetterTokenizer | A tokenizer that divides text at non-letters. That is, it defines tokens as maximal strings of adjacent letters, as determined by the java.lang.Character.isLetter() predicate. Note: this does a decent job for most European languages, but a terrible job for some Asian languages, where words are not separated by spaces.
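The splitting rule above can be sketched in a few lines of Python. This is an illustration of the behavior, not the Lucene.Net implementation, and the function name is hypothetical:

```python
import re

# Behavioral sketch (not the Lucene.Net API): emit maximal runs of
# adjacent letters and discard everything else. The character class
# [^\W\d_] matches word characters minus digits and underscore,
# i.e. letters only.
def letter_tokenize(text):
    return re.findall(r"[^\W\d_]+", text)

# Apostrophes and digits are token boundaries, so "don't" splits.
assert letter_tokenize("don't stop 2night") == ["don", "t", "stop", "night"]
```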
LowerCaseFilter | Normalizes token text to lower case.
LowerCaseTokenizer | Performs the function of LetterTokenizer and LowerCaseFilter together: it divides text at non-letters and normalizes the resulting tokens to lower case.
PerFieldAnalyzerWrapper | An Analyzer used when different fields require different analysis techniques; it delegates to a separate analyzer per field, with a default for fields that have no specific analyzer.
PorterStemFilter | Transforms the token stream according to the Porter stemming algorithm.
SimpleAnalyzer | An Analyzer that filters LetterTokenizer with LowerCaseFilter.
StopAnalyzer | Filters LetterTokenizer with LowerCaseFilter and StopFilter.
StopFilter | Removes stop words from a token stream.
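Stop filtering reduces to set membership, which a short Python sketch makes concrete. This is not the Lucene.Net API, and the stop-word list here is purely illustrative:

```python
# Behavioral sketch (not the Lucene.Net API): drop any token found in
# a stop set. An illustrative stop list; real applications supply
# their own, usually lower-cased to match a preceding LowerCaseFilter.
STOP_WORDS = {"a", "an", "and", "of", "the"}

def stop_filter(tokens, stop_words=STOP_WORDS):
    return [t for t in tokens if t not in stop_words]

# "the" is discarded; the remaining tokens pass through in order.
assert stop_filter(["the", "quick", "brown", "fox"]) == ["quick", "brown", "fox"]
```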
Token | A Token is an occurrence of a term from the text of a field. It consists of a term's text, the start and end offsets of the term in the text of the field, and a type string. The start and end offsets permit applications to re-associate a token with its source text, e.g. to display highlighted query terms in a document browser, or to show matching text fragments in a KWIC (KeyWord In Context) display. The type is an interned string, assigned by a lexical analyzer (a.k.a. tokenizer), naming the lexical or syntactic class that the token belongs to. For example, an end-of-sentence marker token might be implemented with type "eos". The default token type is "word".
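The offset bookkeeping described above can be sketched with a small Python data class. This mirrors the shape of the record, not the actual Lucene.Net Token class, and the field names are assumptions:

```python
from dataclasses import dataclass

# Behavioral sketch (not the Lucene.Net API): a token records its text,
# its start/end offsets into the source field, and a type string that
# defaults to "word". The offsets let an application slice the original
# text back out, e.g. for highlighting matched terms.
@dataclass
class Token:
    term_text: str
    start_offset: int
    end_offset: int
    token_type: str = "word"

source = "zip 90210"
tok = Token(source[4:9], 4, 9)
# Slicing the source by the offsets recovers the token's text.
assert source[tok.start_offset:tok.end_offset] == tok.term_text
assert tok.token_type == "word"
```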
TokenFilter | A TokenStream whose input is another TokenStream; the base class for filters that modify a stream of tokens.
Tokenizer | A TokenStream whose input is a text reader; the base class for classes that break text into tokens.
TokenStream | Enumerates a sequence of tokens, either from the fields of a document or from query text.
WhitespaceAnalyzer | An Analyzer that uses WhitespaceTokenizer.
WhitespaceTokenizer | A tokenizer that divides text at whitespace. Adjacent sequences of non-whitespace characters form tokens.
WordlistLoader | Loader for text files that represent a list of stop words.