Lucene.Net 1.9.1 Class Library
Lucene.Net.Analysis Namespace
Classes
Class | Description
Analyzer | An Analyzer builds TokenStreams, which analyze text. It thus represents a policy for extracting index terms from text.
CharTokenizer | An abstract base class for simple, character-oriented tokenizers.
ISOLatin1AccentFilter | A filter that replaces accented characters in the ISO Latin 1 character set by their unaccented equivalents.
KeywordAnalyzer | "Tokenizes" the entire stream as a single token. This is useful for data such as zip codes, ids, and some product names.
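The behavior can be illustrated with a minimal Python sketch. This is not the Lucene.Net API; the function name is hypothetical and stands in for what KeywordTokenizer does internally:

```python
# Behavioral sketch (not the Lucene.Net API): a keyword-style tokenizer
# emits the whole input as a single token, so values such as zip codes
# or product ids are never split at punctuation or whitespace.
def keyword_tokenize(text):
    return [text] if text else []

# The hyphenated id stays intact as one token.
assert keyword_tokenize("AB-1234-XYZ") == ["AB-1234-XYZ"]
```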
KeywordTokenizer | Emits the entire input as a single token.
LengthFilter | Removes words that are too long or too short from the stream.
LetterTokenizer | A tokenizer that divides text at non-letters. That is, it defines tokens as maximal strings of adjacent letters, as determined by the java.lang.Character.isLetter() predicate. Note: this does a decent job for most European languages, but a terrible job for some Asian languages, where words are not separated by spaces.
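The splitting rule above can be sketched in a few lines of Python. This is an illustration of the behavior, not the Lucene.Net implementation, and the function name is hypothetical:

```python
import re

# Behavioral sketch (not the Lucene.Net API): emit maximal runs of
# adjacent letters and discard everything else. The character class
# [^\W\d_] matches word characters minus digits and underscore,
# i.e. letters only.
def letter_tokenize(text):
    return re.findall(r"[^\W\d_]+", text)

# Apostrophes and digits are token boundaries, so "don't" splits.
assert letter_tokenize("don't stop 2night") == ["don", "t", "stop", "night"]
```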
LowerCaseFilter | Normalizes token text to lower case.
LowerCaseTokenizer | Performs the function of LetterTokenizer and LowerCaseFilter together: it divides text at non-letters and normalizes the resulting tokens to lower case.
PerFieldAnalyzerWrapper | An Analyzer used when different fields require different analysis techniques; it delegates to a separate analyzer per field, with a default for fields that have no specific analyzer.
PorterStemFilter | Transforms the token stream according to the Porter stemming algorithm.
SimpleAnalyzer | An Analyzer that filters LetterTokenizer with LowerCaseFilter.
StopAnalyzer | Filters LetterTokenizer with LowerCaseFilter and StopFilter.
StopFilter | Removes stop words from a token stream.
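Stop filtering reduces to set membership, which a short Python sketch makes concrete. This is not the Lucene.Net API, and the stop-word list here is purely illustrative:

```python
# Behavioral sketch (not the Lucene.Net API): drop any token found in
# a stop set. An illustrative stop list; real applications supply
# their own, usually lower-cased to match a preceding LowerCaseFilter.
STOP_WORDS = {"a", "an", "and", "of", "the"}

def stop_filter(tokens, stop_words=STOP_WORDS):
    return [t for t in tokens if t not in stop_words]

# "the" is discarded; the remaining tokens pass through in order.
assert stop_filter(["the", "quick", "brown", "fox"]) == ["quick", "brown", "fox"]
```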
Token | A Token is an occurrence of a term from the text of a field. It consists of a term's text, the start and end offsets of the term in the text of the field, and a type string. The start and end offsets permit applications to re-associate a token with its source text, e.g. to display highlighted query terms in a document browser, or to show matching text fragments in a KWIC (KeyWord In Context) display. The type is an interned string, assigned by a lexical analyzer (a.k.a. tokenizer), naming the lexical or syntactic class that the token belongs to. For example, an end-of-sentence marker token might be implemented with type "eos". The default token type is "word".
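The offset bookkeeping described above can be sketched with a small Python data class. This mirrors the shape of the record, not the actual Lucene.Net Token class, and the field names are assumptions:

```python
from dataclasses import dataclass

# Behavioral sketch (not the Lucene.Net API): a token records its text,
# its start/end offsets into the source field, and a type string that
# defaults to "word". The offsets let an application slice the original
# text back out, e.g. for highlighting matched terms.
@dataclass
class Token:
    term_text: str
    start_offset: int
    end_offset: int
    token_type: str = "word"

source = "zip 90210"
tok = Token(source[4:9], 4, 9)
# Slicing the source by the offsets recovers the token's text.
assert source[tok.start_offset:tok.end_offset] == tok.term_text
assert tok.token_type == "word"
```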
TokenFilter | A TokenStream whose input is another TokenStream; the base class for filters that modify a stream of tokens.
Tokenizer | A TokenStream whose input is a text reader; the base class for classes that break text into tokens.
TokenStream | Enumerates a sequence of tokens, either from the fields of a document or from query text.
WhitespaceAnalyzer | An Analyzer that uses WhitespaceTokenizer.
WhitespaceTokenizer | A tokenizer that divides text at whitespace. Adjacent sequences of non-whitespace characters form tokens.
WordlistLoader | Loader for text files that represent a list of stop words.