Lucene.Net 1.4.3 Class Library
Lucene.Net.Analysis.RU Namespace
Classes
Class | Description |
RussianAnalyzer | Analyzer for the Russian language. Supports an external list of stopwords (words that will not be indexed at all). A default set of stopwords is used unless an alternative list is specified. |
RussianCharsets | Contains encoding schemes (charsets) and a toLowerCase() method implementation for Russian characters in Unicode, KOI8 and CP1251. Each encoding scheme lists the lowercase characters at positions 0-31 and the uppercase characters at positions 32-63. Other encoding schemes (such as ISO-8859-5 or a custom one) can be supported by adding a new charset and extending toLowerCase() to handle it. |
RussianLetterTokenizer | A tokenizer that extends LetterTokenizer by additionally looking up letters in a given "russian charset". LetterTokenizer relies on the Character.isLetter() method, which cannot detect letters in encodings such as CP1251 and KOI8 (the well-known problems with the 0xD7 and 0xF7 characters). |
RussianLowerCaseFilter | Normalizes token text to lower case according to the given ("russian") charset. |
RussianStemFilter | A filter that stems Russian words. The implementation was inspired by GermanStemFilter. The input should pass through RussianLowerCaseFilter before it reaches RussianStemFilter, because RussianStemFilter only works with the lowercase part of any "russian" charset. |
RussianStemmer | Russian stemming algorithm implementation (see http://snowball.sourceforge.net for a detailed description). |
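
Example (illustrative only)

The charset layout described for RussianCharsets, with lowercase letters at positions 0-31 and uppercase letters at positions 32-63, can be sketched as a plain lookup table. The following is a minimal standalone sketch in Java and is not the library source. The class name RussianCharsetSketch, the field UNICODE_RUSSIAN and the helper methods are hypothetical names chosen for illustration, and the table covers only the 32 contiguous Unicode letters U+0430..U+044F and U+0410..U+042F.

```java
/**
 * Standalone sketch of a 64-entry "russian charset" table.
 * NOT the Lucene.Net source: every name in this class is hypothetical.
 */
final class RussianCharsetSketch {

    /** Lowercase letters at positions 0-31, uppercase letters at positions 32-63. */
    static final char[] UNICODE_RUSSIAN = buildUnicodeTable();

    private static char[] buildUnicodeTable() {
        char[] table = new char[64];
        for (int i = 0; i < 32; i++) {
            table[i] = (char) (0x0430 + i);      // U+0430..U+044F, lowercase
            table[i + 32] = (char) (0x0410 + i); // U+0410..U+042F, uppercase
        }
        return table;
    }

    /** Returns the position of c in the charset table, or -1 if it is absent. */
    static int indexOf(char c, char[] charset) {
        for (int i = 0; i < charset.length; i++) {
            if (charset[i] == c) {
                return i;
            }
        }
        return -1;
    }

    /** Charset-aware letter test: a character is a letter if the table lists it. */
    static boolean isLetter(char c, char[] charset) {
        return indexOf(c, charset) >= 0;
    }

    /** Maps an uppercase entry (32-63) to its lowercase twin (0-31); others pass through. */
    static char toLowerCase(char c, char[] charset) {
        int i = indexOf(c, charset);
        return i >= 32 ? charset[i - 32] : c;
    }

    /** Applies the per-character mapping to a whole token. */
    static String toLowerCase(String text, char[] charset) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            out.append(toLowerCase(text.charAt(i), charset));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // U+041C is an uppercase entry, so it is found in the table.
        System.out.println(isLetter('\u041C', UNICODE_RUSSIAN));
        // The general-purpose test rejects 0xD7 and 0xF7.
        System.out.println(Character.isLetter((char) 0xD7));
        System.out.println(Character.isLetter((char) 0xF7));
    }
}
```

Under this layout, supporting another encoding would amount to supplying a different 64-entry table. One plausible reading of the 0xD7 and 0xF7 remark is that these byte values are letters in KOI8 and CP1251, while Character.isLetter() treats the code points U+00D7 and U+00F7 as the multiplication and division signs and rejects them. A table lookup of this kind avoids that mismatch.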