A RussianLetterTokenizer is a tokenizer that extends the behavior of LetterTokenizer by additionally looking up letters
in a given "russian charset". The problem with LetterTokenizer is that it relies on the Character.isLetter() method,
which does not recognize letters in encodings like CP1252 and KOI8
(the well-known problems with the 0xD7 and 0xF7 chars).
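The extra charset lookup can be sketched in a few lines of C#. This is a minimal, self-contained illustration and not the Lucene.Net source: the class RussianLetterSketch and its methods are hypothetical names, and the charset table holds U+00D7 and U+00F7 on the assumption that these are the characters produced when the bytes 0xD7 and 0xF7 are decoded as CP1252.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Illustrative sketch only; hypothetical names, not the Lucene.Net implementation.
public static class RussianLetterSketch
{
    // Accept a character if the runtime classifies it as a letter,
    // or if it appears in the caller-supplied charset table.
    public static bool IsTokenChar(char c, char[] charset)
    {
        return char.IsLetter(c) || Array.IndexOf(charset, c) >= 0;
    }

    // Split text into maximal runs of characters accepted by IsTokenChar.
    public static List<string> Tokenize(string text, char[] charset)
    {
        var tokens = new List<string>();
        var current = new StringBuilder();
        foreach (char c in text)
        {
            if (IsTokenChar(c, charset))
            {
                current.Append(c);
            }
            else if (current.Length > 0)
            {
                tokens.Add(current.ToString());
                current.Length = 0; // reset the buffer for the next token
            }
        }
        if (current.Length > 0)
        {
            tokens.Add(current.ToString());
        }
        return tokens;
    }

    public static void Main()
    {
        // Assumed charset: the CP1252 decodings of bytes 0xD7 and 0xF7.
        char[] charset = { '\u00D7', '\u00F7' };
        string text = "ab\u00D7cd 12";

        // A letter-only check splits at U+00D7: two tokens, "ab" and "cd".
        Console.WriteLine(Tokenize(text, new char[0]).Count);

        // With the charset lookup the run stays whole: one token.
        Console.WriteLine(Tokenize(text, charset).Count);
    }
}
```

Neither U+00D7 nor U+00F7 is classified as a letter by char.IsLetter, so a letter-only tokenizer breaks a word at those positions. Passing the table in as a parameter keeps the predicate independent of any one encoding.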
Namespace: Lucene.Net.Analysis.Ru
Assembly: Lucene.Net.Contrib.Analyzers (in Lucene.Net.Contrib.Analyzers.dll) Version: 2.9.2.1 (2.9.2.1)
Syntax
C# |
---|
public class RussianLetterTokenizer : CharTokenizer |
Visual Basic |
---|
Public Class RussianLetterTokenizer _ Inherits CharTokenizer |
Visual C++ |
---|
public ref class RussianLetterTokenizer : public CharTokenizer |
Inheritance Hierarchy
System.Object
Lucene.Net.Util.AttributeSource
Lucene.Net.Analysis.TokenStream
Lucene.Net.Analysis.Tokenizer
Lucene.Net.Analysis.CharTokenizer
Lucene.Net.Analysis.Ru.RussianLetterTokenizer