A grammar-based tokenizer constructed with JFlex.
This should be a good tokenizer for most European-language documents:
- Splits words at punctuation characters, removing punctuation. However, a dot that's not followed by whitespace is considered part of a token.
- Splits words at hyphens, unless there's a number in the token, in which case the whole token is interpreted as a product number and is not split.
- Recognizes email addresses and internet hostnames as one token.
- As of 2.4, tokens incorrectly identified as acronyms are corrected (see LUCENE-1608).
Namespace: Lucene.Net.Analysis.Standard
Assembly: Lucene.Net (in Lucene.Net.dll) Version: 2.9.4.1
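The splitting rules listed in the summary can be illustrated with a small standalone sketch. This is not the JFlex grammar and does not call Lucene.Net; it is a rough ASCII-only approximation of the three splitting rules, and the names `TOKEN` and `tokenize` are hypothetical, chosen only for this illustration. The acronym correction is not modeled.

```python
import re

# Rough approximation of the rules described above (assumption: ASCII text only).
# Alternatives are tried in order, so the more specific shapes come first.
TOKEN = re.compile(
    r"""
    [A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+   # email address, kept whole
  | [A-Za-z0-9]+(?:\.[A-Za-z0-9]+)+                       # dot not followed by whitespace, e.g. a hostname
  | (?=[A-Za-z0-9-]*\d)[A-Za-z0-9]+(?:-[A-Za-z0-9]+)+     # hyphenated run containing a digit ("product number")
  | [A-Za-z0-9]+                                          # plain word; other punctuation is dropped
    """,
    re.VERBOSE,
)


def tokenize(text):
    """Return the tokens found in text, in order of appearance."""
    return [match.group(0) for match in TOKEN.finditer(text)]


if __name__ == "__main__":
    sample = "Mail bob@example.com about part XJ-9000, the well-known model."
    print(tokenize(sample))
    # ['Mail', 'bob@example.com', 'about', 'part', 'XJ-9000', 'the', 'well', 'known', 'model']
```

In the sample, `well-known` is split at the hyphen because it contains no digit, while `XJ-9000` stays intact, and the trailing dot after `model` is dropped because it is not followed by more token characters.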
Syntax
Inheritance Hierarchy
System.Object
  Lucene.Net.Util.AttributeSource
    Lucene.Net.Analysis.TokenStream
      Lucene.Net.Analysis.Tokenizer
        Lucene.Net.Analysis.Standard.StandardTokenizer