# Lucene.Net.Analysis.Standard

Grammar-based text analysis: a JFlex-generated tokenizer and the filter and analyzer built on top of it.

## Classes
| Class | Description |
|---|---|
| StandardAnalyzer | Filters `StandardTokenizer` with `StandardFilter`, `LowerCaseFilter`, and `StopFilter`, using a list of English stop words.<br><br>You must specify the required `Version` compatibility when creating a `StandardAnalyzer`:<br>• As of 2.9, `StopFilter` preserves position increments.<br>• As of 2.4, tokens incorrectly identified as acronyms are corrected (see LUCENE-1608). |
| StandardFilter | Normalizes tokens extracted with `StandardTokenizer`. |
| StandardTokenizer | A grammar-based tokenizer constructed with JFlex. This should be a good tokenizer for most European-language documents:<br>• Splits words at punctuation characters, removing the punctuation; however, a dot that is not followed by whitespace is considered part of a token.<br>• Splits words at hyphens, unless there is a number in the token, in which case the whole token is interpreted as a product number and is not split.<br>• Recognizes email addresses and internet hostnames as one token.<br><br>Many applications have specific tokenizer needs. If this tokenizer does not suit your application, please consider copying this source code directory to your project and maintaining your own grammar-based tokenizer.<br><br>You must specify the required `Version` compatibility when creating a `StandardTokenizer`:<br>• As of 2.4, tokens incorrectly identified as acronyms are corrected (see LUCENE-1608). |
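To make the pipeline concrete, the following is a minimal pure-Java sketch of the behavior described above: the tokenizer's splitting rules (emails and hostnames kept whole, digit-bearing hyphenated runs treated as product numbers, everything else split at punctuation and hyphens), followed by the lowercase and stop-word steps of the analyzer chain. It is an illustrative assumption, not the actual JFlex grammar: the `StandardAnalyzerSketch` class name, the regular expression, and the tiny stop-word subset are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Rough approximation of the tokenizer rules listed above, followed by the
// lowercase and stop-word filter steps. NOT the real JFlex grammar.
public class StandardAnalyzerSketch {

    private static final Pattern TOKEN = Pattern.compile(
        // email addresses kept as one token
        "[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\\.[A-Za-z]{2,}"
        // internet hostnames: inner dots not followed by whitespace stay in the token
        + "|[A-Za-z0-9-]+(?:\\.[A-Za-z0-9-]+)+"
        // hyphenated run containing a digit: treated as a product number, not split
        + "|(?=[A-Za-z-]*\\d)[A-Za-z0-9]+(?:-[A-Za-z0-9]+)+"
        // plain word: punctuation and hyphens split it
        + "|[A-Za-z0-9]+");

    // Tiny illustrative subset of an English stop-word list (hypothetical).
    private static final Set<String> STOP_WORDS =
        new HashSet<>(Arrays.asList("a", "an", "and", "the", "of", "to"));

    public static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(text);
        while (m.find()) {
            String token = m.group().toLowerCase(); // LowerCaseFilter step
            if (!STOP_WORDS.contains(token)) {      // StopFilter step
                tokens.add(token);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(
            analyze("Send the report to jane.doe@example.com about part AB-1234-X."));
        // [send, report, jane.doe@example.com, about, part, ab-1234-x]
    }
}
```

Note how the alternation order in the pattern mirrors the rules above: the email and hostname alternatives are tried before the plain-word alternative, so `jane.doe@example.com` survives as a single token, while the digit lookahead keeps `AB-1234-X` unsplit but lets `anti-virus` split at the hyphen.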