All Classes and Interfaces (Lucene 9.9.1 common API)

ASCIIFoldingFilter

This class converts alphabetic, numeric, and symbolic Unicode characters that are not in the first 127 ASCII characters (the "Basic Latin" Unicode block) into their ASCII equivalents, where such equivalents exist.
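
As a rough illustration of ASCII folding, the sketch below strips combining marks after Unicode decomposition using only the JDK. It is not the ASCIIFoldingFilter implementation, and it will not fold characters that have no canonical decomposition (for example "ß" or "æ"). The class and method names are hypothetical.

```java
import java.text.Normalizer;

// Hypothetical helper, not part of Lucene: approximates ASCII folding
// by decomposing characters (NFD) and dropping the combining marks.
final class AsciiFoldingSketch {
    static String fold(String input) {
        String decomposed = Normalizer.normalize(input, Normalizer.Form.NFD);
        StringBuilder out = new StringBuilder(decomposed.length());
        for (int i = 0; i < decomposed.length(); i++) {
            char c = decomposed.charAt(i);
            // Keep everything except non-spacing marks such as accents.
            if (Character.getType(c) != Character.NON_SPACING_MARK) {
                out.append(c);
            }
        }
        return out.toString();
    }
}
```

With this sketch, an accented "e" or "o" is reduced to its base letter, while plain ASCII input is returned unchanged.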

ASCIIFoldingFilterFactory

Factory for ASCIIFoldingFilter.

BaseCharFilter

Base utility class for implementing a CharFilter.

BasqueAnalyzer

Analyzer for Basque.

BasqueStemmer

This class implements the stemming algorithm defined by a snowball script.

BengaliAnalyzer

Analyzer for Bengali.

BengaliNormalizationFilter

A TokenFilter that applies BengaliNormalizer to normalize the orthography.

BengaliNormalizationFilterFactory

Factory for BengaliNormalizationFilter.

BengaliNormalizer

Normalizer for Bengali.

BengaliStemFilter

A TokenFilter that applies BengaliStemmer to stem Bengali words.

BengaliStemFilterFactory

Factory for BengaliStemFilter.

BengaliStemmer

Stemmer for Bengali.

BrazilianAnalyzer

Analyzer for Brazilian Portuguese language.

BrazilianStemFilter

A TokenFilter that applies BrazilianStemmer.

BrazilianStemFilterFactory

Factory for BrazilianStemFilter.

BrazilianStemmer

A stemmer for Brazilian Portuguese words.

BulgarianAnalyzer

Analyzer for Bulgarian.

BulgarianStemFilter

A TokenFilter that applies BulgarianStemmer to stem Bulgarian words.

BulgarianStemFilterFactory

Factory for BulgarianStemFilter.

BulgarianStemmer

Light Stemmer for Bulgarian.

ByteVector

This class implements a simple byte vector with access to the underlying array.

CapitalizationFilter

A filter to apply normal capitalization rules to Tokens.

CapitalizationFilterFactory

Factory for CapitalizationFilter.

CatalanAnalyzer

Analyzer for Catalan.

CatalanStemmer

This class implements the stemming algorithm defined by a snowball script.

CharArrayIterator

A CharacterIterator used internally for use with BreakIterator

CharTokenizer

An abstract base class for simple, character-oriented tokenizers.

CharVector

This class implements a simple char vector with access to the underlying array.

CJKAnalyzer

An Analyzer that tokenizes text with StandardTokenizer, normalizes content with CJKWidthFilter, folds case with LowerCaseFilter, forms bigrams of CJK with CJKBigramFilter, and filters stopwords with StopFilter

CJKBigramFilter

Forms bigrams of CJK terms that are generated from StandardTokenizer or ICUTokenizer.

CJKBigramFilterFactory

Factory for CJKBigramFilter.

CJKWidthCharFilter

A CharFilter that normalizes CJK width differences: it folds fullwidth ASCII variants into the equivalent Basic Latin characters, and folds halfwidth Katakana variants into the equivalent Kana.

CJKWidthCharFilterFactory

Factory for CJKWidthCharFilter.

CJKWidthFilter

A TokenFilter that normalizes CJK width differences: it folds fullwidth ASCII variants into the equivalent Basic Latin characters, and folds halfwidth Katakana variants into the equivalent Kana.
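
The width folding described here can be approximated with the JDK's NFKC normalization, which maps fullwidth ASCII and halfwidth Katakana to their standard forms. This is only a sketch under that assumption: NFKC also rewrites other compatibility characters, so it does more than a width-only filter would. The class name is hypothetical.

```java
import java.text.Normalizer;

// Hypothetical helper, not part of Lucene: uses NFKC as a stand-in
// for CJK width normalization.
final class CjkWidthSketch {
    static String foldWidth(String input) {
        // NFKC replaces compatibility variants (fullwidth/halfwidth forms
        // among them) with their standard equivalents.
        return Normalizer.normalize(input, Normalizer.Form.NFKC);
    }
}
```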

CJKWidthFilterFactory

Factory for CJKWidthFilter.

ClassicAnalyzer

Filters ClassicTokenizer with ClassicFilter, LowerCaseFilter and StopFilter, using a list of English stop words.

ClassicFilter

Normalizes tokens extracted with ClassicTokenizer.

ClassicFilterFactory

Factory for ClassicFilter.

ClassicTokenizer

A grammar-based tokenizer constructed with JFlex

ClassicTokenizerFactory

Factory for ClassicTokenizer.

CodepointCountFilter

Removes words that are too long or too short from the stream.

CodepointCountFilterFactory

Factory for CodepointCountFilter.

CollatedTermAttributeImpl

Extension of CharTermAttributeImpl that encodes the term text as a binary Unicode collation key instead of as UTF-8 bytes.

CollationAttributeFactory

Converts each token into its CollationKey, and then encodes the bytes as an index term.

CollationDocValuesField

Indexes collation keys as a single-valued SortedDocValuesField.

CollationKeyAnalyzer

Configures KeywordTokenizer with CollationAttributeFactory.

CommonGramsFilter

Construct bigrams for frequently occurring terms while indexing.

CommonGramsFilterFactory

Constructs a CommonGramsFilter.

CommonGramsQueryFilter

Wrap a CommonGramsFilter optimizing phrase queries by only returning single words when they are not a member of a bigram.

CommonGramsQueryFilterFactory

Construct CommonGramsQueryFilter.

CompoundWordTokenFilterBase

Base class for decomposition token filters.

ConcatenateGraphFilter

Concatenates/Joins every incoming token with a separator into one output token for every path through the token stream (which is a graph).

ConcatenateGraphFilter.BytesRefBuilderTermAttribute

Attribute providing access to the term builder and UTF-16 conversion

ConcatenateGraphFilter.BytesRefBuilderTermAttributeImpl

Implementation of ConcatenateGraphFilter.BytesRefBuilderTermAttribute

ConcatenateGraphFilterFactory

Factory for ConcatenateGraphFilter.

ConcatenatingTokenStream

A TokenStream that takes an array of input TokenStreams as sources, and concatenates them together.

ConditionalTokenFilter

Allows skipping TokenFilters based on the current set of attributes.

ConditionalTokenFilterFactory

Abstract parent class for analysis factories that create ConditionalTokenFilter instances

CSVUtil

Utility class for parsing CSV text

CustomAnalyzer

A general-purpose Analyzer that can be created with a builder-style API.

CustomAnalyzer.Builder

Builder for CustomAnalyzer.

CustomAnalyzer.ConditionBuilder

Factory class for a ConditionalTokenFilter

CzechAnalyzer

Analyzer for Czech language.

CzechStemFilter

A TokenFilter that applies CzechStemmer to stem Czech words.

CzechStemFilterFactory

Factory for CzechStemFilter.

CzechStemmer

Light Stemmer for Czech.

DanishAnalyzer

Analyzer for Danish.

DanishStemmer

This class implements the stemming algorithm defined by a snowball script.

DateRecognizerFilter

Filters all tokens that cannot be parsed to a date, using the provided DateFormat.

DateRecognizerFilterFactory

Factory for DateRecognizerFilter.

DecimalDigitFilter

Folds all Unicode digits in [:General_Category=Decimal_Number:] to Basic Latin digits (0-9).
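
A minimal sketch of this kind of digit folding, using only JDK character properties, might look as follows. The class name is hypothetical and the code is an illustration of the idea, not the filter's actual implementation.

```java
// Hypothetical helper, not part of Lucene: replaces every code point in
// the Decimal_Number general category with its Basic Latin digit.
final class DecimalDigitSketch {
    static String foldDigits(String input) {
        StringBuilder out = new StringBuilder(input.length());
        input.codePoints().forEach(cp -> {
            if (Character.getType(cp) == Character.DECIMAL_DIGIT_NUMBER) {
                // Character.digit returns the numeric value 0..9 for radix 10.
                out.append((char) ('0' + Character.digit(cp, 10)));
            } else {
                out.appendCodePoint(cp);
            }
        });
        return out.toString();
    }
}
```

Iterating over code points rather than chars keeps digits outside the Basic Multilingual Plane intact.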

DecimalDigitFilterFactory

Factory for DecimalDigitFilter.

DelimitedBoostTokenFilter

Characters before the delimiter are the "token", those after are the boost.

DelimitedBoostTokenFilterFactory

Factory for DelimitedBoostTokenFilter.

DelimitedPayloadTokenFilter

Characters before the delimiter are the "token", those after are the payload.

DelimitedPayloadTokenFilterFactory

Factory for DelimitedPayloadTokenFilter.

DelimitedTermFrequencyTokenFilter

Characters before the delimiter are the "token", the textual integer after is the term frequency.
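
The "token, delimiter, value" convention shared by the delimited filters can be sketched in plain Java as shown below. The class is hypothetical. Splitting at the first delimiter and defaulting the frequency to 1 when no delimiter is present are assumptions of this sketch, not statements about the filter's exact behavior.

```java
// Hypothetical value class, not part of Lucene: holds a term and the
// integer term frequency parsed from text such as "hello|5".
final class DelimitedTokenSketch {
    final String term;
    final int frequency;

    DelimitedTokenSketch(String term, int frequency) {
        this.term = term;
        this.frequency = frequency;
    }

    static DelimitedTokenSketch parse(String raw, char delimiter) {
        int idx = raw.indexOf(delimiter);
        if (idx < 0) {
            // Assumption: a token without a delimiter counts once.
            return new DelimitedTokenSketch(raw, 1);
        }
        String term = raw.substring(0, idx);
        int frequency = Integer.parseInt(raw.substring(idx + 1));
        return new DelimitedTokenSketch(term, frequency);
    }
}
```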

DelimitedTermFrequencyTokenFilterFactory

Factory for DelimitedTermFrequencyTokenFilter.

DictEntries

An object representing homonym dictionary entries.

DictEntry

An object representing *.dic file entry with its word, flags and morphological data.

Dictionary

In-memory structure for the dictionary (.dic) and affix (.aff) data of a hunspell dictionary.

DictionaryCompoundWordTokenFilter

A TokenFilter that decomposes compound words found in many Germanic languages.

DictionaryCompoundWordTokenFilterFactory

Factory for DictionaryCompoundWordTokenFilter.

Dl4jModelReader

Dl4jModelReader reads the file generated by the Deeplearning4j library and provides a Word2VecModel with normalized vectors.

DropIfFlaggedFilter

Allows Tokens with a given combination of flags to be dropped.

DropIfFlaggedFilterFactory

Provides a filter that will drop tokens matching a set of flags.

DutchAnalyzer

Analyzer for Dutch language.

DutchStemmer

This class implements the stemming algorithm defined by a snowball script.

EdgeNGramFilterFactory

Creates new instances of EdgeNGramTokenFilter.

EdgeNGramTokenFilter

Tokenizes the given token into n-grams of given size(s).
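
To make the idea of edge n-grams concrete, here is a small stand-alone sketch that emits the leading prefixes of a token. The names `edgeNGrams`, `minGram`, and `maxGram` are chosen for this illustration, and the code does not reproduce the filter's attribute handling.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Lucene: produces the prefixes of a
// token whose lengths (in code points) lie between minGram and maxGram.
final class EdgeNGramSketch {
    static List<String> edgeNGrams(String token, int minGram, int maxGram) {
        List<String> grams = new ArrayList<>();
        int codePoints = token.codePointCount(0, token.length());
        for (int n = minGram; n <= maxGram && n <= codePoints; n++) {
            // offsetByCodePoints avoids splitting surrogate pairs.
            grams.add(token.substring(0, token.offsetByCodePoints(0, n)));
        }
        return grams;
    }
}
```

For the token "apple" with sizes 1 to 3, the sketch returns "a", "ap", and "app". Prefix grams like these are commonly used for search-as-you-type matching.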

EdgeNGramTokenizer

Tokenizes the input from an edge into n-grams of given size(s).

EdgeNGramTokenizerFactory

Creates new instances of EdgeNGramTokenizer.

ElisionFilter

Removes elisions from a TokenStream.

ElisionFilterFactory

Factory for ElisionFilter.

EmptyTokenStream

An always exhausted token stream.

EnglishAnalyzer

Analyzer for English.

EnglishMinimalStemFilter

A TokenFilter that applies EnglishMinimalStemmer to stem English words.

EnglishMinimalStemFilterFactory

Factory for EnglishMinimalStemFilter.

EnglishMinimalStemmer

Minimal plural stemmer for English.

EnglishPossessiveFilter

TokenFilter that removes possessives (trailing 's) from words.
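
A simplified sketch of possessive removal is shown below. It handles only the ASCII apostrophe and the right single quotation mark, which is an assumption of this example. The class name is hypothetical.

```java
// Hypothetical helper, not part of Lucene: strips a trailing 's
// (or 'S) from a single token.
final class PossessiveSketch {
    static String stripPossessive(String token) {
        int len = token.length();
        if (len >= 2) {
            char mark = token.charAt(len - 2);
            char last = token.charAt(len - 1);
            boolean isApostrophe = mark == '\'' || mark == '\u2019';
            if (isApostrophe && (last == 's' || last == 'S')) {
                return token.substring(0, len - 2);
            }
        }
        return token;
    }
}
```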

EnglishPossessiveFilterFactory

Factory for EnglishPossessiveFilter.

EnglishStemmer

This class implements the stemming algorithm defined by a snowball script.

EntrySuggestion

Suggestion to add/edit dictionary entries to generate a given list of words created by WordFormGenerator.compress(List<String>, Set<String>, Runnable).

EstonianAnalyzer

Analyzer for Estonian.

EstonianStemmer

This class implements the stemming algorithm defined by a snowball script.

FilesystemResourceLoader

Simple ResourceLoader that opens resource files from the local file system, optionally resolving against a base directory.

FingerprintFilter

Filter outputs a single token which is a concatenation of the sorted and de-duplicated set of input tokens.
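
The fingerprint idea (sort, de-duplicate, concatenate) can be sketched with a sorted set. The class name and the `separator` parameter are hypothetical, and the natural String ordering used here is an assumption of the sketch.

```java
import java.util.List;
import java.util.TreeSet;

// Hypothetical helper, not part of Lucene: joins the sorted,
// de-duplicated tokens into a single fingerprint string.
final class FingerprintSketch {
    static String fingerprint(List<String> tokens, char separator) {
        // TreeSet both removes duplicates and sorts in natural order.
        return String.join(String.valueOf(separator), new TreeSet<>(tokens));
    }
}
```

Because the output ignores token order and repetition, two inputs containing the same set of words yield the same fingerprint, which is useful for clustering near-duplicate values.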

FingerprintFilterFactory

Factory for FingerprintFilter.

FinnishAnalyzer

Analyzer for Finnish.

FinnishLightStemFilter

A TokenFilter that applies FinnishLightStemmer to stem Finnish words.

FinnishLightStemFilterFactory

Factory for FinnishLightStemFilter.

FinnishLightStemmer

Light Stemmer for Finnish.

FinnishStemmer

This class implements the stemming algorithm defined by a snowball script.

FixBrokenOffsetsFilter

Deprecated.

Fix the token filters that create broken offsets in the first place.

FixBrokenOffsetsFilterFactory

Deprecated.

FixedShingleFilter

A FixedShingleFilter constructs shingles (token n-grams) from a token stream.

FixedShingleFilterFactory

Factory for FixedShingleFilter

FlattenGraphFilter

Converts an incoming graph token stream, such as one from SynonymGraphFilter, into a flat form so that all nodes form a single linear chain with no side paths.

FlattenGraphFilterFactory

Factory for FlattenGraphFilter.

FloatEncoder

Encode a character array Float as a BytesRef.

FragmentChecker

An oracle for quickly checking that a specific part of a word can never be a valid word.

FrenchAnalyzer

Analyzer for French language.

FrenchLightStemFilter

A TokenFilter that applies FrenchLightStemmer to stem French words.

FrenchLightStemFilterFactory

Factory for FrenchLightStemFilter.

FrenchLightStemmer

Light Stemmer for French.

FrenchMinimalStemFilter

A TokenFilter that applies FrenchMinimalStemmer to stem French words.

FrenchMinimalStemFilterFactory

Factory for FrenchMinimalStemFilter.

FrenchMinimalStemmer

Light Stemmer for French.

FrenchStemmer

This class implements the stemming algorithm defined by a snowball script.

GalicianAnalyzer

Analyzer for Galician.

GalicianMinimalStemFilter

A TokenFilter that applies GalicianMinimalStemmer to stem Galician words.

GalicianMinimalStemFilterFactory

Factory for GalicianMinimalStemFilter.

GalicianMinimalStemmer

Minimal Stemmer for Galician

GalicianStemFilter

A TokenFilter that applies GalicianStemmer to stem Galician words.

GalicianStemFilterFactory

Factory for GalicianStemFilter.

GalicianStemmer

Galician stemmer implementing "Regras do lematizador para o galego".

German2Stemmer

This class implements the stemming algorithm defined by a snowball script.

GermanAnalyzer

Analyzer for German language.

GermanLightStemFilter

A TokenFilter that applies GermanLightStemmer to stem German words.

GermanLightStemFilterFactory

Factory for GermanLightStemFilter.

GermanLightStemmer

Light Stemmer for German.

GermanMinimalStemFilter

A TokenFilter that applies GermanMinimalStemmer to stem German words.

GermanMinimalStemFilterFactory

Factory for GermanMinimalStemFilter.

GermanMinimalStemmer

Minimal Stemmer for German.

GermanNormalizationFilter

Normalizes German characters according to the heuristics of the German2 snowball algorithm.

GermanNormalizationFilterFactory

Factory for GermanNormalizationFilter.

GermanStemFilter

A TokenFilter that stems German words.

GermanStemFilterFactory

Factory for GermanStemFilter.

GermanStemmer

A stemmer for German words.

GermanStemmer

This class implements the stemming algorithm defined by a snowball script.

GreekAnalyzer

Analyzer for the Greek language.

GreekLowerCaseFilter

Normalizes token text to lower case, removes some Greek diacritics, and standardizes final sigma to sigma.

GreekLowerCaseFilterFactory

Factory for GreekLowerCaseFilter.

GreekStemFilter

A TokenFilter that applies GreekStemmer to stem Greek words.

GreekStemFilterFactory

Factory for GreekStemFilter.

GreekStemmer

A stemmer for Greek words, according to: Development of a Stemmer for the Greek Language. Georgios Ntais

GreekStemmer

This class implements the stemming algorithm defined by a snowball script.

HindiAnalyzer

Analyzer for Hindi.

HindiNormalizationFilter

A TokenFilter that applies HindiNormalizer to normalize the orthography.

HindiNormalizationFilterFactory

Factory for HindiNormalizationFilter.

HindiNormalizer

Normalizer for Hindi.

HindiStemFilter

A TokenFilter that applies HindiStemmer to stem Hindi words.

HindiStemFilterFactory

Factory for HindiStemFilter.

HindiStemmer

Light Stemmer for Hindi.

HindiStemmer

This class implements the stemming algorithm defined by a snowball script.

HTMLStripCharFilter

A CharFilter that wraps another Reader and attempts to strip out HTML constructs.

HTMLStripCharFilterFactory

Factory for HTMLStripCharFilter.

HungarianAnalyzer

Analyzer for Hungarian.

HungarianLightStemFilter

A TokenFilter that applies HungarianLightStemmer to stem Hungarian words.

HungarianLightStemFilterFactory

Factory for HungarianLightStemFilter.

HungarianLightStemmer

Light Stemmer for Hungarian.

HungarianStemmer

This class implements the stemming algorithm defined by a snowball script.

Hunspell

A spell checker based on Hunspell dictionaries.

HunspellStemFilter

TokenFilter that uses hunspell affix rules and words to stem tokens.

HunspellStemFilterFactory

TokenFilterFactory that creates instances of HunspellStemFilter.

Hyphen

This class represents a hyphen.

HyphenatedWordsFilter

When the plain text is extracted from documents, we will often have many words hyphenated and broken into two lines.

HyphenatedWordsFilterFactory

Factory for HyphenatedWordsFilter.

Hyphenation

This class represents a hyphenated word.

HyphenationCompoundWordTokenFilter

A TokenFilter that decomposes compound words found in many Germanic languages.

HyphenationCompoundWordTokenFilterFactory

Factory for HyphenationCompoundWordTokenFilter.

HyphenationTree

This tree structure stores the hyphenation patterns in an efficient way for fast lookup.

IdentityEncoder

Does nothing other than convert the char array to a byte array using the specified encoding.

IndicNormalizationFilter

A TokenFilter that applies IndicNormalizer to normalize text in Indian Languages.

IndicNormalizationFilterFactory

Factory for IndicNormalizationFilter.

IndicNormalizer

Normalizes the Unicode representation of text in Indian languages.

IndonesianAnalyzer

Analyzer for Indonesian (Bahasa)

IndonesianStemFilter

A TokenFilter that applies IndonesianStemmer to stem Indonesian words.

IndonesianStemFilterFactory

Factory for IndonesianStemFilter.

IndonesianStemmer

Stemmer for Indonesian.

IndonesianStemmer

This class implements the stemming algorithm defined by a snowball script.

IntegerEncoder

Encode a character array Integer as a BytesRef.

IrishAnalyzer

Analyzer for Irish.

IrishLowerCaseFilter

Normalises token text to lower case, handling t-prothesis and n-eclipsis (i.e., that 'nAthair' should become 'n-athair')

IrishLowerCaseFilterFactory

Factory for IrishLowerCaseFilter.

IrishStemmer

This class implements the stemming algorithm defined by a snowball script.

ItalianAnalyzer

Analyzer for Italian.

ItalianLightStemFilter

A TokenFilter that applies ItalianLightStemmer to stem Italian words.

ItalianLightStemFilterFactory

Factory for ItalianLightStemFilter.

ItalianLightStemmer

Light Stemmer for Italian.

ItalianStemmer

This class implements the stemming algorithm defined by a snowball script.

KeepWordFilter

A TokenFilter that only keeps tokens with text contained in the required words.

KeepWordFilterFactory

Factory for KeepWordFilter.

KeywordAnalyzer

"Tokenizes" the entire stream as a single token.

KeywordMarkerFilter

Marks terms as keywords via the KeywordAttribute.

KeywordMarkerFilterFactory

Factory for KeywordMarkerFilter.

KeywordRepeatFilter

This TokenFilter emits each incoming token twice, once as a keyword and once as a non-keyword; in other words, once with KeywordAttribute.setKeyword(boolean) set to true and once set to false.

KeywordRepeatFilterFactory

Factory for KeywordRepeatFilter.

KeywordTokenizer

Emits the entire input as a single token.

KeywordTokenizerFactory

Factory for KeywordTokenizer.

KpStemmer

This class implements the stemming algorithm defined by a snowball script.

KStemFilter

A high-performance kstem filter for English.

KStemFilterFactory

Factory for KStemFilter.

KStemmer

This class implements the Kstem algorithm

LatvianAnalyzer

Analyzer for Latvian.

LatvianStemFilter

A TokenFilter that applies LatvianStemmer to stem Latvian words.

LatvianStemFilterFactory

Factory for LatvianStemFilter.

LatvianStemmer

Light stemmer for Latvian.

LengthFilter

Removes words that are too long or too short from the stream.

LengthFilterFactory

Factory for LengthFilter.

LetterTokenizer

A LetterTokenizer is a tokenizer that divides text at non-letters.

LetterTokenizerFactory

Factory for LetterTokenizer.

LimitTokenCountAnalyzer

This Analyzer limits the number of tokens while indexing.

LimitTokenCountFilter

This TokenFilter limits the number of tokens while indexing.

LimitTokenCountFilterFactory

Factory for LimitTokenCountFilter.

LimitTokenOffsetFilter

Lets tokens pass through as long as their start offset is <= a configured limit; the first token with a larger start offset is not emitted and ends the stream.

LimitTokenOffsetFilterFactory

Factory for LimitTokenOffsetFilter.

LimitTokenPositionFilter

This TokenFilter limits its emitted tokens to those with positions that are not greater than the configured limit.

LimitTokenPositionFilterFactory

Factory for LimitTokenPositionFilter.

LithuanianAnalyzer

Analyzer for Lithuanian.

LithuanianStemmer

This class implements the stemming algorithm defined by a snowball script.

LovinsStemmer

This class implements the stemming algorithm defined by a snowball script.

LowerCaseFilter

Normalizes token text to lower case.

LowerCaseFilterFactory

Factory for LowerCaseFilter.

MappingCharFilter

Simplistic CharFilter that applies the mappings contained in a NormalizeCharMap to the character stream, and corrects the resulting changes to the offsets.

MappingCharFilterFactory

Factory for MappingCharFilter.

MinHashFilter

Generate min hash tokens from an incoming stream of tokens.

MinHashFilterFactory

TokenFilterFactory for MinHashFilter.

NepaliAnalyzer

Analyzer for Nepali.

NepaliStemmer

This class implements the stemming algorithm defined by a snowball script.

NGramFilterFactory

Factory for NGramTokenFilter.

NGramFragmentChecker

A FragmentChecker based on all character n-grams possible in a certain language, keeping them in a relatively memory-efficient, but probabilistic data structure.

NGramFragmentChecker.NGramConsumer

A callback for n-gram ranges in words

NGramTokenFilter

Tokenizes the input into n-grams of the given size(s).

NGramTokenizer

Tokenizes the input into n-grams of the given size(s).

NGramTokenizerFactory

Factory for NGramTokenizer.

NormalizeCharMap

Holds a map of String input to String output, to be used with MappingCharFilter.

NormalizeCharMap.Builder

Builds a NormalizeCharMap.

NorwegianAnalyzer

Analyzer for Norwegian.

NorwegianLightStemFilter

A TokenFilter that applies NorwegianLightStemmer to stem Norwegian words.

NorwegianLightStemFilterFactory

Factory for NorwegianLightStemFilter.

NorwegianLightStemmer

Light Stemmer for Norwegian.

NorwegianMinimalStemFilter

A TokenFilter that applies NorwegianMinimalStemmer to stem Norwegian words.

NorwegianMinimalStemFilterFactory

Factory for NorwegianMinimalStemFilter.

NorwegianMinimalStemmer

Minimal Stemmer for Norwegian Bokmål (no-nb) and Nynorsk (no-nn)

NorwegianNormalizationFilter

This filter normalizes the use of the interchangeable Scandinavian characters æÆäÄöÖøØ and folded variants (ae, oe, aa) by transforming them to åÅæÆøØ.

NorwegianNormalizationFilterFactory

Factory for NorwegianNormalizationFilter.

NorwegianStemmer

This class implements the stemming algorithm defined by a snowball script.

NumericPayloadTokenFilter

Assigns a payload to a token based on the TypeAttribute

NumericPayloadTokenFilterFactory

Factory for NumericPayloadTokenFilter.

OpenStringBuilder

A StringBuilder that allows one to access the array.

PathHierarchyTokenizer

Tokenizer for path-like hierarchies.
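
As an illustration of tokenizing a path-like hierarchy, the sketch below emits every ancestor prefix of a path, ending with the full path. The class and method names are hypothetical, and options such as skipping levels or replacing the delimiter are not modeled.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Lucene: lists the hierarchical
// prefixes of a delimited path, shortest first.
final class PathHierarchySketch {
    static List<String> pathPrefixes(String path, char delimiter) {
        List<String> out = new ArrayList<>();
        // Start at index 1 so a leading delimiter stays attached
        // to the first component.
        int idx = path.indexOf(delimiter, 1);
        while (idx >= 0) {
            out.add(path.substring(0, idx));
            idx = path.indexOf(delimiter, idx + 1);
        }
        if (!path.isEmpty()) {
            out.add(path);
        }
        return out;
    }
}
```

For the input "/a/b/c" the sketch yields "/a", "/a/b", and "/a/b/c", so a query for a parent directory can match documents stored under any of its descendants.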

PathHierarchyTokenizerFactory

Factory for PathHierarchyTokenizer.

PatternCaptureGroupFilterFactory

Factory for PatternCaptureGroupTokenFilter.

PatternCaptureGroupTokenFilter

CaptureGroup uses Java regexes to emit multiple tokens - one for each capture group in one or more patterns.

PatternConsumer

This interface is used to connect the XML pattern file parser to the hyphenation tree.

PatternKeywordMarkerFilter

Marks terms as keywords via the KeywordAttribute.

PatternParser

A SAX document handler to read and parse hyphenation patterns from an XML file.

PatternReplaceCharFilter

CharFilter that uses a regular expression for the target of replace string.

PatternReplaceCharFilterFactory

Factory for PatternReplaceCharFilter.

PatternReplaceFilter

A TokenFilter which applies a Pattern to each token in the stream, replacing match occurrences with the specified replacement string.

PatternReplaceFilterFactory

Factory for PatternReplaceFilter.

PatternTokenizer

This tokenizer uses regex pattern matching to construct distinct tokens for the input stream.

PatternTokenizerFactory

Factory for PatternTokenizer.

PatternTypingFilter

Sets a type attribute to a parameterized value when tokens are matched by any of several regex patterns.

PatternTypingFilter.PatternTypingRule

Value holding class for pattern typing rules.

PatternTypingFilterFactory

Provides a filter that will analyze tokens with the analyzer from an arbitrary field type.

PayloadEncoder

Mainly for use with the DelimitedPayloadTokenFilter, converts char buffers to BytesRef.

PayloadHelper

Utility methods for encoding payloads.

PerFieldAnalyzerWrapper

This analyzer is used to facilitate scenarios where different fields require different analysis techniques.

PersianAnalyzer

Analyzer for Persian.

PersianCharFilter

CharFilter that replaces instances of Zero-width non-joiner with an ordinary space.

PersianCharFilterFactory

Factory for PersianCharFilter.

PersianNormalizationFilter

A TokenFilter that applies PersianNormalizer to normalize the orthography.

PersianNormalizationFilterFactory

Factory for PersianNormalizationFilter.

PersianNormalizer

Normalizer for Persian.

PersianStemFilter

A TokenFilter that applies PersianStemmer to stem Persian words.

PersianStemFilterFactory

Factory for PersianStemFilter.

PersianStemmer

Stemmer for Persian.

PorterStemFilter

Transforms the token stream as per the Porter stemming algorithm.

PorterStemFilterFactory

Factory for PorterStemFilter.

PorterStemmer

This class implements the stemming algorithm defined by a snowball script.

PortugueseAnalyzer

Analyzer for Portuguese.

PortugueseLightStemFilter

A TokenFilter that applies PortugueseLightStemmer to stem Portuguese words.

PortugueseLightStemFilterFactory

Factory for PortugueseLightStemFilter.

PortugueseLightStemmer

Light Stemmer for Portuguese

PortugueseMinimalStemFilter

A TokenFilter that applies PortugueseMinimalStemmer to stem Portuguese words.

PortugueseMinimalStemFilterFactory

Factory for PortugueseMinimalStemFilter.

PortugueseMinimalStemmer

Minimal Stemmer for Portuguese

PortugueseStemFilter

A TokenFilter that applies PortugueseStemmer to stem Portuguese words.

PortugueseStemFilterFactory

Factory for PortugueseStemFilter.

PortugueseStemmer

Portuguese stemmer implementing the RSLP (Removedor de Sufixos da Lingua Portuguesa) algorithm.

PortugueseStemmer

This class implements the stemming algorithm defined by a snowball script.

ProtectedTermFilter

A ConditionalTokenFilter that only applies its wrapped filters to tokens that are not contained in a protected set.

ProtectedTermFilterFactory

Factory for a ProtectedTermFilter

QueryAutoStopWordAnalyzer

An Analyzer used primarily at query time to wrap another analyzer and provide a layer of protection which prevents very common words from being passed into queries.

RemoveDuplicatesTokenFilter

A TokenFilter which filters out Tokens at the same position and Term text as the previous token in the stream.

RemoveDuplicatesTokenFilterFactory

Factory for RemoveDuplicatesTokenFilter.

ReversePathHierarchyTokenizer

Tokenizer for domain-like hierarchies.

ReverseStringFilter

Reverse token string, for example "country" => "yrtnuoc".
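
The reversal itself is a one-liner in plain Java. The sketch below uses a hypothetical class name and leaves out any marker-character handling the filter may offer.

```java
// Hypothetical helper, not part of Lucene: reverses a token's text.
final class ReverseStringSketch {
    static String reverse(String token) {
        // StringBuilder.reverse treats surrogate pairs as single units.
        return new StringBuilder(token).reverse().toString();
    }
}
```

Indexing reversed tokens is a common way to turn leading-wildcard queries into cheaper trailing-wildcard queries.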

ReverseStringFilterFactory

Factory for ReverseStringFilter.

RollingCharBuffer

Acts like a forever growing char[] as you read characters into it from the provided reader, but internally it uses a circular buffer to only hold the characters that haven't been freed yet.

RomanianAnalyzer

Analyzer for Romanian.

RomanianStemmer

This class implements the stemming algorithm defined by a snowball script.

RSLPStemmerBase

Base class for stemmers that use a set of RSLP-like stemming steps.

RSLPStemmerBase.Rule

A basic rule, with no exceptions.

RSLPStemmerBase.RuleWithSetExceptions

A rule with a set of whole-word exceptions.

RSLPStemmerBase.RuleWithSuffixExceptions

A rule with a set of exceptional suffixes.

RSLPStemmerBase.Step

A step containing a list of rules.

RussianAnalyzer

Analyzer for Russian language.

RussianLightStemFilter

A TokenFilter that applies RussianLightStemmer to stem Russian words.

RussianLightStemFilterFactory

Factory for RussianLightStemFilter.

RussianLightStemmer

Light Stemmer for Russian.

RussianStemmer

This class implements the stemming algorithm defined by a snowball script.

ScandinavianFoldingFilter

This filter folds Scandinavian characters åÅäæÄÆ->a and öÖøØ->o.

ScandinavianFoldingFilterFactory

Factory for ScandinavianFoldingFilter.

ScandinavianNormalizationFilter

This filter normalizes the use of the interchangeable Scandinavian characters æÆäÄöÖøØ and folded variants (aa, ao, ae, oe and oo) by transforming them to åÅæÆøØ.

ScandinavianNormalizationFilterFactory

Factory for ScandinavianNormalizationFilter.

ScandinavianNormalizer

This Normalizer does the heavy lifting for a set of Scandinavian normalization filters, normalizing the use of the interchangeable Scandinavian characters æÆäÄöÖøØ and folded variants (aa, ao, ae, oe and oo) by transforming them to åÅæÆøØ.

ScandinavianNormalizer.Foldings

List of possible foldings that can be used when configuring the filter

SegmentingTokenizerBase

Breaks text into sentences with a BreakIterator and allows subclasses to decompose these sentences into words.

SerbianAnalyzer

Analyzer for Serbian.

SerbianNormalizationFilter

Normalizes Serbian Cyrillic and Latin characters to "bald" Latin.

SerbianNormalizationFilterFactory

Factory for SerbianNormalizationFilter.

SerbianNormalizationRegularFilter

Normalizes Serbian Cyrillic to Latin.

SerbianStemmer

This class implements the stemming algorithm defined by a snowball script.

SetKeywordMarkerFilter

Marks terms as keywords via the KeywordAttribute.

ShingleAnalyzerWrapper

A ShingleAnalyzerWrapper wraps a ShingleFilter around another Analyzer.

ShingleFilter

A ShingleFilter constructs shingles (token n-grams) from a token stream.
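
To show what shingles (token n-grams) look like, here is a minimal sketch that builds fixed-size shingles from a list of tokens. The names are hypothetical. The single `size` parameter and the space separator are simplifications of this example; it does not emit unigrams or a range of sizes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Lucene: joins each run of `size`
// consecutive tokens into one shingle.
final class ShingleSketch {
    static List<String> shingles(List<String> tokens, int size) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i + size <= tokens.size(); i++) {
            out.add(String.join(" ", tokens.subList(i, i + size)));
        }
        return out;
    }
}
```

For the tokens "please", "divide", "this", "sentence" and a size of 2, the sketch produces "please divide", "divide this", and "this sentence".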

ShingleFilterFactory

Factory for ShingleFilter.

SimpleAnalyzer

An Analyzer that filters LetterTokenizer with LowerCaseFilter

SimplePatternSplitTokenizer

This tokenizer uses a Lucene RegExp or (expert usage) a pre-built determinized Automaton, to locate tokens.

SimplePatternSplitTokenizerFactory

Factory for SimplePatternSplitTokenizer, for producing tokens by splitting according to the provided regexp.

SimplePatternTokenizer

This tokenizer uses a Lucene RegExp or (expert usage) a pre-built determinized Automaton, to locate tokens.

SimplePatternTokenizerFactory

Factory for SimplePatternTokenizer, for matching tokens based on the provided regexp.

SnowballFilter

A filter that stems words using a Snowball-generated stemmer.

SnowballPorterFilterFactory

Factory for SnowballFilter, with configurable language

SnowballProgram

Base class for a snowball stemmer

SnowballStemmer

Parent class of all snowball stemmers, which must implement stem

SolrSynonymParser

Parser for the Solr synonyms format.

SoraniAnalyzer

Analyzer for Sorani Kurdish.

SoraniNormalizationFilter

A TokenFilter that applies SoraniNormalizer to normalize the orthography.

SoraniNormalizationFilterFactory

Factory for SoraniNormalizationFilter.

SoraniNormalizer

Normalizes the Unicode representation of Sorani text.

SoraniStemFilter

A TokenFilter that applies SoraniStemmer to stem Sorani words.

SoraniStemFilterFactory

Factory for SoraniStemFilter.

SoraniStemmer

Light stemmer for Sorani

SortingStrategy

The strategy defining how a Hunspell dictionary should be loaded, with different tradeoffs.

SpanishAnalyzer

Analyzer for Spanish.

SpanishLightStemFilter

A TokenFilter that applies SpanishLightStemmer to stem Spanish words.

SpanishLightStemFilterFactory

Factory for SpanishLightStemFilter.

SpanishLightStemmer

Light Stemmer for Spanish

SpanishMinimalStemFilter

Deprecated.

Use SpanishPluralStemFilter instead.

SpanishMinimalStemFilterFactory

Deprecated.

Use SpanishPluralStemFilterFactory instead

SpanishMinimalStemmer

Deprecated.

Use SpanishPluralStemmer instead.

SpanishPluralStemFilter

A TokenFilter that applies SpanishPluralStemmer to stem Spanish words.

SpanishPluralStemFilterFactory

Factory for SpanishPluralStemFilter.

SpanishPluralStemmer

Plural Stemmer for Spanish

SpanishStemmer

This class implements the stemming algorithm defined by a snowball script.

StemmerOverrideFilter

Provides the ability to override any KeywordAttribute aware stemmer with custom dictionary-based stemming.

StemmerOverrideFilter.Builder

This builder builds an FST for the StemmerOverrideFilter

StemmerOverrideFilter.StemmerOverrideMap

A read-only 4-byte FST backed map that allows fast case-insensitive key value lookups for StemmerOverrideFilter

StemmerOverrideFilterFactory

Factory for StemmerOverrideFilter.

StemmerUtil

Some commonly-used stemming functions

StopAnalyzer

Filters LetterTokenizer with LowerCaseFilter and StopFilter.

StopFilter

Removes stop words from a token stream.
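
Stop-word removal reduces to a set-membership test per token, as the following sketch shows. The class name is hypothetical, and the sketch only drops tokens: it does not track the positions of removed words, which a real token filter may need to do for phrase queries.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of Lucene: keeps only the tokens
// that are not in the stop-word set.
final class StopWordSketch {
    static List<String> removeStopWords(List<String> tokens, Set<String> stopWords) {
        List<String> out = new ArrayList<>();
        for (String token : tokens) {
            if (!stopWords.contains(token)) {
                out.add(token);
            }
        }
        return out;
    }
}
```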

StopFilterFactory

Factory for StopFilter.

Suggester

A generator for misspelled word corrections based on Hunspell flags.

SuggestionTimeoutException

An exception thrown when a Hunspell.suggest(String) call takes too long, if TimeoutPolicy.THROW_EXCEPTION is used.

SwedishAnalyzer

Analyzer for Swedish.

SwedishLightStemFilter

A TokenFilter that applies SwedishLightStemmer to stem Swedish words.

SwedishLightStemFilterFactory

Factory for SwedishLightStemFilter.

SwedishLightStemmer

Light Stemmer for Swedish.

SwedishMinimalStemFilter

A TokenFilter that applies SwedishMinimalStemmer to stem Swedish words.

SwedishMinimalStemFilterFactory

Factory for SwedishMinimalStemFilter.

SwedishMinimalStemmer

Minimal Stemmer for Swedish.

SwedishStemmer

This class implements the stemming algorithm defined by a snowball script.

SynonymFilter

Deprecated.

Use SynonymGraphFilter instead, but be sure to also use FlattenGraphFilter at index time (not at search time).

SynonymFilterFactory

Deprecated.

Use SynonymGraphFilterFactory instead, but be sure to also use FlattenGraphFilterFactory at index time (not at search time).

SynonymGraphFilter

Applies single- or multi-token synonyms from a SynonymMap to an incoming TokenStream, producing a fully correct graph output.

SynonymGraphFilterFactory

Factory for SynonymGraphFilter.

SynonymMap

A map of synonyms; keys and values are phrases.

SynonymMap.Builder

Builds an FSTSynonymMap.

SynonymMap.Parser

Abstraction for parsing synonym files.

TamilAnalyzer

Analyzer for Tamil.

TamilStemmer

This class implements the stemming algorithm defined by a snowball script.

TeeSinkTokenFilter

This TokenFilter provides the ability to set aside attribute states that have already been analyzed.

TeeSinkTokenFilter.SinkTokenStream

TokenStream output from a tee.

TeluguAnalyzer

Analyzer for Telugu.

TeluguNormalizationFilter

A TokenFilter that applies TeluguNormalizer to normalize the orthography.

TeluguNormalizationFilterFactory

Factory for TeluguNormalizationFilter.

TeluguNormalizer

Normalizer for Telugu.

TeluguStemFilter

A TokenFilter that applies TeluguStemmer to stem Telugu words.

TeluguStemFilterFactory

Factory for TeluguStemFilter.

TeluguStemmer

Stemmer for Telugu.

TermAndBoost

Wraps a term and its boost.

TernaryTree

Ternary Search Tree.
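
For readers unfamiliar with the data structure, here is a minimal ternary search tree used as a string set. It is a generic textbook sketch under the hypothetical name TernarySearchTreeSketch and makes no claim about how the TernaryTree class stores its keys or values.

```java
// Hypothetical textbook sketch of a ternary search tree; not the Lucene class.
final class TernarySearchTreeSketch {

    // Each node holds one character and three children:
    // lo (smaller char), eq (next char of the key), hi (larger char).
    private static final class Node {
        final char ch;
        Node lo, eq, hi;
        boolean endOfKey;
        Node(char ch) { this.ch = ch; }
    }

    private Node root;

    void insert(String key) {
        if (key.isEmpty()) {
            throw new IllegalArgumentException("empty keys are not supported");
        }
        root = insert(root, key, 0);
    }

    private Node insert(Node node, String key, int i) {
        char c = key.charAt(i);
        if (node == null) {
            node = new Node(c);
        }
        if (c < node.ch) {
            node.lo = insert(node.lo, key, i);
        } else if (c > node.ch) {
            node.hi = insert(node.hi, key, i);
        } else if (i < key.length() - 1) {
            node.eq = insert(node.eq, key, i + 1);
        } else {
            node.endOfKey = true;
        }
        return node;
    }

    boolean contains(String key) {
        if (key.isEmpty()) {
            return false;
        }
        Node node = root;
        int i = 0;
        while (node != null) {
            char c = key.charAt(i);
            if (c < node.ch) {
                node = node.lo;
            } else if (c > node.ch) {
                node = node.hi;
            } else if (i == key.length() - 1) {
                return node.endOfKey;
            } else {
                node = node.eq;
                i++;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        TernarySearchTreeSketch tree = new TernarySearchTreeSketch();
        tree.insert("cat");
        tree.insert("car");
        System.out.println(tree.contains("car"));
        System.out.println(tree.contains("ca"));
    }
}
```
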

ThaiAnalyzer

Analyzer for Thai language.

ThaiTokenizer

Tokenizer that uses BreakIterator to tokenize Thai text.

ThaiTokenizerFactory

Factory for ThaiTokenizer.

TimeoutPolicy

A strategy determining what to do when Hunspell API calls take too much time.

TokenOffsetPayloadTokenFilter

Adds the OffsetAttribute.startOffset() and OffsetAttribute.endOffset() to the token as a payload. The first 4 bytes are the start offset.
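
One way to picture an offset payload is as an 8-byte array holding two ints. The sketch below is a hypothetical JDK-only illustration (OffsetPayloadSketch): the summary above only states that the first 4 bytes hold the start offset, so placing the end offset in the remaining 4 bytes and using big-endian byte order are assumptions of this example.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch of packing two offsets into 8 bytes; not the Lucene implementation.
final class OffsetPayloadSketch {

    // Packs start into bytes 0..3 and end into bytes 4..7.
    // ByteBuffer defaults to big-endian; that byte order is an assumption here.
    static byte[] encode(int startOffset, int endOffset) {
        return ByteBuffer.allocate(8).putInt(startOffset).putInt(endOffset).array();
    }

    // Reads the two ints back in the same order they were written.
    static int[] decode(byte[] payload) {
        ByteBuffer buffer = ByteBuffer.wrap(payload);
        return new int[] {buffer.getInt(), buffer.getInt()};
    }

    public static void main(String[] args) {
        byte[] payload = encode(3, 10);
        int[] offsets = decode(payload);
        System.out.println(payload.length + " bytes, start=" + offsets[0] + ", end=" + offsets[1]);
    }
}
```
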

TokenOffsetPayloadTokenFilterFactory

Factory for TokenOffsetPayloadTokenFilter.

TrimFilter

Trims leading and trailing whitespace from Tokens in the stream.

TrimFilterFactory

Factory for TrimFilter.

TruncateTokenFilter

A token filter for truncating the terms into a specific length.

TruncateTokenFilterFactory

Factory for TruncateTokenFilter.

TurkishAnalyzer

Analyzer for Turkish.

TurkishLowerCaseFilter

Normalizes Turkish token text to lower case.
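
Turkish lowercasing differs from the default rules mainly in the dotted and dotless I. The sketch below shows that difference with the JDK's locale-sensitive String.toLowerCase; it is a hypothetical illustration (TurkishLowerCaseSketch), and the filter itself may handle these characters by other means.

```java
import java.util.Locale;

// Hypothetical sketch of Turkish-aware lowercasing using the JDK; not the Lucene filter.
final class TurkishLowerCaseSketch {

    private static final Locale TURKISH = Locale.forLanguageTag("tr");

    // In Turkish, 'I' lowercases to dotless 'ı' (U+0131)
    // and 'İ' (U+0130) lowercases to plain 'i'.
    static String lowerTurkish(String token) {
        return token.toLowerCase(TURKISH);
    }

    // Locale-neutral lowercasing, where 'I' becomes 'i'.
    static String lowerRoot(String token) {
        return token.toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        System.out.println(lowerTurkish("I").equals("\u0131"));
        System.out.println(lowerRoot("I").equals("i"));
    }
}
```
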

TurkishLowerCaseFilterFactory

Factory for TurkishLowerCaseFilter.

TurkishStemmer

This class implements the stemming algorithm defined by a snowball script.

TypeAsPayloadTokenFilter

Makes the TypeAttribute a payload.

TypeAsPayloadTokenFilterFactory

Factory for TypeAsPayloadTokenFilter.

TypeAsSynonymFilter

Adds the TypeAttribute.type() as a synonym.

TypeAsSynonymFilterFactory

Factory for TypeAsSynonymFilter.

TypeTokenFilter

Removes tokens whose types appear in a set of blocked types from a token stream.

TypeTokenFilterFactory

Factory class for TypeTokenFilter.

UAX29URLEmailAnalyzer

Filters UAX29URLEmailTokenizer with LowerCaseFilter and StopFilter, using a list of English stop words.

UAX29URLEmailTokenizer

This class implements Word Break rules from the Unicode Text Segmentation algorithm, as specified in Unicode Standard Annex #29. URLs and email addresses are also tokenized according to the relevant RFCs.

UAX29URLEmailTokenizerFactory

Factory for UAX29URLEmailTokenizer.

UAX29URLEmailTokenizerImpl

UnicodeProps

This file contains unicode properties used by various CharTokenizers.

UnicodeWhitespaceAnalyzer

An Analyzer that uses UnicodeWhitespaceTokenizer.

UnicodeWhitespaceTokenizer

A UnicodeWhitespaceTokenizer is a tokenizer that divides text at whitespace.

UpperCaseFilter

Normalizes token text to UPPER CASE.

UpperCaseFilterFactory

Factory for UpperCaseFilter.

WhitespaceAnalyzer

An Analyzer that uses WhitespaceTokenizer.

WhitespaceTokenizer

A tokenizer that divides text at whitespace characters as defined by Character.isWhitespace(int).
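
Splitting on Character.isWhitespace(int) can be sketched in a few lines. The class below is a hypothetical JDK-only illustration (WhitespaceSplitSketch), not the tokenizer itself. Note that Character.isWhitespace returns false for the no-break space U+00A0, so that character does not split tokens here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of whitespace splitting by code point; not the Lucene tokenizer.
final class WhitespaceSplitSketch {

    // Walks the text code point by code point and cuts a token
    // wherever Character.isWhitespace(int) returns true.
    static List<String> split(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            if (Character.isWhitespace(cp)) {
                if (current.length() > 0) {
                    tokens.add(current.toString());
                    current.setLength(0);
                }
            } else {
                current.appendCodePoint(cp);
            }
            i += Character.charCount(cp);
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(split("  hello\tbig   world\n"));
    }
}
```
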

WhitespaceTokenizerFactory

Factory for WhitespaceTokenizer.

WikipediaTokenizer

Extension of StandardTokenizer that is aware of Wikipedia syntax.

WikipediaTokenizerFactory

Factory for WikipediaTokenizer.

Word2VecModel

Word2VecModel is a class representing the parsed Word2Vec model, containing the vectors for each word in the dictionary.

Word2VecSynonymFilter

Applies single-token synonyms from a Word2Vec trained network to an incoming TokenStream.

Word2VecSynonymFilterFactory

Factory for Word2VecSynonymFilter.

Word2VecSynonymProvider

The Word2VecSynonymProvider generates the list of synonyms of a term.

Word2VecSynonymProviderFactory

Supplies a cache of Word2VecSynonymProvider instances, so that multiple instances of Word2VecSynonymFilterFactory do not instantiate multiple instances of the same SynonymProvider.

WordDelimiterFilter

Deprecated.

Use WordDelimiterGraphFilter instead: it produces a correct token graph.

WordDelimiterFilterFactory

Deprecated.

Use WordDelimiterGraphFilterFactory instead: it produces a correct token graph.

WordDelimiterGraphFilter

Splits words into subwords and performs optional transformations on subword groups, producing a correct token graph.

WordDelimiterGraphFilterFactory

Factory for WordDelimiterGraphFilter.

WordDelimiterIterator

A BreakIterator-like API for iterating over subwords in text, according to WordDelimiterGraphFilter rules.
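
A much-simplified picture of subword splitting is sketched below. It is a hypothetical JDK-only example (SubwordSplitSketch) that breaks on non-alphanumeric characters, lower-to-upper case changes, and letter/digit boundaries. These three rules are assumptions chosen for illustration and cover only a fraction of the configurable WordDelimiterGraphFilter behavior.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified sketch of subword splitting; not the Lucene iterator.
final class SubwordSplitSketch {

    static List<String> subwords(String text) {
        List<String> parts = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            // Rule 1: any non-alphanumeric character acts as a delimiter.
            if (!Character.isLetterOrDigit(c)) {
                flush(current, parts);
                continue;
            }
            if (current.length() > 0) {
                char prev = current.charAt(current.length() - 1);
                // Rule 2: a lower-to-upper case change starts a new subword.
                boolean caseChange = Character.isLowerCase(prev) && Character.isUpperCase(c);
                // Rule 3: a letter/digit boundary starts a new subword.
                boolean typeChange = Character.isLetter(prev) != Character.isLetter(c);
                if (caseChange || typeChange) {
                    flush(current, parts);
                }
            }
            current.append(c);
        }
        flush(current, parts);
        return parts;
    }

    // Emits the pending subword, if any, and resets the buffer.
    private static void flush(StringBuilder current, List<String> parts) {
        if (current.length() > 0) {
            parts.add(current.toString());
            current.setLength(0);
        }
    }

    public static void main(String[] args) {
        System.out.println(subwords("Wi-Fi"));
        System.out.println(subwords("PowerShot500"));
    }
}
```
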

WordFormGenerator

A utility class used for generating possible word forms by adding affixes to stems (WordFormGenerator.getAllWordForms(String, String, Runnable)), and suggesting stems and flags to generate the given set of words (WordFormGenerator.compress(List, Set, Runnable)).

WordnetSynonymParser

Parser for the WordNet Prolog format.

YiddishStemmer

This class implements the stemming algorithm defined by a snowball script.