public final class ShingleAnalyzerWrapper extends AnalyzerWrapper
A ShingleAnalyzerWrapper wraps a ShingleFilter around another Analyzer.
A shingle is another name for a token-based n-gram.
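To make the term concrete, the following is a minimal plain-Java sketch of token-based n-grams. It does not use Lucene: the class `ShingleSketch` and its `shingles` method are hypothetical names introduced for illustration only, and the input is assumed to be already tokenized.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative only: not part of Lucene and not how ShingleFilter is implemented.
class ShingleSketch {

    // Joins every run of `size` consecutive tokens with `separator`.
    static List<String> shingles(List<String> tokens, int size, String separator) {
        List<String> out = new ArrayList<>();
        for (int start = 0; start + size <= tokens.size(); start++) {
            out.add(String.join(separator, tokens.subList(start, start + size)));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("please", "divide", "this", "sentence");
        System.out.println(shingles(tokens, 2, " "));
        // prints [please divide, divide this, this sentence]
    }
}
```

An input with fewer tokens than the requested size produces no shingles at all in this sketch.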
Nested classes/interfaces inherited from class Analyzer: Analyzer.GlobalReuseStrategy, Analyzer.PerFieldReuseStrategy, Analyzer.ReuseStrategy, Analyzer.TokenStreamComponents
Constructor and Description |
---|
ShingleAnalyzerWrapper(Analyzer defaultAnalyzer) |
ShingleAnalyzerWrapper(Analyzer defaultAnalyzer, int maxShingleSize) |
ShingleAnalyzerWrapper(Analyzer defaultAnalyzer, int minShingleSize, int maxShingleSize) |
ShingleAnalyzerWrapper(Analyzer defaultAnalyzer, int minShingleSize, int maxShingleSize, String tokenSeparator, boolean outputUnigrams, boolean outputUnigramsIfNoShingles): Creates a new ShingleAnalyzerWrapper. |
ShingleAnalyzerWrapper(Version matchVersion): Wraps StandardAnalyzer. |
ShingleAnalyzerWrapper(Version matchVersion, int minShingleSize, int maxShingleSize): Wraps StandardAnalyzer. |
Modifier and Type | Method and Description |
---|---|
int | getMaxShingleSize(): The max shingle (token ngram) size |
int | getMinShingleSize(): The min shingle (token ngram) size |
String | getTokenSeparator() |
protected Analyzer | getWrappedAnalyzer(String fieldName) |
boolean | isOutputUnigrams() |
boolean | isOutputUnigramsIfNoShingles() |
protected Analyzer.TokenStreamComponents | wrapComponents(String fieldName, Analyzer.TokenStreamComponents components) |
Methods inherited from class AnalyzerWrapper: createComponents, getOffsetGap, getPositionIncrementGap, initReader
Methods inherited from class Analyzer: close, tokenStream
public ShingleAnalyzerWrapper(Analyzer defaultAnalyzer)
public ShingleAnalyzerWrapper(Analyzer defaultAnalyzer, int maxShingleSize)
public ShingleAnalyzerWrapper(Analyzer defaultAnalyzer, int minShingleSize, int maxShingleSize)
public ShingleAnalyzerWrapper(Analyzer defaultAnalyzer, int minShingleSize, int maxShingleSize, String tokenSeparator, boolean outputUnigrams, boolean outputUnigramsIfNoShingles)
defaultAnalyzer - Analyzer whose TokenStream is to be filtered
minShingleSize - Min shingle (token ngram) size
maxShingleSize - Max shingle size
tokenSeparator - Used to separate input stream tokens in output shingles
outputUnigrams - Whether or not the filter shall pass the original tokens to the output stream
outputUnigramsIfNoShingles - Overrides the behavior of outputUnigrams==false for those times when no shingles are available (because there are fewer than minShingleSize tokens in the input stream). Note that if outputUnigrams==true, then unigrams are always output, regardless of whether any shingles are available.
public ShingleAnalyzerWrapper(Version matchVersion)
Wraps StandardAnalyzer.
public ShingleAnalyzerWrapper(Version matchVersion, int minShingleSize, int maxShingleSize)
Wraps StandardAnalyzer.
public int getMaxShingleSize()
public int getMinShingleSize()
public String getTokenSeparator()
public boolean isOutputUnigrams()
public boolean isOutputUnigramsIfNoShingles()
protected Analyzer getWrappedAnalyzer(String fieldName)
Specified by: getWrappedAnalyzer in class AnalyzerWrapper
protected Analyzer.TokenStreamComponents wrapComponents(String fieldName, Analyzer.TokenStreamComponents components)
Specified by: wrapComponents in class AnalyzerWrapper
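The interaction between outputUnigrams and outputUnigramsIfNoShingles, as described for the constructor parameters, can be sketched in plain Java. This is an illustration under stated assumptions, not the library's implementation: `UnigramOptionSketch` and `analyze` are hypothetical names, the input is assumed to be already tokenized, minShingleSize is assumed to be at least 2, and the order of the emitted terms is a choice of this sketch and is not claimed to match what ShingleFilter emits.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative only: models the documented option semantics, not Lucene internals.
class UnigramOptionSketch {

    static List<String> analyze(List<String> tokens, int minShingleSize, int maxShingleSize,
                                String tokenSeparator, boolean outputUnigrams,
                                boolean outputUnigramsIfNoShingles) {
        // No shingles are available when there are fewer than minShingleSize tokens.
        boolean anyShingles = tokens.size() >= minShingleSize;
        // outputUnigrams==true always wins; the fallback applies only when no shingles exist.
        boolean emitUnigrams = outputUnigrams || (outputUnigramsIfNoShingles && !anyShingles);

        List<String> out = new ArrayList<>();
        for (int start = 0; start < tokens.size(); start++) {
            if (emitUnigrams) {
                out.add(tokens.get(start));
            }
            // Emit every shingle of size min..max that starts at this token and fits.
            for (int size = minShingleSize;
                 size <= maxShingleSize && start + size <= tokens.size(); size++) {
                out.add(String.join(tokenSeparator, tokens.subList(start, start + size)));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(analyze(Arrays.asList("a", "b", "c"), 2, 2, " ", true, false));
        // prints [a, a b, b, b c, c]
        System.out.println(analyze(Arrays.asList("solo"), 2, 2, " ", false, true));
        // prints [solo]
    }
}
```

With outputUnigrams==false and outputUnigramsIfNoShingles==false, a single-token input yields nothing in this sketch, which is the case the fallback option exists to cover.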
Copyright © 2000-2012 Apache Software Foundation. All Rights Reserved.