public final class CJKBigramFilter
extends org.apache.lucene.analysis.TokenFilter
Forms bigrams of CJK terms. The CJK token types are set by the upstream tokenizer, but you can also use CJKBigramFilter(TokenStream, int) to explicitly control which of the CJK scripts are turned into bigrams.
In all cases, all non-CJK input is passed through unmodified.
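The bigramming rule can be illustrated with a small stdlib-only Java sketch. It is not Lucene's implementation: it works on a raw string rather than on a token stream, treats whitespace as the only separator, and uses a hypothetical class `BigramSketch` with hypothetical type labels `"DOUBLE"`, `"SINGLE"` and `"word"` standing in for DOUBLE_TYPE, SINGLE_TYPE and whatever type the upstream tokenizer assigned.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Simplified sketch of the bigramming rule (NOT Lucene's code): adjacent CJK
 * characters become overlapping bigrams, a lone CJK character stays a unigram,
 * and any other run of non-space characters passes through unchanged.
 */
public class BigramSketch {

    // Hypothetical token shape; "DOUBLE"/"SINGLE" are modeled on DOUBLE_TYPE/SINGLE_TYPE.
    record Token(String text, String type) {}

    /** True for code points in the four scripts the filter can bigram. */
    static boolean isCjk(int cp) {
        Character.UnicodeScript s = Character.UnicodeScript.of(cp);
        return s == Character.UnicodeScript.HAN
            || s == Character.UnicodeScript.HIRAGANA
            || s == Character.UnicodeScript.KATAKANA
            || s == Character.UnicodeScript.HANGUL;
    }

    static List<Token> analyze(String input) {
        List<Token> out = new ArrayList<>();
        int[] cps = input.codePoints().toArray();
        int i = 0;
        while (i < cps.length) {
            if (Character.isWhitespace(cps[i])) { i++; continue; }
            int start = i;
            if (isCjk(cps[i])) {
                // Collect a maximal run of adjacent CJK code points.
                while (i < cps.length && isCjk(cps[i])) i++;
                if (i - start == 1) {
                    out.add(new Token(new String(cps, start, 1), "SINGLE"));
                } else {
                    // Overlapping bigrams: AB, BC, CD, ...
                    for (int j = start; j + 1 < i; j++) {
                        out.add(new Token(new String(cps, j, 2), "DOUBLE"));
                    }
                }
            } else {
                // Non-CJK run: emitted unmodified.
                while (i < cps.length && !isCjk(cps[i]) && !Character.isWhitespace(cps[i])) i++;
                out.add(new Token(new String(cps, start, i - start), "word"));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Prints: 東京 DOUBLE, 京都 DOUBLE, に SINGLE, lucene word (one per line)
        for (Token t : analyze("東京都 に lucene")) {
            System.out.println(t.text() + " " + t.type());
        }
    }
}
```

Three adjacent Han characters yield two overlapping bigrams, the isolated Hiragana character stays a unigram, and the Latin word is untouched.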
Modifier and Type | Field and Description |
---|---|
static String | DOUBLE_TYPE: when we emit a bigram, it is then marked as this type |
static int | HAN: bigram flag for Han Ideographs |
static int | HANGUL: bigram flag for Hangul |
static int | HIRAGANA: bigram flag for Hiragana |
static int | KATAKANA: bigram flag for Katakana |
static String | SINGLE_TYPE: when we emit a unigram, it is then marked as this type |
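Constructor and Description |
---|
CJKBigramFilter(org.apache.lucene.analysis.TokenStream in): equivalent to CJKBigramFilter(in, flags) with HAN, HIRAGANA, KATAKANA and HANGUL all enabled. |
CJKBigramFilter(org.apache.lucene.analysis.TokenStream in, int flags): Create a new CJKBigramFilter, specifying which writing systems should be bigrammed; flags is an OR'ed combination of HAN, HIRAGANA, KATAKANA and HANGUL. |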
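The flags argument is a bit set, so scripts are selected by OR-ing flags together and tested with a bitwise AND. The sketch below illustrates that idea under stated assumptions: `FlagSketch`, `flagFor` and `shouldBigram` are hypothetical, and the constant values are stand-ins chosen for illustration rather than copied from Lucene (real code should pass CJKBigramFilter.HAN and the other constants directly).

```java
/**
 * Sketch of script selection with an OR'ed flag set, in the spirit of the
 * HAN, HIRAGANA, KATAKANA and HANGUL fields. The values are hypothetical stand-ins.
 */
public class FlagSketch {
    static final int HAN = 1 << 0;
    static final int HIRAGANA = 1 << 1;
    static final int KATAKANA = 1 << 2;
    static final int HANGUL = 1 << 3;

    /** Maps a code point's Unicode script to one flag, or 0 for non-CJK scripts. */
    static int flagFor(int codePoint) {
        Character.UnicodeScript s = Character.UnicodeScript.of(codePoint);
        if (s == Character.UnicodeScript.HAN) return HAN;
        if (s == Character.UnicodeScript.HIRAGANA) return HIRAGANA;
        if (s == Character.UnicodeScript.KATAKANA) return KATAKANA;
        if (s == Character.UnicodeScript.HANGUL) return HANGUL;
        return 0; // not CJK: never bigrammed, passed through
    }

    /** True if the code point's script is enabled in {@code flags}. */
    static boolean shouldBigram(int codePoint, int flags) {
        int f = flagFor(codePoint);
        return f != 0 && (flags & f) != 0;
    }

    public static void main(String[] args) {
        int hanOnly = HAN;
        int all = HAN | HIRAGANA | KATAKANA | HANGUL; // every script enabled
        System.out.println(shouldBigram('東', hanOnly)); // true
        System.out.println(shouldBigram('に', hanOnly)); // false
        System.out.println(shouldBigram('に', all));     // true
        System.out.println(shouldBigram('a', all));      // false: Latin is never bigrammed
    }
}
```

With only HAN set, Hiragana characters fall outside the mask and would be left as they are, while the `all` mask corresponds to the behavior of the one-argument constructor.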
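Modifier and Type | Method and Description |
---|---|
boolean | incrementToken(): advances the stream to the next token; returns false at the end of the stream. |
void | reset(): resets the filter's buffered state as well as the input TokenStream. |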
Methods inherited from class org.apache.lucene.util.AttributeSource: addAttribute, addAttributeImpl, captureState, clearAttributes, cloneAttributes, copyTo, equals, getAttribute, getAttributeClassesIterator, getAttributeFactory, getAttributeImplsIterator, hasAttribute, hasAttributes, hashCode, reflectAsString, reflectWith, restoreState, toString
public static final int HAN
public static final int HIRAGANA
public static final int KATAKANA
public static final int HANGUL
public static final String DOUBLE_TYPE
public static final String SINGLE_TYPE
public CJKBigramFilter(org.apache.lucene.analysis.TokenStream in)
public boolean incrementToken() throws IOException
Specified by: incrementToken in class org.apache.lucene.analysis.TokenStream
Throws: IOException
public void reset() throws IOException
Overrides: reset in class org.apache.lucene.analysis.TokenFilter
Throws: IOException
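A bigram filter that emits one token per incrementToken() call has to remember the previous character between calls, which is why overriding reset() matters. The following pull-style sketch (hypothetical class `PullBigramSketch`, not Lucene code, with a plain list of CJK characters standing in for the upstream stream) shows that buffering and how reset() clears it so the stream can be consumed again from a clean state.

```java
import java.util.ArrayList;
import java.util.List;

/** Pull-style sketch of a stateful bigram filter; not Lucene's implementation. */
public class PullBigramSketch {
    private final List<String> chars; // hypothetical upstream: one CJK character per element
    private int pos;                  // next element to read
    private String prev;              // buffered character waiting to be paired
    private boolean emittedBigram;    // whether this run has produced any bigram yet
    private String term;              // text of the most recently emitted token

    PullBigramSketch(List<String> chars) {
        this.chars = chars;
    }

    /** Mirrors the incrementToken() contract: advance, or return false at the end. */
    boolean incrementToken() {
        while (pos < chars.size()) {
            String c = chars.get(pos++);
            if (prev != null) {
                term = prev + c;   // overlapping bigram; c stays buffered for the next pair
                prev = c;
                emittedBigram = true;
                return true;
            }
            prev = c;              // first character of the run: buffer it and keep reading
        }
        if (prev != null && !emittedBigram) {
            term = prev;           // a lone character is emitted as a unigram
            prev = null;
            return true;
        }
        return false;
    }

    String term() {
        return term;
    }

    /** Clears the read position and every piece of buffered state. */
    void reset() {
        pos = 0;
        prev = null;
        emittedBigram = false;
        term = null;
    }

    /** Consumer loop: reset() first, then incrementToken() until it returns false. */
    static List<String> drain(PullBigramSketch s) {
        List<String> out = new ArrayList<>();
        s.reset();
        while (s.incrementToken()) {
            out.add(s.term());
        }
        return out;
    }

    public static void main(String[] args) {
        PullBigramSketch s = new PullBigramSketch(List.of("東", "京", "都"));
        System.out.println(drain(s)); // [東京, 京都]
        System.out.println(drain(s)); // [東京, 京都] again, because reset() cleared the state
    }
}
```

Without the reset() call in drain, the second pass would start with the read position at the end and a stale character still buffered, and would emit nothing.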