public class OpenNLPLemmatizerFilter extends TokenFilter
Runs OpenNLP dictionary-based and/or MaxEnt lemmatizers.
Both a dictionary-based lemmatizer and a MaxEnt lemmatizer are supported, via the "dictionary" and "lemmatizerModel" params, respectively. If both are configured, the dictionary-based lemmatizer is tried first, and then the MaxEnt lemmatizer is consulted for out-of-vocabulary tokens.
The dictionary file must be UTF-8 encoded, with one entry per line, in the form word[tab]lemma[tab]part-of-speech.
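The lookup order described above (dictionary first, statistical model only for out-of-vocabulary tokens) can be illustrated with a minimal, self-contained sketch. It is not the Lucene or OpenNLP implementation: the class `DictionaryFirstLemmatizer`, its methods, and the `fallback` function standing in for a MaxEnt model are all hypothetical, and keying entries on the word plus its part-of-speech tag is an assumption made for the illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.BiFunction;

/** Illustrative sketch only; hypothetical class, not part of Lucene or OpenNLP. */
final class DictionaryFirstLemmatizer {
  // Key: word + TAB + part-of-speech tag (an assumption of this sketch). Value: lemma.
  private final Map<String, String> entries = new HashMap<>();
  // Hypothetical stand-in for a statistical (MaxEnt) lemmatizer: (word, posTag) -> lemma.
  private final BiFunction<String, String, String> fallback;

  /** Parses dictionary text with one word[tab]lemma[tab]part-of-speech entry per line. */
  DictionaryFirstLemmatizer(String dictionaryText,
                            BiFunction<String, String, String> fallback) {
    this.fallback = fallback;
    for (String line : dictionaryText.split("\\R")) {
      if (line.isEmpty()) {
        continue; // skip blank lines
      }
      String[] fields = line.split("\t");
      if (fields.length != 3) {
        throw new IllegalArgumentException(
            "Expected word<TAB>lemma<TAB>part-of-speech but got: " + line);
      }
      entries.put(fields[0] + "\t" + fields[2], fields[1]);
    }
  }

  /** Tries the dictionary first; consults the fallback only when no entry matches. */
  String lemmatize(String word, String posTag) {
    String lemma = entries.get(word + "\t" + posTag);
    return lemma != null ? lemma : fallback.apply(word, posTag);
  }
}
```

With a two-entry dictionary such as `running<TAB>run<TAB>VBG` and `mice<TAB>mouse<TAB>NNS`, a call like `lemmatize("running", "VBG")` is answered from the dictionary, while a token with no matching entry is passed to the fallback function.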
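Nested classes/interfaces inherited from class AttributeSource: AttributeSource.State
Fields inherited from class TokenFilter: input
Fields inherited from class TokenStream: DEFAULT_TOKEN_ATTRIBUTE_FACTORY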
Constructor and Description |
---|
OpenNLPLemmatizerFilter(TokenStream input, NLPLemmatizerOp lemmatizerOp) |
Modifier and Type | Method and Description |
---|---|
boolean | incrementToken() |
void | reset() |
Methods inherited from class TokenFilter: close, end
Methods inherited from class AttributeSource: addAttribute, addAttributeImpl, captureState, clearAttributes, cloneAttributes, copyTo, endAttributes, equals, getAttribute, getAttributeClassesIterator, getAttributeFactory, getAttributeImplsIterator, hasAttribute, hasAttributes, hashCode, reflectAsString, reflectWith, removeAllAttributes, restoreState, toString
public OpenNLPLemmatizerFilter(TokenStream input, NLPLemmatizerOp lemmatizerOp)
public final boolean incrementToken() throws IOException
Specified by: incrementToken in class TokenStream
Throws: IOException
public void reset() throws IOException
Overrides: reset in class TokenFilter
Throws: IOException
Copyright © 2000-2018 Apache Software Foundation. All Rights Reserved.