DictionaryCompoundWordTokenFilter (Lucene 3.5.0 API)


org.apache.lucene.analysis.compound
Class DictionaryCompoundWordTokenFilter

java.lang.Object
  org.apache.lucene.util.AttributeSource
      org.apache.lucene.analysis.TokenStream
          org.apache.lucene.analysis.TokenFilter
              org.apache.lucene.analysis.compound.CompoundWordTokenFilterBase
                  org.apache.lucene.analysis.compound.DictionaryCompoundWordTokenFilter

All Implemented Interfaces: Closeable

public class DictionaryCompoundWordTokenFilter
extends CompoundWordTokenFilterBase

A TokenFilter that decomposes compound words found in many Germanic languages.

"Donaudampfschiff" becomes Donau, dampf, schiff so that you can find "Donaudampfschiff" even when you only enter "schiff". It uses a brute-force algorithm to achieve this.
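The brute-force idea described above can be illustrated with a minimal, standalone sketch. This is not Lucene's implementation: the class `BruteForceDecompounder` and its `decompose` method are hypothetical, use only the Java standard library, and borrow the parameter names `minWordSize`, `minSubwordSize` and `maxSubwordSize` from the constructors of this class purely for readability. The `onlyLongestMatch` option is left out. The sketch tries every substring within the size limits and keeps those found in the dictionary.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical illustration only; not part of the Lucene API.
public class BruteForceDecompounder {

    // Returns every substring of 'word' whose lowercased form is in 'dictionary'.
    // Assumes the dictionary holds lowercased entries.
    public static List<String> decompose(String word, Set<String> dictionary,
                                         int minWordSize, int minSubwordSize,
                                         int maxSubwordSize) {
        List<String> parts = new ArrayList<String>();
        if (word.length() < minWordSize) {
            return parts; // too short to be treated as a compound
        }
        // Try every start offset and every allowed length (the brute-force part).
        for (int start = 0; start <= word.length() - minSubwordSize; start++) {
            for (int len = minSubwordSize;
                 len <= maxSubwordSize && start + len <= word.length();
                 len++) {
                String candidate = word.substring(start, start + len);
                if (dictionary.contains(candidate.toLowerCase(Locale.ROOT))) {
                    parts.add(candidate);
                }
            }
        }
        return parts;
    }

    public static void main(String[] args) {
        Set<String> dict = new HashSet<String>(Arrays.asList("donau", "dampf", "schiff"));
        // Size limits below are chosen for illustration.
        System.out.println(decompose("Donaudampfschiff", dict, 5, 2, 15));
        // prints [Donau, dampf, schiff]
    }
}
```

Every candidate substring costs one dictionary lookup, so the number of lookups grows with word length times the range of allowed subword sizes. That is why the choice of dictionary data structure matters for this filter.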

You must specify the required Version compatibility when creating CompoundWordTokenFilterBase:

As of 3.1, CompoundWordTokenFilterBase correctly handles Unicode 4.0 supplementary characters in strings and char arrays provided as compound word dictionaries.

If you pass in a CharArraySet as dictionary, it should be case-insensitive unless it contains only lowercased entries and you have LowerCaseFilter before this filter in your analysis chain. For optimal performance (this filter does many dictionary lookups), prefer the latter setup: a CharArraySet of lowercased entries with LowerCaseFilter earlier in the chain. Be aware: if you supply arbitrary Sets or String[] dictionaries to the constructors, they are automatically converted to case-insensitive sets.
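The two dictionary setups can be contrasted with a small standalone sketch. It uses plain `java.util` sets as stand-ins for CharArraySet, and lowercasing the input by hand as a stand-in for LowerCaseFilter; the class `DictionaryCaseSketch` and its methods are hypothetical and say nothing about how Lucene implements either behavior.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical illustration only; plain Java sets stand in for CharArraySet.
public class DictionaryCaseSketch {

    // Setup 1: the dictionary itself ignores case on lookup.
    public static Set<String> caseInsensitive(String... entries) {
        Set<String> set = new TreeSet<String>(String.CASE_INSENSITIVE_ORDER);
        set.addAll(Arrays.asList(entries));
        return set;
    }

    // Setup 2: the dictionary stores lowercased entries and compares exactly,
    // so the caller must lowercase tokens before looking them up.
    public static Set<String> lowercasedOnly(String... entries) {
        Set<String> set = new HashSet<String>();
        for (String entry : entries) {
            set.add(entry.toLowerCase(Locale.ROOT));
        }
        return set;
    }

    public static void main(String[] args) {
        Set<String> insensitive = caseInsensitive("Schiff");
        Set<String> lowercased = lowercasedOnly("Schiff");

        System.out.println(insensitive.contains("SCHIFF")); // prints true
        System.out.println(lowercased.contains("Schiff"));  // prints false
        System.out.println(lowercased.contains("Schiff".toLowerCase(Locale.ROOT))); // prints true
    }
}
```

In the second setup the case folding happens once per token, upstream of the filter, instead of on every dictionary lookup.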

Nested Class Summary

Nested classes/interfaces inherited from class org.apache.lucene.analysis.compound.CompoundWordTokenFilterBase
`CompoundWordTokenFilterBase.CompoundToken`

Nested classes/interfaces inherited from class org.apache.lucene.util.AttributeSource
`AttributeSource.AttributeFactory, AttributeSource.State`

Field Summary

Fields inherited from class org.apache.lucene.analysis.compound.CompoundWordTokenFilterBase
`DEFAULT_MAX_SUBWORD_SIZE, DEFAULT_MIN_SUBWORD_SIZE, DEFAULT_MIN_WORD_SIZE, dictionary, maxSubwordSize, minSubwordSize, minWordSize, offsetAtt, onlyLongestMatch, termAtt, tokens`

Fields inherited from class org.apache.lucene.analysis.TokenFilter
`input`

Constructor Summary
`DictionaryCompoundWordTokenFilter(TokenStream input, Set dictionary)` Deprecated. Use `DictionaryCompoundWordTokenFilter(Version, TokenStream, Set)` instead
`DictionaryCompoundWordTokenFilter(TokenStream input, Set dictionary, int minWordSize, int minSubwordSize, int maxSubwordSize, boolean onlyLongestMatch)` Deprecated. Use `DictionaryCompoundWordTokenFilter(Version, TokenStream, Set, int, int, int, boolean)` instead
`DictionaryCompoundWordTokenFilter(TokenStream input, String[] dictionary)` Deprecated. Use `DictionaryCompoundWordTokenFilter(Version, TokenStream, String[])` instead
`DictionaryCompoundWordTokenFilter(TokenStream input, String[] dictionary, int minWordSize, int minSubwordSize, int maxSubwordSize, boolean onlyLongestMatch)` Deprecated. Use `DictionaryCompoundWordTokenFilter(Version, TokenStream, String[], int, int, int, boolean)` instead
`DictionaryCompoundWordTokenFilter(Version matchVersion, TokenStream input, Set<?> dictionary)` Creates a new `DictionaryCompoundWordTokenFilter`
`DictionaryCompoundWordTokenFilter(Version matchVersion, TokenStream input, Set<?> dictionary, int minWordSize, int minSubwordSize, int maxSubwordSize, boolean onlyLongestMatch)` Creates a new `DictionaryCompoundWordTokenFilter`
`DictionaryCompoundWordTokenFilter(Version matchVersion, TokenStream input, String[] dictionary)` Deprecated. Use the constructors taking `Set`
`DictionaryCompoundWordTokenFilter(Version matchVersion, TokenStream input, String[] dictionary, int minWordSize, int minSubwordSize, int maxSubwordSize, boolean onlyLongestMatch)` Deprecated. Use the constructors taking `Set`

Method Summary
`protected void`	`decompose()` Decomposes the current `CompoundWordTokenFilterBase.termAtt` and places `CompoundWordTokenFilterBase.CompoundToken` instances in the `CompoundWordTokenFilterBase.tokens` list.

Methods inherited from class org.apache.lucene.analysis.compound.CompoundWordTokenFilterBase
`incrementToken, makeDictionary, reset`

Methods inherited from class org.apache.lucene.analysis.TokenFilter
`close, end`

Methods inherited from class org.apache.lucene.util.AttributeSource
`addAttribute, addAttributeImpl, captureState, clearAttributes, cloneAttributes, copyTo, equals, getAttribute, getAttributeClassesIterator, getAttributeFactory, getAttributeImplsIterator, hasAttribute, hasAttributes, hashCode, reflectAsString, reflectWith, restoreState, toString`

Methods inherited from class java.lang.Object
`clone, finalize, getClass, notify, notifyAll, wait, wait, wait`

Constructor Detail