Creates a shingle filter based on a user defined matrix.
The filter will delete columns from the input matrix, so you will not be able to reset the filter if you use this constructor.
Todo: do not modify the matrix; instead, use a flag, set the input stream to null or similar, and keep track of the current position in the matrix.
Namespace: Lucene.Net.Analyzers.Shingle
Assembly: Lucene.Net.Contrib.Analyzers (in Lucene.Net.Contrib.Analyzers.dll) Version: 2.9.2.1 (2.9.2.1)
Syntax
C# |
---|
public ShingleMatrixFilter( Matrix matrix, int minimumShingleSize, int maximumShingleSize, char spacerCharacter, bool ignoringSinglePrefixOrSuffixShingle, TokenSettingsCodec settingsCodec ) |
Visual Basic |
---|
Public Sub New (matrix As Matrix, minimumShingleSize As Integer, maximumShingleSize As Integer, spacerCharacter As Char, ignoringSinglePrefixOrSuffixShingle As Boolean, settingsCodec As TokenSettingsCodec) |
Visual C++ |
---|
public: ShingleMatrixFilter( Matrix^ matrix, int minimumShingleSize, int maximumShingleSize, wchar_t spacerCharacter, bool ignoringSinglePrefixOrSuffixShingle, TokenSettingsCodec^ settingsCodec ) |
Parameters
- matrix
- Type: Lucene.Net.Analyzers.Shingle.Matrix.Matrix
the input basis for creating shingles. It does not need to contain any information until ShingleMatrixFilter.Next(Token) is called for the first time.
- minimumShingleSize
- Type: System.Int32
minimum number of tokens in any shingle.
- maximumShingleSize
- Type: System.Int32
maximum number of tokens in any shingle.
- spacerCharacter
- Type: System.Char
character to insert between the texts of the token parts in a shingle; null for none.
- ignoringSinglePrefixOrSuffixShingle
- Type: System.Boolean
if true, shingles that contain only a permutation of the first or the last column are not produced. This is useful when adding boundary marker tokens such as '^' and '$'.
- settingsCodec
- Type: Lucene.Net.Analyzers.Shingle.Codec.TokenSettingsCodec
codec used to read input token weight and matrix positioning.
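To illustrate how minimumShingleSize, maximumShingleSize, spacerCharacter and ignoringSinglePrefixOrSuffixShingle interact, here is a minimal standalone C++17 model. It is not the Lucene.Net API. It assumes the matrix can be treated as a left-to-right list of columns, where each column holds alternative token texts (for example, synonyms). It ignores token weights and the TokenSettingsCodec, models "null for none" with std::optional<char>, and does not reproduce the real filter's output order. The names Column, Matrix, appendPermutations and shingles are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Hypothetical model: a column holds alternative token texts for one
// position, and the matrix is a left-to-right sequence of columns.
using Column = std::vector<std::string>;
using Matrix = std::vector<Column>;

// Appends every row permutation of columns [col, end) to `out`.
// `start` marks the shingle's first column, so the spacer is only
// inserted between token parts and never before the first one.
void appendPermutations(const Matrix& m, std::size_t start, std::size_t col,
                        std::size_t end, const std::string& prefix,
                        std::optional<char> spacer,
                        std::vector<std::string>& out) {
    if (col == end) {
        out.push_back(prefix);
        return;
    }
    for (const std::string& token : m[col]) {
        std::string next = prefix;
        if (col > start && spacer) next += *spacer;  // separator between parts
        next += token;
        appendPermutations(m, start, col + 1, end, next, spacer, out);
    }
}

// Produces all shingles spanning minSize..maxSize consecutive columns.
std::vector<std::string> shingles(const Matrix& m, std::size_t minSize,
                                  std::size_t maxSize,
                                  std::optional<char> spacer,
                                  bool ignoringSinglePrefixOrSuffix) {
    // Precondition assumed by this sketch, not taken from the library.
    assert(minSize >= 1 && minSize <= maxSize);
    std::vector<std::string> out;
    const std::size_t n = m.size();
    for (std::size_t start = 0; start < n; ++start) {
        for (std::size_t len = minSize; len <= maxSize && start + len <= n;
             ++len) {
            // Skip one-token shingles built from the first or last column,
            // such as a lone "^" or "$" boundary marker.
            if (ignoringSinglePrefixOrSuffix && len == 1 &&
                (start == 0 || start == n - 1))
                continue;
            appendPermutations(m, start, start, start + len, "", spacer, out);
        }
    }
    return out;
}
```

For example, with the columns `^`, `hello`/`hi`, `world`, `$`, the call `shingles(m, 1, 2, '_', true)` yields `^_hello`, `^_hi`, `hello`, `hi`, `hello_world`, `hi_world`, `world`, `world_$` in this model. The lone `^` and `$` are dropped because the ignore flag is set, while two-token shingles that include a marker are kept.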