The MoreLikeThis type exposes the following members.
Constructors
Name | Description | |
---|---|---|
MoreLikeThis | Constructor requiring an IndexReader. |
Methods
Name | Description | |
---|---|---|
DescribeParams | Describe the parameters that control how the "more like this" query is formed. | |
Equals | (Inherited from Object.) | |
Finalize | Allows an Object to attempt to free resources and perform other cleanup operations before the Object is reclaimed by garbage collection. (Inherited from Object.) | |
GetAnalyzer | Returns the analyzer that will be used to parse the source doc. The default analyzer is DEFAULT_ANALYZER. | |
GetFieldNames | Returns the field names that will be used when generating the 'More Like This' query. The default is DEFAULT_FIELD_NAMES. | |
GetHashCode | Serves as a hash function for a particular type. (Inherited from Object.) | |
GetMaxNumTokensParsed | ||
GetMaxQueryTerms | Returns the maximum number of query terms that will be included in any generated query. The default is DEFAULT_MAX_QUERY_TERMS. | |
GetMaxWordLen | Returns the maximum word length; words longer than this are ignored. A value of 0 means no maximum word length. The default is DEFAULT_MAX_WORD_LENGTH. | |
GetMinDocFreq | Returns the minimum document frequency; words that do not occur in at least this many docs are ignored. The default is DEFALT_MIN_DOC_FREQ. | |
GetMinTermFreq | Returns the minimum term frequency; terms that occur less often than this in the source doc are ignored. The default is DEFAULT_MIN_TERM_FREQ. | |
GetMinWordLen | Returns the minimum word length; words shorter than this are ignored. A value of 0 means no minimum word length. The default is DEFAULT_MIN_WORD_LENGTH. | |
GetStopWords | Get the current stop words being used. | |
GetType | Gets the Type of the current instance. (Inherited from Object.) | |
IsBoost | Returns whether terms in the query are boosted based on "score". The default is DEFAULT_BOOST. | |
Like(Int32) | Returns a query that will return docs like the doc with the passed Lucene document ID. | |
Like(FileInfo) | Returns a query that will return docs like the passed file. | |
Like(Stream) | Returns a query that will return docs like the passed stream. | |
Like(StreamReader) | Returns a query that will return docs like the content of the passed reader. | |
Like(Uri) | Returns a query that will return docs like the content at the passed URL. | |
Main | Test driver. Pass in "-i INDEX" and then either "-fn FILE" or "-url URL". | |
MemberwiseClone | Creates a shallow copy of the current Object. (Inherited from Object.) | |
RetrieveInterestingTerms | Convenience routine that returns the most interesting words in a document. More advanced users can call RetrieveTerms directly. | |
RetrieveTerms | ||
SetAnalyzer | Sets the analyzer to use. An analyzer is not required for generating a query with the Like(Int32) method; all other Like methods require an analyzer. | |
SetBoost | Sets whether to boost terms in the query based on "score". | |
SetFieldNames | Sets the field names that will be used when generating the 'More Like This' query. Set this to null to have the field names determined at runtime from the IndexReader provided in the constructor. | |
SetMaxNumTokensParsed | ||
SetMaxQueryTerms | Sets the maximum number of query terms that will be included in any generated query. | |
SetMaxWordLen | Sets the maximum word length; words longer than this are ignored. | |
SetMinDocFreq | Sets the minimum document frequency; words that do not occur in at least this many docs are ignored. | |
SetMinTermFreq | Sets the minimum term frequency; terms that occur less often than this in the source doc are ignored. | |
SetMinWordLen | Sets the minimum word length; words shorter than this are ignored. | |
SetStopWords | Sets the set of stopwords. Any word in this set is considered "uninteresting" and is ignored. Even if your Analyzer lets stopwords through, you might want MoreLikeThis to ignore them, since for the purposes of document similarity it is reasonable to assume that a stop word is never interesting. | |
ToString | (Inherited from Object.) |
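The setters and Like(Int32) listed above are typically combined as in the following minimal sketch. It is an illustration, not taken from the library's own documentation. The `Similarity.Net` namespace, the parameter types (`string[]`, `int`, `bool`), the field names "title" and "body", and the threshold values are all assumptions; adjust them to your Lucene.Net version and index. No analyzer is set, because Like(Int32) does not require one.

```csharp
using Lucene.Net.Index;
using Lucene.Net.Search;
// Assumption: MoreLikeThis lives in this namespace in your build.
// Some Lucene.Net versions ship it elsewhere; adjust the using directive if needed.
using Similarity.Net;

public static class MoreLikeThisUsageSketch
{
    // Builds a "more like this" query for a document that is already in the index.
    // The caller supplies an open IndexReader and the Lucene document ID.
    public static Query BuildSimilarityQuery(IndexReader reader, int docId)
    {
        MoreLikeThis mlt = new MoreLikeThis(reader);

        // Hypothetical field names; passing null instead lets the names be
        // determined at runtime from the reader.
        mlt.SetFieldNames(new string[] { "title", "body" });

        // Illustrative thresholds, not recommended values.
        mlt.SetMinTermFreq(2);    // ignore terms that occur less often than this in the source doc
        mlt.SetMinDocFreq(5);     // ignore words that occur in fewer docs than this
        mlt.SetMaxQueryTerms(25); // cap the number of terms in the generated query
        mlt.SetBoost(true);       // boost terms based on "score"

        return mlt.Like(docId);
    }
}
```

The returned query can then be passed to a searcher opened over the same index.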
Fields
Name | Description | |
---|---|---|
DEFALT_MIN_DOC_FREQ | Ignore words which do not occur in at least this many docs. | |
DEFAULT_ANALYZER | Default analyzer to parse source doc with. | |
DEFAULT_BOOST | Boost terms in query based on score. | |
DEFAULT_FIELD_NAMES | Default field names. Null is used to specify that the field names should be looked up at runtime from the provided reader. | |
DEFAULT_MAX_NUM_TOKENS_PARSED | Default maximum number of tokens to parse in each example doc field that is not stored with TermVector support. | |
DEFAULT_MAX_QUERY_TERMS | Return a Query with no more than this many terms. | |
DEFAULT_MAX_WORD_LENGTH | Ignore words longer than this length. A value of 0 disables this limit. | |
DEFAULT_MIN_TERM_FREQ | Ignore terms that occur less often than this in the source doc. | |
DEFAULT_MIN_WORD_LENGTH | Ignore words shorter than this length. A value of 0 disables this limit. | |
DEFAULT_STOP_WORDS | Default set of stopwords. If null, stop words are allowed. |
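
To make the interaction of these thresholds concrete, here is a small self-contained sketch of term selection. It is not the MoreLikeThis implementation: the class `InterestingTermsSketch` and its `Select` method are hypothetical, terms are ranked by raw frequency in the source text only, and the minimum document frequency check is left out because it needs an index. It only shows how a minimum term frequency, word length limits (0 meaning "no limit"), a stop word set (null meaning "allow stop words"), and a cap on query terms narrow a token stream.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper for illustration; not part of Lucene.Net.
public static class InterestingTermsSketch
{
    public static List<string> Select(
        IEnumerable<string> tokens,
        int minTermFreq,
        int minWordLen,
        int maxWordLen,
        ISet<string> stopWords,
        int maxQueryTerms)
    {
        var freq = new Dictionary<string, int>();
        foreach (string token in tokens)
        {
            // A limit of 0 disables the corresponding length check.
            if (minWordLen > 0 && token.Length < minWordLen) continue;
            if (maxWordLen > 0 && token.Length > maxWordLen) continue;
            // A null stop word set means stop words are allowed.
            if (stopWords != null && stopWords.Contains(token)) continue;

            int count;
            freq.TryGetValue(token, out count); // count stays 0 if the token is new
            freq[token] = count + 1;
        }

        // Keep terms that are frequent enough, most frequent first,
        // ties broken alphabetically, capped at maxQueryTerms.
        return freq
            .Where(kv => kv.Value >= minTermFreq)
            .OrderByDescending(kv => kv.Value)
            .ThenBy(kv => kv.Key, StringComparer.Ordinal)
            .Take(maxQueryTerms)
            .Select(kv => kv.Key)
            .ToList();
    }
}
```

For example, with the tokens of "the quick fox and the quick dog saw the fox", stop words "the" and "and", a minimum term frequency of 2, and a minimum word length of 3, only "fox" and "quick" survive.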