public final class SimilarityQueries extends Object
See Also: MoreLikeThis
Modifier and Type | Method and Description |
---|---|
static org.apache.lucene.search.Query | formSimilarQuery(String body, org.apache.lucene.analysis.Analyzer a, String field, Set<?> stop): Simple similarity query generators. |
public static org.apache.lucene.search.Query formSimilarQuery(String body, org.apache.lucene.analysis.Analyzer a, String field, Set<?> stop) throws IOException
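Takes every unique word in the body and forms a boolean query in which all words are optional. Once you have this query, use it to search your IndexSearcher for similar docs.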
The only caveat is that the first hit returned should be your source document, so you'll need to ignore it.
So, if you have a code fragment like this:
Query q = formSimilarQuery("I use Lucene to search fast. Fast searchers are good", new StandardAnalyzer(), "contents", null);
The query returned, in string form, will be '(i use lucene to search fast searchers are good)'.
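The caveat about the source document can be handled by filtering it out of the ranked hits by identity, which is a little safer than blindly dropping the first hit. The sketch below is plain Java with no Lucene dependency: hits are modeled as a list of string ids, and `SkipSourceSketch`, `withoutSource`, and the id values are hypothetical names chosen for illustration. In a real index you would compare against whatever stored field identifies your documents.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Lucene: hits are modeled as ranked string ids.
class SkipSourceSketch {

    // Returns the ranked ids with the source document removed, wherever it appears.
    static List<String> withoutSource(List<String> rankedIds, String sourceId) {
        List<String> similar = new ArrayList<>();
        for (String id : rankedIds) {
            if (!id.equals(sourceId)) {
                similar.add(id); // keep every hit that is not the source itself
            }
        }
        return similar;
    }

    public static void main(String[] args) {
        // Assumed ranking: the source document "doc-7" comes back as the top hit.
        List<String> ranked = Arrays.asList("doc-7", "doc-2", "doc-9");
        System.out.println(withoutSource(ranked, "doc-7")); // prints [doc-2, doc-9]
    }
}
```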
The philosophy behind this method is "two documents are similar if they share lots of words". Note that behind the scenes, Lucene's scoring algorithm will tend to give two documents a higher similarity score if they share more uncommon words.
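The "shared words" idea can be modeled without Lucene as: collect each unique term of the body once, skip stop words, and treat every remaining term as an optional clause. The sketch below is a simplified stand-in, not the library's implementation. `SimilarTermsSketch` and `uniqueTerms` are hypothetical names, and the tokenizer is an assumed lowercase split on non-alphanumeric characters rather than a real Analyzer.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Simplified model of "every unique word becomes an optional clause".
// Not Lucene code: tokenization is an assumed lowercase split on non-alphanumerics.
class SimilarTermsSketch {

    // Returns the unique, lowercased terms of body in first-seen order,
    // skipping anything in stop (stop may be null, meaning no stop words).
    static List<String> uniqueTerms(String body, Set<String> stop) {
        Set<String> seen = new LinkedHashSet<>();
        for (String token : body.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (token.isEmpty()) {
                continue; // a leading delimiter produces an empty first token
            }
            if (stop != null && stop.contains(token)) {
                continue; // caller asked for this word to be ignored
            }
            seen.add(token); // the set silently drops repeated words
        }
        return new ArrayList<>(seen);
    }

    public static void main(String[] args) {
        String body = "I use Lucene to search fast. Fast searchers are good";
        // prints: i use lucene to search fast searchers are good
        System.out.println(String.join(" ", uniqueTerms(body, null)));
    }
}
```

The repeated word "fast" appears only once in the output, which mirrors the query string in the example above.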
This method is fail-safe in that if a long 'body' is passed in and BooleanQuery.add() (used internally) throws BooleanQuery.TooManyClauses, the query as it stands will be returned.
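The fail-safe behaviour described above amounts to catching the clause-limit exception and returning whatever has been built so far. The following is a toy model in plain Java, under the assumption that the limit is enforced on each add. `FailSafeSketch`, `ClauseList`, and `TooManyClausesException` are hypothetical stand-ins for illustration and are not Lucene classes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model of the fail-safe behaviour; none of these names are Lucene's.
class FailSafeSketch {

    // Hypothetical stand-in for BooleanQuery.TooManyClauses.
    static class TooManyClausesException extends RuntimeException {}

    // Hypothetical clause container that refuses to grow past maxClauses.
    static class ClauseList {
        private final int maxClauses;
        private final List<String> clauses = new ArrayList<>();

        ClauseList(int maxClauses) {
            this.maxClauses = maxClauses;
        }

        void add(String term) {
            if (clauses.size() >= maxClauses) {
                throw new TooManyClausesException();
            }
            clauses.add(term);
        }

        List<String> clauses() {
            return clauses;
        }
    }

    // Adds terms until the limit is hit, then returns whatever was collected.
    static List<String> build(List<String> terms, int maxClauses) {
        ClauseList query = new ClauseList(maxClauses);
        try {
            for (String term : terms) {
                query.add(term);
            }
        } catch (TooManyClausesException e) {
            // Fail-safe: stop adding and fall through with the partial query.
        }
        return query.clauses();
    }

    public static void main(String[] args) {
        List<String> terms = Arrays.asList("i", "use", "lucene", "to", "search");
        System.out.println(build(terms, 3)); // prints [i, use, lucene]
    }
}
```

The trade-off is that terms late in a very long body are silently dropped, so the returned query reflects only the first unique words up to the limit.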
Parameters:
body - the body of the document you want to find similar documents to
a - the analyzer to use to parse the body
field - the field you want to search on, probably something like "contents" or "body"
stop - optional set of stop words to ignore
Throws:
IOException - this can't happen...