Lucene.Net
3.0.3
Lucene.Net is a .NET port of the Java Lucene Indexing Library
A Query that matches numeric values within a specified range. To use this, you must first index the numeric values using NumericField (expert: NumericTokenStream). If your terms are instead textual, you should use TermRangeQuery. NumericRangeFilter{T} is the filter equivalent of this query.
Inherits Lucene.Net.Search.MultiTermQuery.
Inherited by Lucene.Net.Search.NumericRangeQuery< T >.NumericRangeTermEnum, Lucene.Net.Search.NumericRangeQuery< T >.NumericRangeTermEnum.AnonymousClassIntRangeBuilder, and Lucene.Net.Search.NumericRangeQuery< T >.NumericRangeTermEnum.AnonymousClassLongRangeBuilder.
Public Member Functions | |
override System.String | ToString (System.String field) |
Prints a query to a string, with field assumed to be the default field and omitted. The representation used is one that is supposed to be readable by QueryParser. However, there are the following limitations:
| |
override bool | Equals (System.Object o) |
override int | GetHashCode () |
Public Member Functions inherited from Lucene.Net.Search.MultiTermQuery | |
virtual void | ClearTotalNumberOfTerms () |
Expert: Resets the counting of unique terms. Do this before executing the query/filter. | |
override Query | Rewrite (IndexReader reader) |
Expert: called to re-write queries into primitive queries. For example, a PrefixQuery will be rewritten into a BooleanQuery that consists of TermQuerys. | |
override int | GetHashCode () |
override bool | Equals (System.Object obj) |
Public Member Functions inherited from Lucene.Net.Search.Query | |
override System.String | ToString () |
Prints a query to a string. | |
virtual Weight | CreateWeight (Searcher searcher) |
Expert: Constructs an appropriate Weight implementation for this query. | |
virtual Weight | Weight (Searcher searcher) |
Expert: Constructs and initializes a Weight for a top-level query. | |
virtual Query | Combine (Query[] queries) |
Expert: called when re-writing queries under MultiSearcher. | |
virtual void | ExtractTerms (System.Collections.Generic.ISet< Term > terms) |
Expert: adds all terms occurring in this query to the terms set. Only works if this query is in its rewritten form. | |
virtual Similarity | GetSimilarity (Searcher searcher) |
Expert: Returns the Similarity implementation to be used for this query. Subclasses may override this method to specify their own Similarity implementation, perhaps one that delegates through that of the Searcher. By default the Searcher's Similarity implementation is returned. | |
virtual System.Object | Clone () |
Returns a clone of this query. | |
override int | GetHashCode () |
override bool | Equals (System.Object obj) |
Properties | |
string | Field [get] |
Returns the field name for this query | |
bool | IncludesMin [get] |
Returns true if the lower endpoint is inclusive | |
bool | IncludesMax [get] |
Returns true if the upper endpoint is inclusive | |
T | Min [get] |
Returns the lower value of this range query | |
T | Max [get] |
Returns the upper value of this range query | |
Properties inherited from Lucene.Net.Search.MultiTermQuery | |
virtual int | TotalNumberOfTerms [get] |
Expert: Returns the number of unique terms visited during execution of the query. If there are many of them, consider using another query type or optimizing the total term count in your index. This property is not thread-safe; only read it when no query is running. If you reuse the same query instance for another search, first reset the term counter with ClearTotalNumberOfTerms. On optimized indexes / no MultiReaders, you get the correct number of unique terms for the whole index. Use this number to compare different queries. For non-optimized indexes this number can also be achieved in non-constant-score mode. In constant-score mode you get the total number of terms sought for all segments / sub-readers. | |
virtual RewriteMethod | RewriteMethod [get, set] |
Sets the rewrite method to be used when executing the query. You can use one of the four core methods, or implement your own subclass of Search.RewriteMethod. | |
Properties inherited from Lucene.Net.Search.Query | |
virtual float | Boost [get, set] |
Gets or sets the boost b for this query clause. Documents matching this clause will (in addition to the normal weightings) have their score multiplied by b. The boost is 1.0 by default. | |
Additional Inherited Members | |
Static Public Member Functions inherited from Lucene.Net.Search.Query | |
static Query | MergeBooleanQueries (params BooleanQuery[] queries) |
Expert: merges the clauses of a set of BooleanQuery's into a single BooleanQuery. | |
Static Public Attributes inherited from Lucene.Net.Search.MultiTermQuery | |
static readonly RewriteMethod | CONSTANT_SCORE_FILTER_REWRITE = new ConstantScoreFilterRewrite() |
A rewrite method that first creates a private Filter, by visiting each term in sequence and marking all docs for that term. Matching documents are assigned a constant score equal to the query's boost. | |
static readonly RewriteMethod | SCORING_BOOLEAN_QUERY_REWRITE = new ScoringBooleanQueryRewrite() |
A rewrite method that first translates each term into an Occur.SHOULD clause in a BooleanQuery, and keeps the scores as computed by the query. Note that typically such scores are meaningless to the user, and require non-trivial CPU to compute, so it's almost always better to use CONSTANT_SCORE_AUTO_REWRITE_DEFAULT instead. | |
static readonly RewriteMethod | CONSTANT_SCORE_BOOLEAN_QUERY_REWRITE = new ConstantScoreBooleanQueryRewrite() |
Like SCORING_BOOLEAN_QUERY_REWRITE except scores are not computed. Instead, each matching document receives a constant score equal to the query's boost. | |
static readonly RewriteMethod | CONSTANT_SCORE_AUTO_REWRITE_DEFAULT |
Read-only default instance of ConstantScoreAutoRewrite, with ConstantScoreAutoRewrite.TermCountCutoff set to ConstantScoreAutoRewrite.DEFAULT_TERM_COUNT_CUTOFF | |
Protected Member Functions inherited from Lucene.Net.Search.MultiTermQuery | |
MultiTermQuery () | |
Constructs a query matching terms that cannot be represented with a single Term. | |
A Query that matches numeric values within a specified range. To use this, you must first index the numeric values using NumericField (expert: NumericTokenStream). If your terms are instead textual, you should use TermRangeQuery. NumericRangeFilter{T} is the filter equivalent of this query.
You create a new NumericRangeQuery with the static factory methods, e.g.:
Query q = NumericRangeQuery.NewFloatRange("weight", 0.03f, 0.10f, true, true);
matches all documents whose float-valued "weight" field ranges from 0.03 to 0.10, inclusive.
The performance of NumericRangeQuery is much better than the corresponding TermRangeQuery because the number of terms that must be searched is usually far fewer, thanks to trie indexing, described below.
You can optionally specify a precisionStep when creating this query. This is necessary if you've changed this configuration from its default (4) during indexing. Lower values consume more disk space but speed up searching. Suitable values are between 1 and 8. A good starting point to test is 4, which is the default value for all Numeric* classes. See below for details.
This query defaults to MultiTermQuery.CONSTANT_SCORE_AUTO_REWRITE_DEFAULT for 32 bit (int/float) ranges with precisionStep <8 and 64 bit (long/double) ranges with precisionStep <6. Otherwise it uses MultiTermQuery.CONSTANT_SCORE_FILTER_REWRITE as the number of terms is likely to be high. With precision steps of <4, this query can be run with one of the BooleanQuery rewrite methods without changing BooleanQuery's default max clause count.
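The default rewrite selection described above can be written out as a small decision function. This is only a sketch of the rule as stated in this paragraph, not the library's source: the RewriteChoice enum and the RewriteDefaults.Choose helper are hypothetical names introduced for illustration.

```csharp
using System;

// Hypothetical names for illustration; not part of the Lucene.Net API.
public enum RewriteChoice
{
    ConstantScoreAuto,   // stands for MultiTermQuery.CONSTANT_SCORE_AUTO_REWRITE_DEFAULT
    ConstantScoreFilter  // stands for MultiTermQuery.CONSTANT_SCORE_FILTER_REWRITE
}

public static class RewriteDefaults
{
    // valueBits is 32 for int/float ranges and 64 for long/double ranges.
    public static RewriteChoice Choose(int valueBits, int precisionStep)
    {
        switch (valueBits)
        {
            case 32:
                // 32 bit ranges use the auto rewrite for precisionStep < 8.
                return precisionStep < 8
                    ? RewriteChoice.ConstantScoreAuto
                    : RewriteChoice.ConstantScoreFilter;
            case 64:
                // 64 bit ranges use the auto rewrite for precisionStep < 6.
                return precisionStep < 6
                    ? RewriteChoice.ConstantScoreAuto
                    : RewriteChoice.ConstantScoreFilter;
            default:
                throw new ArgumentException("valueBits must be 32 or 64", "valueBits");
        }
    }
}
```

With the default precisionStep of 4, both 32 bit and 64 bit ranges fall into the auto rewrite branch.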
NOTE: This API is experimental and might change in incompatible ways in the next release.
See the publication about panFMP, where this algorithm was described (referred to as TrieRangeQuery):
Schindler, U, Diepenbroek, M, 2008. Generic XML-based Framework for Metadata Portals. Computers & Geosciences 34 (12), 1947-1955. doi:10.1016/j.cageo.2008.02.023
A quote from this paper: Because Apache Lucene is a full-text search engine and not a conventional database, it cannot handle numerical ranges (e.g., field value is inside user defined bounds, even dates are numerical values). We have developed an extension to Apache Lucene that stores the numerical values in a special string-encoded format with variable precision (all numerical values like doubles, longs, floats, and ints are converted to lexicographic sortable string representations and stored with different precisions (for a more detailed description of how the values are stored, see NumericUtils)). A range is then divided recursively into multiple intervals for searching: The center of the range is searched only with the lowest possible precision in the trie, while the boundaries are matched more exactly. This reduces the number of terms dramatically.
For the variant that stores long values in 8 different precisions (each reduced by 8 bits) that uses a lowest precision of 1 byte, the index contains only a maximum of 256 distinct values in the lowest precision. Overall, a range could consist of a theoretical maximum of 7*255*2 + 255 = 3825 distinct terms (when there is a term for every distinct value of an 8-byte-number in the index and the range covers almost all of them; a maximum of 255 distinct values is used because it would always be possible to reduce the full 256 values to one term with degraded precision). In practice, we have seen up to 300 terms in most cases (index with 500,000 metadata records and a uniform value distribution).
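The recursive division described in the quote can be sketched as follows. This is a simplified illustration, not the code in NumericUtils: it handles only non-negative values of at most 62 bits, and TrieRangeSketch, SubRange, Split and TotalTerms are hypothetical names. Boundary pieces are emitted at fine precision, and the centre of the range is passed up to coarser precision levels.

```csharp
using System;
using System.Collections.Generic;

// Illustrative sketch only; these names are hypothetical, not Lucene.Net API.
public static class TrieRangeSketch
{
    // A run of consecutive terms at one precision level.
    // Shift is the number of low-order bits dropped at that level.
    public struct SubRange
    {
        public long Min;
        public long Max;
        public int Shift;

        // Number of distinct prefix terms covered by this sub-range.
        public long TermCount
        {
            get { return (Max >> Shift) - (Min >> Shift) + 1; }
        }
    }

    // Splits the inclusive range [min, max] of non-negative, valueBits-wide values.
    public static List<SubRange> Split(int valueBits, int precisionStep, long min, long max)
    {
        if (valueBits < 1 || valueBits > 62)
            throw new ArgumentOutOfRangeException("valueBits");
        if (precisionStep < 1)
            throw new ArgumentOutOfRangeException("precisionStep");
        if (min < 0 || max >= (1L << valueBits))
            throw new ArgumentOutOfRangeException("min", "bounds must fit in valueBits");

        var result = new List<SubRange>();
        if (min > max) return result;

        for (int shift = 0; ; shift += precisionStep)
        {
            // Coarsest level reached: emit what is left and stop.
            if (shift + precisionStep >= valueBits)
            {
                result.Add(new SubRange { Min = min, Max = max, Shift = shift });
                break;
            }

            long diff = 1L << (shift + precisionStep);
            long mask = ((1L << precisionStep) - 1) << shift;
            bool hasLower = (min & mask) != 0;    // lower bound is not aligned at the next level
            bool hasUpper = (max & mask) != mask; // upper bound is not aligned at the next level
            long nextMin = (hasLower ? min + diff : min) & ~mask;
            long nextMax = (hasUpper ? max - diff : max) & ~mask;

            // Nothing remains for coarser levels: emit the rest at this level.
            if (nextMin > nextMax)
            {
                result.Add(new SubRange { Min = min, Max = max, Shift = shift });
                break;
            }

            // Boundaries are matched at the current (finer) precision.
            if (hasLower) result.Add(new SubRange { Min = min, Max = min | mask, Shift = shift });
            if (hasUpper) result.Add(new SubRange { Min = max & ~mask, Max = max, Shift = shift });

            // The centre of the range moves on to the next coarser precision.
            min = nextMin;
            max = nextMax;
        }
        return result;
    }

    // Sums the terms over all sub-ranges.
    public static long TotalTerms(List<SubRange> ranges)
    {
        long total = 0;
        foreach (SubRange r in ranges) total += r.TermCount;
        return total;
    }
}
```

For 16 bit values with a step of 4, the full range [0, 0xFFFF] collapses into a single sub-range of 16 terms at the coarsest level, while [1, 0xFFFE] needs 104 terms in total, one less than the theoretical bound of 3*15*2 + 15 = 105.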
You can choose any precisionStep when encoding values. Lower step values mean more precisions and so more terms in the index (and the index gets larger). On the other hand, the maximum number of terms to match is reduced, which optimizes query speed. The formula to calculate the maximum term count is:
n = [ (bitsPerValue/precisionStep - 1) * (2^precisionStep - 1) * 2 ] + (2^precisionStep - 1)
(this formula is only correct when bitsPerValue/precisionStep is an integer; in other cases, the value must be rounded up and the last summand must contain the modulo of the division as precision step). For longs stored using a precision step of 4, n = 15*15*2 + 15 = 465, and for a precision step of 2, n = 31*3*2 + 3 = 189. However, the gain in search speed is offset by more seeking in the term enum of the index. Because of this, the ideal precisionStep value can only be found by testing. Important: You can index with a lower precision step value and test search speed using a multiple of the original step value.
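The term count formula can be checked with a few lines of arithmetic. The sketch below is an assumption-laden illustration: TrieTermMath, TermsPerValue and MaxTermCount are hypothetical names, and the handling of a non-integer bitsPerValue/precisionStep follows one reading of the parenthetical remark (round the number of levels up and use the remainder as the step of the last summand).

```csharp
using System;

// Hypothetical helper for illustration; not part of the Lucene.Net API.
public static class TrieTermMath
{
    // Number of precisions (and so terms) stored per indexed value,
    // i.e. bitsPerValue / precisionStep rounded up.
    public static int TermsPerValue(int bitsPerValue, int precisionStep)
    {
        if (bitsPerValue < 1 || bitsPerValue > 64)
            throw new ArgumentOutOfRangeException("bitsPerValue");
        if (precisionStep < 1 || precisionStep > 32)
            throw new ArgumentOutOfRangeException("precisionStep");
        return (bitsPerValue + precisionStep - 1) / precisionStep;
    }

    // Theoretical maximum number of terms a range query has to visit.
    public static long MaxTermCount(int bitsPerValue, int precisionStep)
    {
        int levels = TermsPerValue(bitsPerValue, precisionStep);
        int remainder = bitsPerValue % precisionStep;
        // When the division is not exact, the last summand uses the remainder as its step.
        int lastStep = remainder == 0 ? precisionStep : remainder;
        long perBoundary = (1L << precisionStep) - 1;
        long lastSummand = (1L << lastStep) - 1;
        return (levels - 1) * perBoundary * 2 + lastSummand;
    }
}
```

MaxTermCount(64, 8), MaxTermCount(64, 4) and MaxTermCount(64, 2) reproduce the values 3825, 465 and 189 quoted in the text, and TermsPerValue(64, 8) gives the 8 precisions of the variant described in the paper.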
Good values for precisionStep depend on usage and data type:
precisionStep
is given. precisionStep
). Using NumericFields for sorting is ideal, because building the field cache is much faster than with text-only numbers. Sorting is also possible with range query optimized fields using one of the above precisionSteps.
Comparisons of the different types of RangeQueries on an index with about 500,000 docs showed that TermRangeQuery in boolean rewrite mode (with raised BooleanQuery clause count) took about 30-40 secs to complete, TermRangeQuery in constant score filter rewrite mode took 5 secs and executing this class took <100ms to complete (on an Opteron64 machine, Java 1.5, 8 bit precision step). This query type was developed for a geographic portal, where the performance for e.g. bounding boxes or exact date/time stamps is important.
Since: 2.9
T | : | struct | |
T | : | IComparable<T> |
Definition at line 156 of file NumericRangeQuery.cs.
override bool Lucene.Net.Search.NumericRangeQuery< T >.Equals | ( | System.Object | o | ) |
Definition at line 244 of file NumericRangeQuery.cs.
override int Lucene.Net.Search.NumericRangeQuery< T >.GetHashCode | ( | ) |
Definition at line 258 of file NumericRangeQuery.cs.
override System.String Lucene.Net.Search.NumericRangeQuery< T >.ToString | ( | System.String | field | ) | virtual |
Prints a query to a string, with field assumed to be the default field and omitted. The representation used is one that is supposed to be readable by QueryParser. However, there are the following limitations:
Implements Lucene.Net.Search.Query.
Definition at line 236 of file NumericRangeQuery.cs.
string Lucene.Net.Search.NumericRangeQuery< T >.Field | get |
Returns the field name for this query
Definition at line 208 of file NumericRangeQuery.cs.
bool Lucene.Net.Search.NumericRangeQuery< T >.IncludesMax | get |
Returns true if the upper endpoint is inclusive
Definition at line 220 of file NumericRangeQuery.cs.
bool Lucene.Net.Search.NumericRangeQuery< T >.IncludesMin | get |
Returns true if the lower endpoint is inclusive
Definition at line 214 of file NumericRangeQuery.cs.
T Lucene.Net.Search.NumericRangeQuery< T >.Max | get |
Returns the upper value of this range query
Definition at line 232 of file NumericRangeQuery.cs.
T Lucene.Net.Search.NumericRangeQuery< T >.Min | get |
Returns the lower value of this range query
Definition at line 226 of file NumericRangeQuery.cs.