LuceneContribQuery.dtd
This DTD builds on the core Lucene XML syntax and adds support for features found in the "contrib" section of the Lucene project.
CorePlusExtensionsParser.java is the Java class that encapsulates this parser behaviour.
The features added are:
<BooleanQuery> | Child of BoostQuery, Clause, CachedFilter, Query |
A BooleanQuery implements Boolean logic that controls how multiple Clauses are interpreted. Some clauses may represent optional Query criteria while others represent mandatory criteria.
Example: Find articles about banks, preferably talking about mergers but nothing to do with "sumitomo"
<BooleanQuery fieldName="contents">
  <Clause occurs="should">
    <TermQuery>merger</TermQuery>
  </Clause>
  <Clause occurs="mustnot">
    <TermQuery>sumitomo</TermQuery>
  </Clause>
  <Clause occurs="must">
    <TermQuery>bank</TermQuery>
  </Clause>
</BooleanQuery>
Element's model:
<BooleanQuery>'s children:
Name | Cardinality
Clause | At least one
<BooleanQuery>'s attributes:
Name | Values | Default
boost | | 1.0
disableCoord | true, false | false
fieldName | |
minimumNumberShouldMatch | | 0
(Clause)+
@boost | Attribute of BooleanQuery |
Optional boost for matches on this query. Values > 1 increase the relative weight of matches.
Default value: 1.0
@fieldName | Attribute of BooleanQuery |
fieldName can optionally be defined here as a default attribute used by all child elements
@disableCoord | Attribute of BooleanQuery |
The "Coordination factor" rewards documents that contain more of the optional clauses in this list. This flag can be used to turn off this factor.
Possible values: true, false - Default value: false
@minimumNumberShouldMatch | Attribute of BooleanQuery |
The minimum number of optional clauses that should be present in any one document before it is considered to be a match.
Default value: 0
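The effect of minimumNumberShouldMatch can be sketched as follows. This is an illustrative query only: the field name and terms are hypothetical, and it assumes the attribute counts only clauses whose occurs value is "should", as the description above suggests.

```xml
<!-- Hypothetical example: a document matches only if at least two
     of the three optional terms are present in "contents" -->
<BooleanQuery fieldName="contents" minimumNumberShouldMatch="2">
  <Clause occurs="should">
    <TermQuery>merger</TermQuery>
  </Clause>
  <Clause occurs="should">
    <TermQuery>takeover</TermQuery>
  </Clause>
  <Clause occurs="should">
    <TermQuery>acquisition</TermQuery>
  </Clause>
</BooleanQuery>
```

With the default value of 0, a BooleanQuery made up solely of optional clauses would typically match on any one of them.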
<Clause> | Child of BooleanFilter, BooleanQuery |
NOTE: The "Clause" tag has two modes of use. Inside a <BooleanQuery>, only "query" types can be child elements; inside a <BooleanFilter>, only "filter" types can be contained.
Element's model:
<Clause>'s children:
Name | Cardinality
BooleanFilter | One or none
BooleanQuery | One or none
BoostingQuery | One or none
BoostingTermQuery | One or none
CachedFilter | One or none
ConstantScoreQuery | One or none
DuplicateFilter | One or none
FilteredQuery | One or none
FuzzyLikeThisQuery | One or none
LikeThisQuery | One or none
MatchAllDocsQuery | One or none
NumericRangeFilter | One or none
NumericRangeQuery | One or none
RangeFilter | One or none
SpanFirst | One or none
SpanNear | One or none
SpanNot | One or none
SpanOr | One or none
SpanOrTerms | One or none
SpanTerm | One or none
TermQuery | One or none
TermsFilter | One or none
TermsQuery | One or none
UserQuery | One or none
<Clause>'s attributes:
Name | Values | Default
occurs | should, must, mustnot | should
(BooleanQuery | UserQuery | FilteredQuery | TermQuery | TermsQuery | MatchAllDocsQuery | ConstantScoreQuery | BoostingTermQuery | NumericRangeQuery | SpanOr | SpanNear | SpanOrTerms | SpanFirst | SpanNot | SpanTerm | BoostingTermQuery | LikeThisQuery | BoostingQuery | FuzzyLikeThisQuery | RangeFilter | NumericRangeFilter | CachedFilter | TermsFilter | BooleanFilter | DuplicateFilter)
@occurs | Attribute of Clause |
Controls if the clause is optional (should), mandatory (must) or unacceptable (mustnot)
Possible values: should, must, mustnot - Default value: should
<CachedFilter> | Child of Clause, Filter, ConstantScoreQuery |
Caches any nested query or filter in an LRU (Least recently used) Cache. Cached queries, like filters, are turned into Bitsets at a cost of 1 bit per document in the index. The memory cost of a cached query/filter is therefore numberOfDocsinIndex/8 bytes. Queries that are cached as filters obviously retain none of the scoring information associated with results - they retain just a Boolean yes/no record of which documents matched.
Example: Search for documents about banks from the last 10 years - caching the commonly-used "last 10 years" filter as a BitSet in RAM to eliminate the cost of building this filter from disk for every query
<FilteredQuery>
  <Query>
    <UserQuery>bank</UserQuery>
  </Query>
  <Filter>
    <CachedFilter>
      <RangeFilter fieldName="date" lowerTerm="19970101" upperTerm="20070101"/>
    </CachedFilter>
  </Filter>
</FilteredQuery>
Element's model:
<CachedFilter>'s children:
Name | Cardinality
BooleanFilter | One or none
BooleanQuery | One or none
BoostingQuery | One or none
BoostingTermQuery | One or none
CachedFilter | One or none
ConstantScoreQuery | One or none
DuplicateFilter | One or none
FilteredQuery | One or none
FuzzyLikeThisQuery | One or none
LikeThisQuery | One or none
MatchAllDocsQuery | One or none
NumericRangeFilter | One or none
NumericRangeQuery | One or none
RangeFilter | One or none
SpanFirst | One or none
SpanNear | One or none
SpanNot | One or none
SpanOr | One or none
SpanOrTerms | One or none
SpanTerm | One or none
TermQuery | One or none
TermsFilter | One or none
TermsQuery | One or none
UserQuery | One or none
(BooleanQuery | UserQuery | FilteredQuery | TermQuery | TermsQuery | MatchAllDocsQuery | ConstantScoreQuery | BoostingTermQuery | NumericRangeQuery | SpanOr | SpanNear | SpanOrTerms | SpanFirst | SpanNot | SpanTerm | BoostingTermQuery | LikeThisQuery | BoostingQuery | FuzzyLikeThisQuery | RangeFilter | NumericRangeFilter | CachedFilter | TermsFilter | BooleanFilter | DuplicateFilter)
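As a worked instance of the numberOfDocsinIndex/8 estimate given in the CachedFilter description, consider a hypothetical index of 10 million documents (the index size is an illustrative assumption, not a figure from this DTD):

```latex
\[
\text{bytes per cached entry} = \frac{N_{\text{docs}}}{8}
  = \frac{10{,}000{,}000}{8}
  = 1{,}250{,}000 \ \text{bytes} \approx 1.19 \ \text{MiB}
\]
```

Each additional query or filter held in the LRU cache would cost roughly the same amount again, since the cost depends on the number of documents rather than on the number of matches.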
<UserQuery> | Child of BoostQuery, Clause, CachedFilter, Query |
Passes content directly through to the standard Lucene query parser; see "Lucene Query Syntax".
Example: Search for documents about John Smith or John Doe using standard Lucene query syntax
<UserQuery>"John Smith" OR "John Doe"</UserQuery>
<UserQuery>'s attributes:
Name | Values | Default
boost | | 1.0
fieldName | |
@boost | Attribute of UserQuery |
Optional boost for matches on this query. Values > 1 increase the relative weight of matches.
Default value: 1.0
@fieldName | Attribute of UserQuery |
fieldName can optionally be defined here to change the default field used in the QueryParser
<MatchAllDocsQuery/> | Child of BoostQuery, Clause, CachedFilter, Query |
A query which is used to match all documents. This has a couple of uses:
Example: Effectively use a Filter as a query
<FilteredQuery>
  <Query>
    <MatchAllDocsQuery/>
  </Query>
  <Filter>
    <RangeFilter fieldName="date" lowerTerm="19870409" upperTerm="19870412"/>
  </Filter>
</FilteredQuery>
This element is always empty.
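Another plausible use for a match-all query, assuming standard Lucene behaviour in which a BooleanQuery needs at least one positive clause before it can return anything, is to pair it with a "mustnot" clause. The sketch below uses a hypothetical field name and term:

```xml
<!-- Hypothetical example: select every document except those
     whose "status" field contains the term "draft" -->
<BooleanQuery fieldName="status">
  <Clause occurs="must">
    <MatchAllDocsQuery/>
  </Clause>
  <Clause occurs="mustnot">
    <TermQuery>draft</TermQuery>
  </Clause>
</BooleanQuery>
```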
<TermQuery> | Child of BoostQuery, Clause, CachedFilter, Query |
A single term query - no analysis is done of the child text
Example: Match on a primary key
<TermQuery fieldName="primaryKey">13424</TermQuery>
<TermQuery>'s attributes:
Name | Values | Default
boost | | 1.0
fieldName | |
@boost | Attribute of TermQuery |
Optional boost for matches on this query. Values > 1 increase the relative weight of matches.
Default value: 1.0
@fieldName | Attribute of TermQuery |
fieldName must be defined here or is taken from the most immediate parent XML element that defines a "fieldName" attribute
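The fieldName inheritance rule described above can be illustrated with a small sketch. The field names and terms here are hypothetical; the first TermQuery has no fieldName of its own and so would be expected to take "contents" from the enclosing BooleanQuery, while the second overrides it locally.

```xml
<BooleanQuery fieldName="contents">
  <Clause occurs="must">
    <!-- no fieldName here: taken from the nearest parent, i.e. "contents" -->
    <TermQuery>bank</TermQuery>
  </Clause>
  <Clause occurs="should">
    <!-- explicit fieldName: overrides the inherited default -->
    <TermQuery fieldName="title">merger</TermQuery>
  </Clause>
</BooleanQuery>
```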
<BoostingTermQuery> | Child of BoostQuery, Clause, SpanFirst, Include, CachedFilter, SpanOr, SpanNear, Exclude, Query |
A boosted term query - no analysis is done of the child text. Also a span member.
(Text below is copied from the javadocs of BoostingTermQuery)
The BoostingTermQuery is very similar to the {
@boost | Attribute of TermQuery |
Optional boost for matches on this query. Values > 1 increase the relative weight of matches.
Default value: 1.0
@fieldName | Attribute of TermQuery |
fieldName must be defined here or is taken from the most immediate parent XML element that defines a "fieldName" attribute
<TermsQuery> | Child of BoostQuery, Clause, CachedFilter, Query |
The equivalent of a BooleanQuery with multiple optional TermQuery clauses. Child text is analyzed using a field-specific choice of Analyzer to produce a set of terms that are ORed together in Boolean logic. Unlike the UserQuery element, this does not parse any special characters to control fuzzy/phrase/boolean logic and as such cannot produce a Query parse error for any user input.
Example: Match on text from a database description (which may contain characters that are illegal in the standard Lucene query syntax used in the UserQuery tag)
<TermsQuery fieldName="description">Smith &amp; Sons (Ltd) : incorporated 1982</TermsQuery>
<TermsQuery>'s attributes:
Name | Values | Default
boost | | 1.0
disableCoord | true, false | false
fieldName | |
minimumNumberShouldMatch | | 0
@boost | Attribute of TermsQuery |
Optional boost for matches on this query. Values > 1 increase the relative weight of matches.
Default value: 1.0
@fieldName | Attribute of TermsQuery |
fieldName must be defined here or is taken from the most immediate parent XML element that defines a "fieldName" attribute
@disableCoord | Attribute of TermsQuery |
The "Coordination factor" rewards documents that contain more of the terms in this list. This flag can be used to turn off this factor.
Possible values: true, false - Default value: false
@minimumNumberShouldMatch | Attribute of TermsQuery |
The minimum number of terms that should be present in any one document before it is considered to be a match.
Default value: 0
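A brief sketch of how the TermsQuery attributes might be combined follows. The field name and text are hypothetical, and the number of terms actually produced depends on the Analyzer configured for the field.

```xml
<!-- Hypothetical example: require at least two of the analyzed terms
     to be present, and switch off the coordination factor -->
<TermsQuery fieldName="description"
            minimumNumberShouldMatch="2"
            disableCoord="true">merger takeover acquisition</TermsQuery>
```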
<FilteredQuery> | Child of BoostQuery, Clause, CachedFilter, Query |
Runs a Query and filters results to only those query matches that also match the Filter element.
Example: Find all documents about Lucene that have a status of "published"
<FilteredQuery>
  <Query>
    <UserQuery>Lucene</UserQuery>
  </Query>
  <Filter>
    <TermsFilter fieldName="status">published</TermsFilter>
  </Filter>
</FilteredQuery>
Element's model:
<FilteredQuery>'s children:
Name | Cardinality
Filter | Only one
Query | Only one
<FilteredQuery>'s attributes:
Name | Values | Default
boost | | 1.0
@boost | Attribute of FilteredQuery |
Optional boost for matches on this query. Values > 1 increase the relative weight of matches.
Default value: 1.0
<Query> | Child of FilteredQuery, BoostingQuery |
Used to identify a nested Query element inside another container element. NOT a top-level query tag
Element's model:
<Query>'s children:
Name | Cardinality
BooleanQuery | One or none
BoostingQuery | One or none
BoostingTermQuery | One or none
ConstantScoreQuery | One or none
FilteredQuery | One or none
FuzzyLikeThisQuery | One or none
LikeThisQuery | One or none
MatchAllDocsQuery | One or none
NumericRangeQuery | One or none
SpanFirst | One or none
SpanNear | One or none
SpanNot | One or none
SpanOr | One or none
SpanOrTerms | One or none
SpanTerm | One or none
TermQuery | One or none
TermsQuery | One or none
UserQuery | One or none
(BooleanQuery | UserQuery | FilteredQuery | TermQuery | TermsQuery | MatchAllDocsQuery | ConstantScoreQuery | BoostingTermQuery | NumericRangeQuery | SpanOr | SpanNear | SpanOrTerms | SpanFirst | SpanNot | SpanTerm | BoostingTermQuery | LikeThisQuery | BoostingQuery | FuzzyLikeThisQuery)
<Filter> | Child of FilteredQuery |
The choice of Filter that MUST also be matched
Element's model:
<Filter>'s children:
Name | Cardinality
BooleanFilter | One or none
CachedFilter | One or none
DuplicateFilter | One or none
NumericRangeFilter | One or none
RangeFilter | One or none
TermsFilter | One or none
(RangeFilter | NumericRangeFilter | CachedFilter | TermsFilter | BooleanFilter | DuplicateFilter)
<RangeFilter/> | Child of Clause, Filter, CachedFilter, ConstantScoreQuery |
Filter used to limit query results to documents matching a range of field values
Example: Search for documents about banks from the last 10 years
<FilteredQuery>
  <Query>
    <UserQuery>bank</UserQuery>
  </Query>
  <Filter>
    <RangeFilter fieldName="date" lowerTerm="19970101" upperTerm="20070101"/>
  </Filter>
</FilteredQuery>
<RangeFilter>'s attributes:
Name | Values | Default
fieldName | |
includeLower | true, false | true
includeUpper | true, false | true
lowerTerm | |
upperTerm | |
This element is always empty.
@fieldName | Attribute of RangeFilter |
fieldName must be defined here or is taken from the most immediate parent XML element that defines a "fieldName" attribute
@lowerTerm | Attribute of RangeFilter |
The lower-most term value for this field (must be <= upperTerm)
Required
@upperTerm | Attribute of RangeFilter |
The upper-most term value for this field (must be >= lowerTerm)
Required
@includeLower | Attribute of RangeFilter |
Controls if the lowerTerm in the range is part of the allowed set of values
Possible values: true, false - Default value: true
@includeUpper | Attribute of RangeFilter |
Controls if the upperTerm in the range is part of the allowed set of values
Possible values: true, false - Default value: true
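The includeLower and includeUpper flags make it possible to express half-open ranges. The sketch below assumes, as in the examples above, a hypothetical "date" field indexed as yyyyMMdd strings, and selects all of 2006 by excluding the upper bound:

```xml
<!-- Hypothetical example: 20060101 is included, 20070101 is excluded -->
<FilteredQuery>
  <Query>
    <UserQuery>bank</UserQuery>
  </Query>
  <Filter>
    <RangeFilter fieldName="date"
                 lowerTerm="20060101" upperTerm="20070101"
                 includeLower="true" includeUpper="false"/>
  </Filter>
</FilteredQuery>
```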
<NumericRangeQuery/> | Child of BoostQuery, Clause, CachedFilter, Query |
A Query that matches numeric values within a specified range.
Example: Search for documents about people who are aged 20-25
<NumericRangeQuery fieldName="age" lowerTerm="20" upperTerm="25" />
<NumericRangeQuery>'s attributes:
Name | Values | Default
fieldName | |
includeLower | true, false | true
includeUpper | true, false | true
lowerTerm | |
precisionStep | | 4
type | int, long, float, double | int
upperTerm | |
This element is always empty.
@fieldName | Attribute of NumericRangeQuery |
fieldName must be defined here or is taken from the most immediate parent XML element that defines a "fieldName" attribute
@lowerTerm | Attribute of NumericRangeQuery |
The lower-most term value for this field (must be <= upperTerm and a valid native Java numeric type)
Required
@upperTerm | Attribute of NumericRangeQuery |
The upper-most term value for this field (must be >= lowerTerm and a valid native Java numeric type)
Required
@type | Attribute of NumericRangeQuery |
The numeric type of this field
Possible values: int, long, float, double - Default value: int
@includeLower | Attribute of NumericRangeQuery |
Controls if the lowerTerm in the range is part of the allowed set of values
Possible values: true, false - Default value: true
@includeUpper | Attribute of NumericRangeQuery |
Controls if the upperTerm in the range is part of the allowed set of values
Possible values: true, false - Default value: true
@precisionStep | Attribute of NumericRangeQuery |
Lower step values mean more precision levels are indexed per value, and so more terms in the index (the index gets larger). This value must be an integer.
Default value: 4
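A sketch of a non-default numeric type follows. It assumes a hypothetical "price" field that was indexed as a double; the precisionStep is written out at its default of 4 and would presumably need to agree with the value used at index time.

```xml
<!-- Hypothetical example: prices from 9.99 (inclusive) up to 49.99 (exclusive) -->
<NumericRangeQuery fieldName="price" type="double"
                   lowerTerm="9.99" upperTerm="49.99"
                   includeUpper="false" precisionStep="4"/>
```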
<NumericRangeFilter/> | Child of Clause, Filter, CachedFilter, ConstantScoreQuery |
A Filter that only accepts numeric values within a specified range
Example: Search for documents about people who are aged 20-25
<FilteredQuery>
  <Query>
    <UserQuery>person</UserQuery>
  </Query>
  <Filter>
    <NumericRangeFilter fieldName="age" lowerTerm="20" upperTerm="25"/>
  </Filter>
</FilteredQuery>
<NumericRangeFilter>'s attributes:
Name | Values | Default
fieldName | |
includeLower | true, false | true
includeUpper | true, false | true
lowerTerm | |
precisionStep | | 4
type | int, long, float, double | int
upperTerm | |
This element is always empty.
@fieldName | Attribute of NumericRangeFilter |
fieldName must be defined here or is taken from the most immediate parent XML element that defines a "fieldName" attribute
@lowerTerm | Attribute of NumericRangeFilter |
The lower-most term value for this field (must be <= upperTerm and a valid native Java numeric type)
Required
@upperTerm | Attribute of NumericRangeFilter |
The upper-most term value for this field (must be >= lowerTerm and a valid native Java numeric type)
Required
@type | Attribute of NumericRangeFilter |
The numeric type of this field
Possible values: int, long, float, double - Default value: int
@includeLower | Attribute of NumericRangeFilter |
Controls if the lowerTerm in the range is part of the allowed set of values
Possible values: true, false - Default value: true
@includeUpper | Attribute of NumericRangeFilter |
Controls if the upperTerm in the range is part of the allowed set of values
Possible values: true, false - Default value: true
@precisionStep | Attribute of NumericRangeFilter |
Lower step values mean more precision levels are indexed per value, and so more terms in the index (the index gets larger). This value must be an integer.
Default value: 4
<SpanTerm> | Child of BoostQuery, Clause, SpanFirst, Include, CachedFilter, SpanOr, SpanNear, Exclude, Query |
A single term used in a SpanQuery. These clauses are the building blocks for more complex "span" queries which test word proximity
Example: Find documents using terms close to each other about mining and accidents
<SpanNear slop="8" inOrder="false" fieldName="text">
  <SpanOr>
    <SpanTerm>killed</SpanTerm>
    <SpanTerm>died</SpanTerm>
    <SpanTerm>dead</SpanTerm>
  </SpanOr>
  <SpanOr>
    <SpanTerm>miner</SpanTerm>
    <SpanTerm>mining</SpanTerm>
    <SpanTerm>miners</SpanTerm>
  </SpanOr>
</SpanNear>
<SpanTerm>'s attributes:
Name | Values | Default
fieldName | |
@fieldName | Attribute of SpanTerm |
fieldName must be defined here or is taken from the most immediate parent XML element that defines a "fieldName" attribute
Required
<SpanOrTerms> | Child of BoostQuery, Clause, SpanFirst, Include, CachedFilter, SpanOr, SpanNear, Exclude, Query |
A field-specific analyzer is used here to parse the child text provided in this tag. The SpanTerms produced are ORed in terms of Boolean logic
Example: Use SpanOrTerms as a more convenient/succinct way of expressing multiple choices of SpanTerms. This example looks for reports using words describing a fatality near to references to miners
<SpanNear slop="8" inOrder="false" fieldName="text">
  <SpanOrTerms>killed died death dead deaths</SpanOrTerms>
  <SpanOrTerms>miner mining miners</SpanOrTerms>
</SpanNear>
<SpanOrTerms>'s attributes:
Name | Values | Default
fieldName | |
@fieldName | Attribute of SpanOrTerms |
fieldName must be defined here or is taken from the most immediate parent XML element that defines a "fieldName" attribute
Required
<SpanOr> | Child of BoostQuery, Clause, SpanFirst, Include, CachedFilter, SpanNear, Exclude, Query |
Takes any number of child queries from the Span family
Example: Find documents using terms close to each other about mining and accidents
<SpanNear slop="8" inOrder="false" fieldName="text">
  <SpanOr>
    <SpanTerm>killed</SpanTerm>
    <SpanTerm>died</SpanTerm>
    <SpanTerm>dead</SpanTerm>
  </SpanOr>
  <SpanOr>
    <SpanTerm>miner</SpanTerm>
    <SpanTerm>mining</SpanTerm>
    <SpanTerm>miners</SpanTerm>
  </SpanOr>
</SpanNear>
Element's model:
<SpanOr>'s children:
Name | Cardinality
BoostingTermQuery | Any number
SpanFirst | Any number
SpanNear | Any number
SpanNot | Any number
SpanOr | Any number
SpanOrTerms | Any number
SpanTerm | Any number
(SpanOr | SpanNear | SpanOrTerms | SpanFirst | SpanNot | SpanTerm | BoostingTermQuery)*
<SpanNear> | Child of BoostQuery, Clause, SpanFirst, Include, CachedFilter, SpanOr, Exclude, Query |
Takes any number of child queries from the Span family and tests for proximity
Element's model:
<SpanNear>'s children:
Name | Cardinality
BoostingTermQuery | Any number
SpanFirst | Any number
SpanNear | Any number
SpanNot | Any number
SpanOr | Any number
SpanOrTerms | Any number
SpanTerm | Any number
<SpanNear>'s attributes:
Name | Values | Default
inOrder | true, false | true
slop | |
(SpanOr | SpanNear | SpanOrTerms | SpanFirst | SpanNot | SpanTerm | BoostingTermQuery)*
@slop | Attribute of SpanNear |
Defines the maximum distance between Span elements, where distance is expressed as a number of words, not a byte offset
Example: Find documents using terms within 8 words of each other talking about mining and accidents
<SpanNear slop="8" inOrder="false" fieldName="text">
  <SpanOr>
    <SpanTerm>killed</SpanTerm>
    <SpanTerm>died</SpanTerm>
    <SpanTerm>dead</SpanTerm>
  </SpanOr>
  <SpanOr>
    <SpanTerm>miner</SpanTerm>
    <SpanTerm>mining</SpanTerm>
    <SpanTerm>miners</SpanTerm>
  </SpanOr>
</SpanNear>
Required
@inOrder | Attribute of SpanNear |
Controls if matching terms have to appear in the order listed or can be reversed
Possible values: true, false - Default value: true
<SpanFirst> | Child of BoostQuery, Clause, Include, CachedFilter, SpanOr, SpanNear, Exclude, Query |
Looks for a SpanQuery match occurring near the beginning of a document
Example: Find letters where the first 50 words talk about a resignation:
<SpanFirst end="50">
  <SpanOrTerms fieldName="text">resigning resign leave</SpanOrTerms>
</SpanFirst>
Element's model:
<SpanFirst>'s children:
Name | Cardinality
BoostingTermQuery | One or none
SpanFirst | One or none
SpanNear | One or none
SpanNot | One or none
SpanOr | One or none
SpanOrTerms | One or none
SpanTerm | One or none
<SpanFirst>'s attributes:
Name | Values | Default
boost | | 1.0
end | |
(SpanOr | SpanNear | SpanOrTerms | SpanFirst | SpanNot | SpanTerm | BoostingTermQuery)
@end | Attribute of SpanFirst |
Controls the end of the region considered in a document's field (expressed in word number, not byte offset)
Required
@boost | Attribute of SpanFirst |
Optional boost for matches on this query. Values > 1 increase the relative weight of matches.
Default value: 1.0
<SpanNot> | Child of BoostQuery, Clause, SpanFirst, Include, CachedFilter, SpanOr, SpanNear, Exclude, Query |
Finds documents that match one SpanQuery but do not match another SpanQuery
Example: Find documents talking about social services but not containing the word "public"
<SpanNot fieldName="text">
  <Include>
    <SpanNear slop="2" inOrder="true">
      <SpanTerm>social</SpanTerm>
      <SpanTerm>services</SpanTerm>
    </SpanNear>
  </Include>
  <Exclude>
    <SpanTerm>public</SpanTerm>
  </Exclude>
</SpanNot>
Element's model:
<SpanNot>'s children:
Name | Cardinality
Exclude | Only one
Include | Only one
<Include> | Child of SpanNot |
The SpanQuery to find
Element's model:
<Include>'s children:
Name | Cardinality
BoostingTermQuery | One or none
SpanFirst | One or none
SpanNear | One or none
SpanNot | One or none
SpanOr | One or none
SpanOrTerms | One or none
SpanTerm | One or none
(SpanOr | SpanNear | SpanOrTerms | SpanFirst | SpanNot | SpanTerm | BoostingTermQuery)
<Exclude> | Child of SpanNot |
The SpanQuery to be avoided
Element's model:
<Exclude>'s children:
Name | Cardinality
BoostingTermQuery | One or none
SpanFirst | One or none
SpanNear | One or none
SpanNot | One or none
SpanOr | One or none
SpanOrTerms | One or none
SpanTerm | One or none
(SpanOr | SpanNear | SpanOrTerms | SpanFirst | SpanNot | SpanTerm | BoostingTermQuery)
<ConstantScoreQuery> | Child of BoostQuery, Clause, CachedFilter, Query |
A utility tag to wrap any filter as a query
Example: Find all documents from the last 10 years
<ConstantScoreQuery>
  <RangeFilter fieldName="date" lowerTerm="19970101" upperTerm="20070101"/>
</ConstantScoreQuery>
Element's model:
<ConstantScoreQuery>'s children:
Name | Cardinality
BooleanFilter | Any number
CachedFilter | Any number
DuplicateFilter | Any number
NumericRangeFilter | Any number
RangeFilter | Any number
TermsFilter | Any number
<ConstantScoreQuery>'s attributes:
Name | Values | Default
boost | | 1.0
(RangeFilter | NumericRangeFilter | CachedFilter | TermsFilter | BooleanFilter | DuplicateFilter)*
@boost | Attribute of ConstantScoreQuery |
Optional boost for matches on this query. Values > 1 increase the relative weight of matches.
Default value: 1.0
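Because ConstantScoreQuery turns a filter into a query, it can be placed wherever the content models above allow a query, for instance inside a Clause of a BooleanQuery. The following is a sketch only, with hypothetical field names, terms and boost value: documents about banks are required, and those dated within 2006 receive a fixed extra score contribution.

```xml
<BooleanQuery fieldName="contents">
  <Clause occurs="must">
    <TermQuery>bank</TermQuery>
  </Clause>
  <Clause occurs="should">
    <!-- the filter contributes a constant score, scaled by the boost -->
    <ConstantScoreQuery boost="2.0">
      <RangeFilter fieldName="date" lowerTerm="20060101" upperTerm="20070101"/>
    </ConstantScoreQuery>
  </Clause>
</BooleanQuery>
```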
<FuzzyLikeThisQuery> | Child of BoostQuery, Clause, CachedFilter, Query |
Performs fuzzy matching on "significant" terms in fields. Improves on "LikeThisQuery" by allowing for fuzzy variations of supplied fields. Improves on FuzzyQuery by rewarding all fuzzy variants of a term with the same IDF rather than default fuzzy behaviour which ranks rarer variants (typically misspellings) more highly. This can be a useful default search mode for processing user input where the end user is not expected to know about the standard query operators for fuzzy, boolean or phrase logic found in UserQuery
Example: Search for information about the Sumitomo bank, where the end user has mis-spelt the name
<FuzzyLikeThisQuery>
  <Field fieldName="contents">
    Sumitimo bank
  </Field>
</FuzzyLikeThisQuery>
Element's model:
<FuzzyLikeThisQuery>'s children:
Name | Cardinality
Field | Any number
<FuzzyLikeThisQuery>'s attributes:
Name | Values | Default
boost | | 1.0
ignoreTF | true, false | false
maxNumTerms | | 50
(Field)*
@boost | Attribute of FuzzyLikeThisQuery |
Optional boost for matches on this query. Values > 1 increase the relative weight of matches.
Default value: 1.0
@maxNumTerms | Attribute of FuzzyLikeThisQuery |
Limits the total number of terms selected from the provided text plus the selected "fuzzy" variants
Default value: 50
@ignoreTF | Attribute of FuzzyLikeThisQuery |
Ignore "Term Frequency" - a boost factor which rewards multiple occurrences of the same term in a document
Possible values: true, false - Default value: false
<Field> | Child of FuzzyLikeThisQuery |
A field used in a FuzzyLikeThisQuery
<Field>'s attributes:
Name | Values | Default
fieldName | |
minSimilarity | | 0.5
prefixLength | | 1
@minSimilarity | Attribute of Field |
Controls the level of similarity required for fuzzy variants where 1 is identical and 0.5 is that the variant contains half of the original's characters in the same order. Lower values produce more results but may take longer to execute due to additional IO required to read matching document ids
Default value: 0.5
@prefixLength | Attribute of Field |
Controls the minimum number of characters at the start of fuzzy variant words that must exactly match the original. A value of zero will require no minimum and the search software will effectively scan ALL terms from a to z looking for variations. This can incur high CPU overhead and a prefix length of just "1" will reduce this overhead to 1/26th of the original cost (assuming an even distribution of letters used from the alphabet).
Default value: 1
@fieldName | Attribute of Field |
fieldName must be defined here or is taken from the most immediate parent XML element that defines a "fieldName" attribute
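Since the content model allows any number of Field children, the per-field attributes can be tuned independently. The sketch below is illustrative only: the field names and attribute values are hypothetical, chosen to show a stricter match on a "title" field alongside the defaults on "contents".

```xml
<FuzzyLikeThisQuery maxNumTerms="25" ignoreTF="true">
  <!-- stricter: variants must be more similar and share a 2-character prefix -->
  <Field fieldName="title" minSimilarity="0.7" prefixLength="2">Sumitimo bank</Field>
  <!-- defaults: minSimilarity 0.5, prefixLength 1 -->
  <Field fieldName="contents">Sumitimo bank</Field>
</FuzzyLikeThisQuery>
```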
<LikeThisQuery> | Child of BoostQuery, Clause, CachedFilter, Query |
Cherry-picks "significant" terms from the example child text and queries using these words. By only using significant (read: rare) terms the performance cost of the query is substantially reduced and large bodies of text can be used as example content.
Example: Use a block of text as an example of the type of content to be found, ignoring the "Reuters" word which appears commonly in the index.
<LikeThisQuery percentTermsToMatch="5" stopWords="Reuters"> IRAQI TROOPS REPORTED PUSHING BACK IRANIANS Iraq said today its troops were pushing Iranian forces out of positions they had initially occupied when they launched a new offensive near the southern port of Basra early yesterday. A High Command communique said Iraqi troops had won a significant victory and were continuing to advance. Iraq said it had foiled a three-pronged thrust some 10 km (six miles) from Basra, but admitted the Iranians had occupied ground held by the Mohammed al-Qassem unit, one of three divisions attacked. The communique said Iranian Revolutionary Guards were under assault from warplanes, helicopter gunships, heavy artillery and tanks. "Our forces are continuing their advance until they purge the last foothold" occupied by the Iranians, it said. (Iran said its troops had killed or wounded more than 4,000 Iraqis and were stabilising their new positions.) The Baghdad communique said Iraqi planes also destroyed oil installations at Iran's southwestern Ahvaz field during a raid today. It denied an Iranian report that an Iraqi jet was shot down. Iraq also reported a naval battle at the northern tip of the Gulf. Iraqi naval units and forces defending an offshore terminal sank six Iranian out of 28 Iranian boats attempting to attack an offshore terminal, the communique said. Reuters 3; </LikeThisQuery>
<LikeThisQuery>'s attributes:
Name | Values | Default
boost | | 1.0
fieldNames | |
maxQueryTerms | | 20
minTermFrequency | | 1
percentTermsToMatch | | 30
stopWords | |
@boost | Attribute of LikeThisQuery |
Optional boost for matches on this query. Values > 1 increase the relative weight of matches.
Default value: 1.0
@fieldNames | Attribute of LikeThisQuery |
Comma delimited list of field names
@stopWords | Attribute of LikeThisQuery |
A list of stop words, analyzed to produce stop terms
@maxQueryTerms | Attribute of LikeThisQuery |
Controls the maximum number of words shortlisted for the query. The higher the number, the slower the response, because more disk reads are required
Default value: 20
@minTermFrequency | Attribute of LikeThisQuery |
Controls how many times a term must appear in the example text before it is shortlisted for use in the query
Default value: 1
@percentTermsToMatch | Attribute of LikeThisQuery |
A quality control that can be used to limit the number of results to those documents matching a certain percentage of the shortlisted query terms. Values must be between 1 and 100
Default value: 30
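The remaining LikeThisQuery attributes can be combined as in the sketch below. The field names, attribute values and example text are all hypothetical; the text is written so that "bank" and "merger" each appear twice, which is what minTermFrequency="2" would require before a term is shortlisted.

```xml
<LikeThisQuery fieldNames="title,contents"
               maxQueryTerms="10"
               minTermFrequency="2"
               percentTermsToMatch="50"
               stopWords="Reuters">
  The bank announced a merger with a rival bank.
  Analysts expect the merger to close next year. Reuters
</LikeThisQuery>
```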
<BoostingQuery> | Child of BoostQuery, Clause, CachedFilter, Query |
Requires matches on the "Query" element and optionally boosts by any matches on the "BoostQuery". Unlike a regular BooleanQuery the boost can be less than 1 to produce a subtractive rather than additive result on the match score.
Example: Find documents about banks, preferably related to mergers, and preferably not about "World bank"
<BoostingQuery>
  <Query>
    <BooleanQuery fieldName="contents">
      <Clause occurs="should">
        <TermQuery>merger</TermQuery>
      </Clause>
      <Clause occurs="must">
        <TermQuery>bank</TermQuery>
      </Clause>
    </BooleanQuery>
  </Query>
  <BoostQuery boost="0.01">
    <UserQuery>"world bank"</UserQuery>
  </BoostQuery>
</BoostingQuery>
Element's model:
<BoostingQuery>'s children:
Name | Cardinality
BoostQuery | Only one
Query | Only one
<BoostingQuery>'s attributes:
Name | Values | Default
boost | | 1.0
(Query, BoostQuery)
@boost | Attribute of BoostingQuery |
Optional boost for matches on this query. Values > 1 increase the relative weight of matches.
Default value: 1.0
<BoostQuery> | Child of BoostingQuery |
Child element of BoostingQuery used to contain the choice of Query which is used for boosting purposes
Element's model:
<BoostQuery>'s children:
Name | Cardinality
BooleanQuery | One or none
BoostingQuery | One or none
BoostingTermQuery | One or none
ConstantScoreQuery | One or none
FilteredQuery | One or none
FuzzyLikeThisQuery | One or none
LikeThisQuery | One or none
MatchAllDocsQuery | One or none
NumericRangeQuery | One or none
SpanFirst | One or none
SpanNear | One or none
SpanNot | One or none
SpanOr | One or none
SpanOrTerms | One or none
SpanTerm | One or none
TermQuery | One or none
TermsQuery | One or none
UserQuery | One or none
<BoostQuery>'s attributes:
Name | Values | Default
boost | | 1.0
(BooleanQuery | UserQuery | FilteredQuery | TermQuery | TermsQuery | MatchAllDocsQuery | ConstantScoreQuery | BoostingTermQuery | NumericRangeQuery | SpanOr | SpanNear | SpanOrTerms | SpanFirst | SpanNot | SpanTerm | BoostingTermQuery | LikeThisQuery | BoostingQuery | FuzzyLikeThisQuery)
@boost | Attribute of BoostQuery |
Optional boost for matches on this query. A boost of >0 but <1 effectively demotes results from Query that match this BoostQuery.
Default value: 1.0
<DuplicateFilter/> | Child of Clause, Filter, CachedFilter, ConstantScoreQuery |
Removes duplicated documents from results where "duplicate" means documents share a value for a particular field such as a primary key
Example: Find the latest version of each web page that mentions "Lucene"
<FilteredQuery>
  <Query>
    <TermQuery fieldName="text">lucene</TermQuery>
  </Query>
  <Filter>
    <DuplicateFilter fieldName="url" keepMode="last"/>
  </Filter>
</FilteredQuery>
<DuplicateFilter>'s attributes:
Name | Values | Default
fieldName | |
keepMode | first, last | first
processingMode | full, fast | full
This element is always empty.
@fieldName | Attribute of DuplicateFilter |
fieldName must be defined here or is taken from the most immediate parent XML element that defines a "fieldName" attribute
@keepMode | Attribute of DuplicateFilter |
Determines if the first or last document occurrence is the one to return when presented with duplicated field values
Possible values: first, last - Default value: first
@processingMode | Attribute of DuplicateFilter |
Controls the choice of process used to produce the filter - "full" mode identifies only non-duplicate documents with the chosen field while "fast" mode may perform faster but will also mark documents without the field as valid. The former approach starts by assuming every document is a duplicate then finds the "master" documents to keep while the latter approach assumes all documents are unique and unmarks those documents that are a copy.
Possible values: full, fast - Default value: full
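A sketch of the "fast" processing mode follows, reusing the hypothetical "url" and "text" fields from the example above. Per the description, documents that have no "url" field at all would also be let through in this mode, so it is best suited to indexes where every document carries the field.

```xml
<FilteredQuery>
  <Query>
    <TermQuery fieldName="text">lucene</TermQuery>
  </Query>
  <Filter>
    <!-- keep the first occurrence of each url; trade strictness for speed -->
    <DuplicateFilter fieldName="url" keepMode="first" processingMode="fast"/>
  </Filter>
</FilteredQuery>
```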
<TermsFilter> | Child of Clause, Filter, CachedFilter, ConstantScoreQuery |
Processes child text using a field-specific choice of Analyzer to produce a set of terms that are then used as a filter.
Example: Find documents talking about Lucene written on a Monday or a Friday
<FilteredQuery>
  <Query>
    <TermQuery fieldName="text">lucene</TermQuery>
  </Query>
  <Filter>
    <TermsFilter fieldName="dayOfWeek">monday friday</TermsFilter>
  </Filter>
</FilteredQuery>
<TermsFilter>'s attributes:
Name | Values | Default
fieldName | |
@fieldName | Attribute of TermsFilter |
fieldName must be defined here or is taken from the most immediate parent XML element that defines a "fieldName" attribute
<BooleanFilter> | Child of Clause, Filter, CachedFilter, ConstantScoreQuery |
A Filter equivalent to BooleanQuery that applies Boolean logic to Clauses containing Filters. Unlike BooleanQuery, a BooleanFilter can contain a single "mustnot" clause.
Example: Find documents from the first quarter of this year or last year that are not in "draft" status
<FilteredQuery>
  <Query>
    <MatchAllDocsQuery/>
  </Query>
  <Filter>
    <BooleanFilter>
      <Clause occurs="should">
        <RangeFilter fieldName="date" lowerTerm="20070101" upperTerm="20070401"/>
      </Clause>
      <Clause occurs="should">
        <RangeFilter fieldName="date" lowerTerm="20060101" upperTerm="20060401"/>
      </Clause>
      <Clause occurs="mustnot">
        <TermsFilter fieldName="status">draft</TermsFilter>
      </Clause>
    </BooleanFilter>
  </Filter>
</FilteredQuery>
Element's model:
<BooleanFilter>'s children:
Name | Cardinality
Clause | At least one
(Clause)+
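Since the content model of CachedFilter includes BooleanFilter, a composite filter that is reused across many searches could plausibly be cached as a whole. The following is a sketch under that assumption, with hypothetical field names and values:

```xml
<FilteredQuery>
  <Query>
    <UserQuery>bank</UserQuery>
  </Query>
  <Filter>
    <!-- the combined date + status restriction is cached as one bitset -->
    <CachedFilter>
      <BooleanFilter>
        <Clause occurs="must">
          <RangeFilter fieldName="date" lowerTerm="20060101" upperTerm="20070101"/>
        </Clause>
        <Clause occurs="mustnot">
          <TermsFilter fieldName="status">draft</TermsFilter>
        </Clause>
      </BooleanFilter>
    </CachedFilter>
  </Filter>
</FilteredQuery>
```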