public class PhrasesIdentificationComponent extends SearchComponent
A component that can be used in isolation, or in conjunction with QueryComponent, to identify and score "phrases" found in the input string, based on shingles in indexed fields.
The most common way to use this component is in conjunction with fields that use ShingleFilterFactory on both the index and query analyzers.
An example field type configuration would be something like this...
<fieldType name="phrases" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.ShingleFilterFactory" minShingleSize="2" maxShingleSize="3" outputUnigrams="true"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.ShingleFilterFactory" minShingleSize="2" maxShingleSize="7" outputUnigramsIfNoShingles="true" outputUnigrams="true"/>
  </analyzer>
</fieldType>
...where the query analyzer's maxShingleSize="7" determines the maximum possible phrase length that can be heuristically deduced, and the index analyzer's maxShingleSize="3" determines the accuracy of the phrases identified. The larger the indexed maxShingleSize, the higher the accuracy. Both analyzers must include minShingleSize="2" outputUnigrams="true".
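To make the effect of the two maxShingleSize settings concrete, the following is a minimal sketch of shingle generation in plain Java. It is not Lucene's ShingleFilter and ignores positions, separators, and filler tokens; the class name ShingleSketch and the method shingles are hypothetical names used only for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustration only: a simplified stand-in for what a shingle filter emits
// when outputUnigrams="true". Not the Lucene implementation.
final class ShingleSketch {

    /** Returns each token plus every run of minSize..maxSize adjacent tokens. */
    static List<String> shingles(List<String> tokens, int minSize, int maxSize) {
        List<String> out = new ArrayList<>();
        for (int start = 0; start < tokens.size(); start++) {
            out.add(tokens.get(start)); // the unigram itself
            for (int size = minSize; size <= maxSize && start + size <= tokens.size(); size++) {
                out.add(String.join(" ", tokens.subList(start, start + size)));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("why", "the", "lazy", "dog");
        // Index-time settings from the example field type: shingles of 2 to 3 words.
        System.out.println(shingles(tokens, 2, 3));
        // Query-time settings: shingles of up to 7 words, so the full
        // four-word input also appears as a single candidate.
        System.out.println(shingles(tokens, 2, 7));
    }
}
```

With the query-side limit of 7, the four-word input appears as a single candidate, while the index-side limit of 3 means only shingles of up to three words were actually indexed. That gap is why longer candidates have to be deduced heuristically from shorter indexed shingles.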
With a field type like this, one or more fields can be specified (with weights) via a phrases.fields param to request that this component identify possible phrases in the input q param, or in an alternative phrases.q override param. The identified phrases will include their scores relative to each field specified, as well as an overall weighted score based on the field weights provided by the client. Higher score values indicate greater confidence in the Phrase.
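As a rough illustration of how a client might assemble these request parameters, here is a minimal sketch in plain Java. Several details are assumptions not stated above: that the component is enabled with phrases=true, that weights are written as field^weight (by analogy with other Solr field-list params), and the field names body_phrases and title_phrases, which are hypothetical. The class and method names are also invented for this example.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

// Sketch only: builds a URL query string for a phrase identification request.
final class PhrasesRequestSketch {

    static String queryString(String input, Map<String, Double> fieldWeights) {
        // Assumed syntax: space-separated "field^weight" entries.
        String fields = fieldWeights.entrySet().stream()
                .map(e -> e.getKey() + "^" + e.getValue())
                .collect(Collectors.joining(" "));

        Map<String, String> params = new LinkedHashMap<>();
        params.put("phrases", "true");        // assumed way to enable the component
        params.put("phrases.fields", fields); // fields (with weights) to consult
        params.put("phrases.q", input);       // overrides q as the input string

        return params.entrySet().stream()
                .map(e -> encode(e.getKey()) + "=" + encode(e.getValue()))
                .collect(Collectors.joining("&"));
    }

    private static String encode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    public static void main(String[] args) {
        Map<String, Double> weights = new LinkedHashMap<>();
        weights.put("body_phrases", 0.5);
        weights.put("title_phrases", 2.0);
        System.out.println(queryString("why the lazy dog", weights));
    }
}
```

The resulting string would be appended to the URL of whichever request handler has this component registered; that handler path depends on the local configuration and is not shown here.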
NOTE: In a distributed request, this component uses a single phase (piggybacking on the ShardRequest.PURPOSE_GET_TOP_IDS generated by QueryComponent, if it is in use) to collect all field and shingle stats. No "refinement" requests are used.
Modifier and Type | Class and Description |
---|---|
static class | PhrasesIdentificationComponent.Phrase: Model the data known about a single (candidate) Phrase, which may or may not be indexed |
static class | PhrasesIdentificationComponent.PhrasesContextData: Simple container for all request options and data this component needs to store in the Request Context |
Nested classes/interfaces inherited from interface SolrInfoBean: SolrInfoBean.Category, SolrInfoBean.Group
Modifier and Type | Field and Description |
---|---|
static String | COMPONENT_NAME: Name, also used as a request param to identify whether the user query concerns this component |
static String | PHRASE_ANALYSIS_FIELD |
static String | PHRASE_FIELDS |
static String | PHRASE_INDEX_MAXLEN |
static String | PHRASE_INPUT |
static String | PHRASE_QUERY_MAXLEN |
static String | PHRASE_SUMMARY_POST |
static String | PHRASE_SUMMARY_PRE |
static int | SHARD_PURPOSE: The only shard purpose that will cause this component to do work and return data during shard requests |
Fields inherited from class SearchComponent: metricNames, registry, standard_components
Constructor and Description |
---|
PhrasesIdentificationComponent() |
Modifier and Type | Method and Description |
---|---|
int | distributedProcess(ResponseBuilder rb): Process for a distributed search. |
void | finishStage(ResponseBuilder rb): Called after all responses have been received for this stage. |
String | getDescription(): Simple one or two line description |
static int | getMaxShingleSize(Analyzer analyzer): Helper method, public for testing purposes only. |
void | prepare(ResponseBuilder rb): Prepare the response. |
void | process(ResponseBuilder rb): Process the request for this component |
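Methods inherited from class SearchComponent: getCategory, getMetricNames, getMetricRegistry, getName, handleResponses, init, modifyRequest, setName
Methods inherited from class java.lang.Object: clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface SolrInfoBean: getMetricsSnapshot, registerMetricName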
public static final int SHARD_PURPOSE
public static final String COMPONENT_NAME
public static final String PHRASE_INPUT
public static final String PHRASE_FIELDS
public static final String PHRASE_ANALYSIS_FIELD
public static final String PHRASE_SUMMARY_PRE
public static final String PHRASE_SUMMARY_POST
public static final String PHRASE_INDEX_MAXLEN
public static final String PHRASE_QUERY_MAXLEN
public void prepare(ResponseBuilder rb) throws IOException
Description copied from class: SearchComponent
Prepare the response. Guaranteed to be called before any SearchComponent SearchComponent.process(org.apache.solr.handler.component.ResponseBuilder) method. Called for every incoming request. The place to do initialization that is request dependent.
Specified by: prepare in class SearchComponent
Parameters: rb - The ResponseBuilder
Throws: IOException - If there is a low-level I/O error.

public int distributedProcess(ResponseBuilder rb)
Description copied from class: SearchComponent
Process for a distributed search.
Overrides: distributedProcess in class SearchComponent
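public void finishStage(ResponseBuilder rb)
Description copied from class: SearchComponent
Called after all responses have been received for this stage.
Overrides: finishStage in class SearchComponent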
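public void process(ResponseBuilder rb) throws IOException
Description copied from class: SearchComponent
Process the request for this component.
Specified by: process in class SearchComponent
Parameters: rb - The ResponseBuilder
Throws: IOException - If there is a low-level I/O error.

public String getDescription()
Description copied from interface: SolrInfoBean
Simple one or two line description.
Specified by: getDescription in interface SolrInfoBean
Specified by: getDescription in class SearchComponent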
public static int getMaxShingleSize(Analyzer analyzer)
Helper method, public for testing purposes only.
Given an analyzer, inspects it to determine if:
- it is a TokenizerChain
- it contains a ShingleFilterFactory
If these conditions are met, then this method returns the maxShingleSize in effect for this analyzer; otherwise it returns -1.
Parameters: analyzer - An analyzer to inspect
Returns: maxShingleSize if available

Copyright © 2000-2019 Apache Software Foundation. All Rights Reserved.