java.lang.Object
  org.apache.lucene.search.Similarity
      org.apache.lucene.search.DefaultSimilarity
          org.apache.lucene.misc.SweetSpotSimilarity
public class SweetSpotSimilarity
A similarity with a lengthNorm that provides for a "plateau" of equally good lengths, and tf helper functions.
For lengthNorm, a global min/max can be specified to define the plateau of lengths that should all have a norm of 1.0. Below the min and above the max, the lengthNorm drops off following a 1/sqrt curve.
A per field min/max can be specified if different fields have different sweet spots.
For tf, baselineTf and hyperbolicTf functions are provided, which subclasses can choose between.
Field Summary |
---|
Fields inherited from class org.apache.lucene.search.DefaultSimilarity |
---|
discountOverlaps |
Fields inherited from class org.apache.lucene.search.Similarity |
---|
NO_DOC_ID_PROVIDED |
Constructor Summary | |
---|---|
SweetSpotSimilarity()
|
Method Summary | |
---|---|
float |
baselineTf(float freq)
Implemented as:
(x <= min) ? base : sqrt(x+(base**2)-min)
...but with a special case check for 0. |
float |
computeNorm(String fieldName,
org.apache.lucene.index.FieldInvertState state)
Implemented as state.getBoost() *
lengthNorm(fieldName, numTokens) where
numTokens does not count overlap tokens if
discountOverlaps is true by default or true for this
specific field. |
float |
hyperbolicTf(float freq)
Uses a hyperbolic tangent function that allows for a hard max... |
float |
lengthNorm(String fieldName,
int numTerms)
Implemented as:
1/sqrt( steepness * (abs(x-min) + abs(x-max) - (max-min)) + 1 )
. |
void |
setBaselineTfFactors(float base,
float min)
Sets the baseline and minimum function variables for baselineTf |
void |
setHyperbolicTfFactors(float min,
float max,
double base,
float xoffset)
Sets the function variables for the hyperbolicTf functions |
void |
setLengthNormFactors(int min,
int max,
float steepness)
Sets the default function variables used by lengthNorm when no field specific variables have been set. |
void |
setLengthNormFactors(String field,
int min,
int max,
float steepness,
boolean discountOverlaps)
Sets the function variables used by lengthNorm for a specific named field. |
float |
tf(int freq)
Delegates to baselineTf |
Methods inherited from class org.apache.lucene.search.DefaultSimilarity |
---|
coord, getDiscountOverlaps, idf, queryNorm, setDiscountOverlaps, sloppyFreq, tf |
Methods inherited from class org.apache.lucene.search.Similarity |
---|
decodeNorm, encodeNorm, getDefault, getNormDecoder, idf, idf, idfExplain, idfExplain, scorePayload, scorePayload, setDefault |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Constructor Detail |
---|
public SweetSpotSimilarity()
Method Detail |
---|
public void setBaselineTfFactors(float base, float min)
See Also: baselineTf(float)
public void setHyperbolicTfFactors(float min, float max, double base, float xoffset)
Parameters:
min - the minimum tf value to ever be returned (default: 0.0)
max - the maximum tf value to ever be returned (default: 2.0)
base - the base value to be used in the exponential for the hyperbolic function (default: e)
xoffset - the midpoint of the hyperbolic function (default: 10.0)
See Also: hyperbolicTf(float)
public void setLengthNormFactors(int min, int max, float steepness)
See Also: lengthNorm(java.lang.String, int)
public void setLengthNormFactors(String field, int min, int max, float steepness, boolean discountOverlaps)
Parameters:
field - field name
min - minimum value
max - maximum value
steepness - steepness of the curve
discountOverlaps - if true, numOverlapTokens will be subtracted from numTokens; if false then numOverlapTokens will be assumed to be 0 (see DefaultSimilarity.computeNorm(String, FieldInvertState) for details).
See Also: lengthNorm(java.lang.String, int)
public float computeNorm(String fieldName, org.apache.lucene.index.FieldInvertState state)
Implemented as state.getBoost() * lengthNorm(fieldName, numTokens), where numTokens does not count overlap tokens if discountOverlaps is true, either as the default or for this specific field.
Overrides: computeNorm in class org.apache.lucene.search.DefaultSimilarity
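The token counting described above can be illustrated with a minimal standalone sketch. It does not use the Lucene classes: the class name, the `countedTokens` helper, and the plain `length`, `numOverlap`, and `boost` parameters are hypothetical stand-ins for the values that would come from the FieldInvertState, and the norm itself follows the lengthNorm formula given on this page.

```java
// Standalone sketch, not the Lucene source. All names here are illustrative.
final class ComputeNormSketch {

    // Tokens that count toward field length; overlap tokens are
    // subtracted only when discountOverlaps is true.
    static int countedTokens(int length, int numOverlap, boolean discountOverlaps) {
        return discountOverlaps ? length - numOverlap : length;
    }

    // The documented formula:
    // 1/sqrt( steepness * (abs(x-min) + abs(x-max) - (max-min)) + 1 )
    static float lengthNorm(int x, int min, int max, float steepness) {
        float penalty = steepness * (Math.abs(x - min) + Math.abs(x - max) - (max - min));
        return (float) (1.0 / Math.sqrt(penalty + 1.0f));
    }

    // boost * lengthNorm(numTokens), with numTokens chosen as described above.
    static float computeNorm(float boost, int length, int numOverlap,
                             boolean discountOverlaps, int min, int max, float steepness) {
        int numTokens = countedTokens(length, numOverlap, discountOverlaps);
        return boost * lengthNorm(numTokens, min, max, steepness);
    }

    public static void main(String[] args) {
        // 12 tokens, 4 of them overlaps, plateau 3..10, steepness 0.5, boost 2.0
        System.out.println(computeNorm(2.0f, 12, 4, true, 3, 10, 0.5f));
        System.out.println(computeNorm(2.0f, 12, 4, false, 3, 10, 0.5f));
    }
}
```

With overlaps discounted, the 12-token field counts as 8 tokens and stays on the plateau; without discounting it counts as 12 and falls past the max, so its norm is lower.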
public float lengthNorm(String fieldName, int numTerms)
Implemented as: 1/sqrt( steepness * (abs(x-min) + abs(x-max) - (max-min)) + 1 ).
This degrades to 1/sqrt(x) when min and max are both 1 and steepness is 0.5.
TODO: potential optimization is to just flat out return 1.0f if numTerms is between min and max.
Overrides: lengthNorm in class org.apache.lucene.search.DefaultSimilarity
See Also: setLengthNormFactors(int, int, float)
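To make the plateau behaviour of the formula concrete, here is a minimal standalone sketch of it. The class name and the explicit `min`, `max`, and `steepness` parameters are illustrative assumptions; the real method takes a field name and reads these values from the configured factors.

```java
// Standalone sketch of the documented lengthNorm formula, not the Lucene source.
final class LengthNormSketch {

    // 1/sqrt( steepness * (abs(x-min) + abs(x-max) - (max-min)) + 1 )
    // Inside [min, max] the two absolute values sum to exactly (max-min),
    // so the penalty is 0 and the norm is 1.0.
    static float lengthNorm(int numTerms, int min, int max, float steepness) {
        float penalty = steepness
                * (Math.abs(numTerms - min) + Math.abs(numTerms - max) - (max - min));
        return (float) (1.0 / Math.sqrt(penalty + 1.0f));
    }

    public static void main(String[] args) {
        // Plateau from 3 to 10 terms, steepness 0.5
        for (int n : new int[] {1, 3, 5, 10, 20}) {
            System.out.println(n + " -> " + lengthNorm(n, 3, 10, 0.5f));
        }
    }
}
```

Outside the plateau the penalty grows by 2 * steepness per term of distance, which is why min = max = 1 with steepness 0.5 reduces to 1/sqrt(x).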
public float tf(int freq)
Delegates to baselineTf.
Overrides: tf in class org.apache.lucene.search.Similarity
See Also: baselineTf(float)
public float baselineTf(float freq)
Implemented as: (x <= min) ? base : sqrt(x+(base**2)-min)
...but with a special case check for 0.
This degrades to sqrt(x) when min and base are both 0.
See Also: setBaselineTfFactors(float, float)
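A minimal standalone sketch of the baseline formula follows. The class name and the explicit `base` and `min` parameters are illustrative. The page only says there is a special case check for 0; returning 0 for a frequency of 0 is an assumption made here, not something this page states.

```java
// Standalone sketch of the documented baselineTf formula, not the Lucene source.
final class BaselineTfSketch {

    // (x <= min) ? base : sqrt(x + (base**2) - min)
    static float baselineTf(float freq, float base, float min) {
        if (freq == 0.0f) {
            return 0.0f; // ASSUMPTION: the documented "special case check for 0"
        }
        if (freq <= min) {
            return base; // flat baseline for low frequencies
        }
        // The base*base term makes the curve continuous at freq == min.
        return (float) Math.sqrt(freq + (base * base) - min);
    }

    public static void main(String[] args) {
        // Baseline 1.5 for frequencies up to 5, square-root growth afterwards
        for (float f : new float[] {0f, 1f, 5f, 6f, 50f}) {
            System.out.println(f + " -> " + baselineTf(f, 1.5f, 5f));
        }
    }
}
```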
public float hyperbolicTf(float freq)
tf(x)=min+(max-min)/2*(((base**(x-xoffset)-base**-(x-xoffset))/(base**(x-xoffset)+base**-(x-xoffset)))+1)
This code is provided as a convenience for subclasses that want to use a hyperbolic tf function.
See Also: setHyperbolicTfFactors(float, float, double, float)
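The hyperbolic formula can be sketched in standalone form as shown below. The class name and the explicit parameter list are illustrative assumptions, and the sketch evaluates the formula literally: any special-casing the real method may perform (for example for a frequency of 0, or for exponents large enough to overflow) is not reproduced here.

```java
// Standalone sketch of the documented hyperbolicTf formula, not the Lucene source.
final class HyperbolicTfSketch {

    // tf(x) = min + (max-min)/2 * ( (b^(x-xoffset) - b^-(x-xoffset))
    //                             / (b^(x-xoffset) + b^-(x-xoffset)) + 1 )
    static float hyperbolicTf(float freq, float min, float max, double base, float xoffset) {
        double x = freq - xoffset;
        double up = Math.pow(base, x);
        double down = Math.pow(base, -x);
        // For base e this ratio is tanh(x); it ranges over (-1, 1).
        // Note: for very large |x| the powers overflow and this naive form yields NaN.
        double tanhLike = (up - down) / (up + down);
        return min + (float) ((max - min) / 2 * (tanhLike + 1.0));
    }

    public static void main(String[] args) {
        // The documented defaults: min 0.0, max 2.0, base e, xoffset 10.0
        for (float f : new float[] {1f, 5f, 10f, 15f, 40f}) {
            System.out.println(f + " -> " + hyperbolicTf(f, 0.0f, 2.0f, Math.E, 10.0f));
        }
    }
}
```

At freq equal to xoffset the result is the midpoint between min and max, and as freq grows the result approaches max, which is the hard ceiling the method description refers to.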