java.lang.Object
  org.apache.nutch.scoring.opic.OPICScoringFilter
public class OPICScoringFilter
This plugin implements a variant of the Online Page Importance Computation (OPIC) score, described in this paper: Abiteboul, Serge; Preda, Mihai; Cobena, Gregory (2003), Adaptive On-Line Page Importance Computation.
Field Summary |
---|
Fields inherited from interface org.apache.nutch.scoring.ScoringFilter |
---|
X_POINT_ID |
Constructor Summary | |
---|---|
OPICScoringFilter() |
Method Summary | |
---|---|
void | distributeScoreToOutlinks(String fromUrl, WebPage row, Collection<ScoreDatum> scoreData, int allCount) - Get cash on hand, divide it by the number of outlinks and apply. |
float | generatorSortValue(String url, WebPage row, float initSort) - Use WebPage.getScore(). |
org.apache.hadoop.conf.Configuration | getConf() |
Collection<WebPage.Field> | getFields() |
float | indexerScore(String url, NutchDocument doc, WebPage row, float initScore) - Dampen the boost value by scorePower. |
void | initialScore(String url, WebPage row) - Set to 0.0f (unknown value); inlink contributions will bring it to a correct level. |
void | injectedScore(String url, WebPage row) - Set an initial score for newly injected pages. |
void | setConf(org.apache.hadoop.conf.Configuration conf) |
void | updateScore(String url, WebPage row, List<ScoreDatum> inlinkedScoreData) - Increase the score by a sum of inlinked scores. |
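The cash flow described for distributeScoreToOutlinks and updateScore can be illustrated with a minimal stand-alone sketch. It does not use the Nutch API: the class `OpicCashSketch` and its methods are hypothetical, plain floats stand in for ScoreDatum, and it assumes that the cash is split evenly over allCount and that each target page simply adds up what it receives.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of the OPIC cash flow; none of these names come from Nutch. */
public class OpicCashSketch {

    /** Split the cash on hand evenly across all collected outlinks (assumption). */
    static float sharePerOutlink(float cash, int allCount) {
        if (allCount <= 0) {
            return 0.0f; // no outlinks: nothing to hand out
        }
        return cash / allCount;
    }

    /** Build one contribution per outlink that carries a score datum. */
    static List<Float> distribute(float cash, int scoredOutlinks, int allCount) {
        float share = sharePerOutlink(cash, allCount);
        List<Float> contributions = new ArrayList<>();
        for (int i = 0; i < scoredOutlinks; i++) {
            contributions.add(share);
        }
        return contributions;
    }

    /** Increase a page score by the sum of its inlinked contributions. */
    static float update(float currentScore, List<Float> inlinkedScores) {
        float sum = 0.0f;
        for (float s : inlinkedScores) {
            sum += s;
        }
        return currentScore + sum;
    }

    public static void main(String[] args) {
        // A page holding 1.0 of cash with 4 collected outlinks, 3 of them scored.
        List<Float> out = distribute(1.0f, 3, 4);
        // A target page that starts at 0.0f and receives two of those shares.
        float target = update(0.0f, out.subList(0, 2));
        System.out.println("share=" + out.get(0) + " target=" + target);
    }
}
```

In this sketch a newly discovered page starts at 0.0f, as initialScore describes, and only gains score through inlink contributions.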
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Constructor Detail |
---|
public OPICScoringFilter()
Method Detail |
---|
public org.apache.hadoop.conf.Configuration getConf()
Specified by: getConf in interface org.apache.hadoop.conf.Configurable
public void setConf(org.apache.hadoop.conf.Configuration conf)
Specified by: setConf in interface org.apache.hadoop.conf.Configurable
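public void injectedScore(String url, WebPage row) throws ScoringFilterException
Description copied from interface: ScoringFilter
Set an initial score for newly injected pages.
Specified by: injectedScore in interface ScoringFilter
Parameters:
url - url of the page
row - new page. Filters will modify it in-place.
Throws: ScoringFilterException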
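public void initialScore(String url, WebPage row) throws ScoringFilterException
Set to 0.0f (unknown value); inlink contributions will bring it to a correct level.
Specified by: initialScore in interface ScoringFilter
Parameters:
url - url of the page
Throws: ScoringFilterException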
public float generatorSortValue(String url, WebPage row, float initSort) throws ScoringFilterException
Use WebPage.getScore().
Specified by: generatorSortValue in interface ScoringFilter
Parameters:
url - url of the page
initSort - initial sort value, or a value from previous filters in chain
Throws: ScoringFilterException
public void updateScore(String url, WebPage row, List<ScoreDatum> inlinkedScoreData)
Increase the score by a sum of inlinked scores.
Specified by: updateScore in interface ScoringFilter
Parameters:
url - url of the page
public void distributeScoreToOutlinks(String fromUrl, WebPage row, Collection<ScoreDatum> scoreData, int allCount)
Get cash on hand, divide it by the number of outlinks and apply.
Specified by: distributeScoreToOutlinks in interface ScoringFilter
Parameters:
fromUrl - url of the source page
scoreData - a list of ScoreDatum objects, one for every outlink. These ScoreDatum objects will be passed to updateScore(String, WebPage, List) for every outlinked URL.
allCount - number of all collected outlinks from the source page
public float indexerScore(String url, NutchDocument doc, WebPage row, float initScore)
Dampen the boost value by scorePower.
Specified by: indexerScore in interface ScoringFilter
Parameters:
url - url of the page
doc - document. NOTE: this already contains all information collected by indexing filters. Implementations may modify this instance in order to store or remove some information.
initScore - initial boost value for the Lucene document.
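One plausible reading of "dampen the boost value by scorePower" is sketched below. It is an assumption, not the documented behaviour: the page score is raised to the power scorePower and multiplied by initScore. The class `IndexerScoreSketch` is hypothetical, and the value 0.5 used for scorePower is illustrative only.

```java
/** Hypothetical sketch of score dampening; not the Nutch implementation. */
public class IndexerScoreSketch {

    /**
     * Assumed dampening rule: pageScore^scorePower * initScore.
     * With 0 < scorePower < 1, large scores are compressed more than small ones.
     */
    static float indexerScore(float pageScore, float scorePower, float initScore) {
        return (float) Math.pow(pageScore, scorePower) * initScore;
    }

    public static void main(String[] args) {
        // Illustrative values: a page score of 16 and a neutral initial boost of 1.
        System.out.println(indexerScore(16.0f, 0.5f, 1.0f));
    }
}
```

Under this assumption a scorePower of 1 would pass the page score through unchanged, while smaller values flatten the differences between high-scoring and low-scoring pages.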
public Collection<WebPage.Field> getFields()
Specified by: getFields in interface FieldPluggable