org.apache.nutch.scoring.opic
Class OPICScoringFilter

java.lang.Object
  extended by org.apache.nutch.scoring.opic.OPICScoringFilter
All Implemented Interfaces:
org.apache.hadoop.conf.Configurable, FieldPluggable, Pluggable, ScoringFilter

public class OPICScoringFilter
extends Object
implements ScoringFilter

This plugin implements a variant of the Online Page Importance Computation (OPIC) score described in: Abiteboul, Serge; Preda, Mihai; Cobena, Gregory (2003), Adaptive On-Line Page Importance Computation.

Author:
Andrzej Bialecki

Field Summary
 
Fields inherited from interface org.apache.nutch.scoring.ScoringFilter
X_POINT_ID
 
Constructor Summary
OPICScoringFilter()
           
 
Method Summary
 void distributeScoreToOutlinks(String fromUrl, WebPage row, Collection<ScoreDatum> scoreData, int allCount)
          Get cash on hand, divide it by the number of outlinks and apply.
 float generatorSortValue(String url, WebPage row, float initSort)
          Use WebPage.getScore().
 org.apache.hadoop.conf.Configuration getConf()
           
 Collection<WebPage.Field> getFields()
           
 float indexerScore(String url, NutchDocument doc, WebPage row, float initScore)
          Dampen the boost value by scorePower.
 void initialScore(String url, WebPage row)
          Set to 0.0f (unknown value) - inlink contributions will bring it to a correct level.
 void injectedScore(String url, WebPage row)
          Set an initial score for newly injected pages.
 void setConf(org.apache.hadoop.conf.Configuration conf)
           
 void updateScore(String url, WebPage row, List<ScoreDatum> inlinkedScoreData)
          Increase the score by a sum of inlinked scores.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

OPICScoringFilter

public OPICScoringFilter()
Method Detail

getConf

public org.apache.hadoop.conf.Configuration getConf()
Specified by:
getConf in interface org.apache.hadoop.conf.Configurable

setConf

public void setConf(org.apache.hadoop.conf.Configuration conf)
Specified by:
setConf in interface org.apache.hadoop.conf.Configurable

injectedScore

public void injectedScore(String url,
                          WebPage row)
                   throws ScoringFilterException
Description copied from interface: ScoringFilter
Set an initial score for newly injected pages. Note: newly injected pages may have no inlinks, so filter implementations may wish to set this score to a non-zero value, to give newly injected pages some initial credit.

Specified by:
injectedScore in interface ScoringFilter
Parameters:
url - url of the page
row - new page. Filters will modify it in-place.
Throws:
ScoringFilterException

initialScore

public void initialScore(String url,
                         WebPage row)
                  throws ScoringFilterException
Sets the score to 0.0f (unknown value); inlink contributions will bring it to the correct level. Newly discovered pages always have at least one inlink.

Specified by:
initialScore in interface ScoringFilter
Parameters:
url - url of the page
Throws:
ScoringFilterException

generatorSortValue

public float generatorSortValue(String url,
                                WebPage row,
                                float initSort)
                         throws ScoringFilterException
Use WebPage.getScore().

Specified by:
generatorSortValue in interface ScoringFilter
Parameters:
url - url of the page
initSort - initial sort value, or a value from previous filters in chain
Throws:
ScoringFilterException

updateScore

public void updateScore(String url,
                        WebPage row,
                        List<ScoreDatum> inlinkedScoreData)
Increase the score by a sum of inlinked scores.

Specified by:
updateScore in interface ScoringFilter
Parameters:
url - url of the page
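
The update step described above can be illustrated with a minimal sketch. It does not use the Nutch classes: PageScoreSketch is a hypothetical stand-in for the score held by a WebPage row, and the inlinked contributions are passed as plain floats rather than ScoreDatum objects.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for the score stored on a WebPage row (not a Nutch class).
class PageScoreSketch {
    float score;

    PageScoreSketch(float score) {
        this.score = score;
    }

    // Increases the score by the sum of the contributions received from inlinks.
    void update(List<Float> inlinkedScores) {
        float adjust = 0.0f;
        for (float contribution : inlinkedScores) {
            adjust += contribution;
        }
        score += adjust;
    }

    public static void main(String[] args) {
        PageScoreSketch page = new PageScoreSketch(1.0f);
        page.update(Arrays.asList(0.25f, 0.5f));
        System.out.println("score after update: " + page.score);
    }
}
```

An empty list of contributions leaves the score unchanged in this sketch.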

distributeScoreToOutlinks

public void distributeScoreToOutlinks(String fromUrl,
                                      WebPage row,
                                      Collection<ScoreDatum> scoreData,
                                      int allCount)
Get cash on hand, divide it by the number of outlinks and apply.

Specified by:
distributeScoreToOutlinks in interface ScoringFilter
Parameters:
fromUrl - url of the source page
scoreData - a collection of ScoreDatum objects, one for every outlink. These ScoreDatum objects will be passed to updateScore(String, WebPage, List) for every outlinked URL.
allCount - number of all collected outlinks from the source page
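
As a rough illustration of "divide the cash by the number of outlinks and apply", the sketch below splits a page's cash evenly. The Datum class is a hypothetical stand-in for ScoreDatum, and two points are assumptions rather than documented behaviour: that the cash is reset to zero after distribution, and that nothing happens when there is no cash or no outlinks.

```java
import java.util.ArrayList;
import java.util.List;

class OutlinkShareSketch {
    // Hypothetical minimal stand-in for ScoreDatum (not the Nutch class).
    static class Datum {
        final String url;
        float score;

        Datum(String url) {
            this.url = url;
        }
    }

    // Assigns cash / allCount to every datum and returns the cash left on the page.
    static float distribute(float cash, List<Datum> scoreData, int allCount) {
        if (cash == 0.0f || allCount <= 0) {
            return cash; // nothing to hand out (assumed behaviour)
        }
        float scoreUnit = cash / allCount;
        for (Datum datum : scoreData) {
            datum.score = scoreUnit;
        }
        return 0.0f; // assumption: the cash is spent once distributed
    }

    public static void main(String[] args) {
        List<Datum> outlinks = new ArrayList<>();
        outlinks.add(new Datum("http://example.org/a"));
        outlinks.add(new Datum("http://example.org/b"));
        // allCount may exceed the number of data passed in, e.g. when some outlinks were filtered.
        float remaining = distribute(2.0f, outlinks, 4);
        System.out.println(outlinks.get(0).score + " per outlink, remaining cash " + remaining);
    }
}
```

Dividing by allCount rather than by the size of scoreData means that each outlink's share is the same whether or not other outlinks were dropped before this call.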

indexerScore

public float indexerScore(String url,
                          NutchDocument doc,
                          WebPage row,
                          float initScore)
Dampen the boost value by scorePower.

Specified by:
indexerScore in interface ScoringFilter
Parameters:
url - url of the page
doc - document. NOTE: this already contains all information collected by indexing filters. Implementations may modify this instance, in order to store/remove some information.
initScore - initial boost value for the Lucene document.
Returns:
boost value for the Lucene document. This value is passed as an argument to the next scoring filter in chain. NOTE: implementations may also express other scoring strategies by modifying Lucene document directly.
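
One plausible reading of "dampen the boost value by scorePower" is sketched below. The formula is an assumption, not taken from this page: the page score is raised to the power scorePower and multiplied by the incoming boost, so that a scorePower below 1 compresses large scores. IndexerBoostSketch and dampen are hypothetical names.

```java
class IndexerBoostSketch {
    // Assumed formula: pageScore^scorePower * initScore.
    static float dampen(float pageScore, float scorePower, float initScore) {
        return (float) Math.pow(pageScore, scorePower) * initScore;
    }

    public static void main(String[] args) {
        // With scorePower = 0.5 the boost grows with the square root of the page score.
        System.out.println(dampen(4.0f, 0.5f, 1.0f));
        System.out.println(dampen(16.0f, 0.5f, 1.0f));
    }
}
```

Under this assumption a scorePower of 1.0 passes the page score through unchanged, and a scorePower of 0.0 ignores it entirely.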

getFields

public Collection<WebPage.Field> getFields()
Specified by:
getFields in interface FieldPluggable


Copyright © 2013 The Apache Software Foundation