org.apache.nutch.scoring
Class ScoringFilters

java.lang.Object
  extended by org.apache.hadoop.conf.Configured
      extended by org.apache.nutch.scoring.ScoringFilters
All Implemented Interfaces:
org.apache.hadoop.conf.Configurable, FieldPluggable, Pluggable, ScoringFilter

public class ScoringFilters
extends org.apache.hadoop.conf.Configured
implements ScoringFilter

Creates and caches the plugins that implement ScoringFilter.

Author:
Andrzej Bialecki
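The class comment says only that the plugins are created and cached. One plausible reading is that the filter list is instantiated once per configuration and the same instances are handed back afterwards. The plain-Java sketch below shows that pattern. PluginFilter, FilterCache, FilterCacheDemo and the string configuration key are hypothetical stand-ins, not Nutch or Hadoop APIs, and the real class may cache differently.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Hypothetical minimal plugin type; not the Nutch ScoringFilter interface.
interface PluginFilter {
    String name();
}

final class FilterCache {
    // How many filter instances have actually been built (for illustration).
    static final AtomicInteger CREATED = new AtomicInteger();

    // One immutable filter list per hypothetical configuration key.
    private static final Map<String, List<PluginFilter>> CACHE = new ConcurrentHashMap<>();

    // Builds the filters on the first request for a key; later calls reuse them.
    static List<PluginFilter> get(String configKey, Supplier<List<PluginFilter>> factory) {
        return CACHE.computeIfAbsent(configKey, key -> {
            List<PluginFilter> built = List.copyOf(factory.get());
            CREATED.addAndGet(built.size());
            return built;
        });
    }
}

final class FilterCacheDemo {
    // Creates a trivial named filter for the demo.
    static PluginFilter named(String name) {
        return () -> name;
    }

    public static void main(String[] args) {
        Supplier<List<PluginFilter>> factory = () -> List.of(named("a"), named("b"));
        List<PluginFilter> first = FilterCache.get("conf-A", factory);
        List<PluginFilter> second = FilterCache.get("conf-A", factory);
        System.out.println(first == second);           // true: same cached list
        System.out.println(FilterCache.CREATED.get()); // 2: filters built only once
    }
}
```

Using computeIfAbsent on a ConcurrentHashMap keeps the build-once behaviour safe when several threads request the same key at the same time.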

Field Summary
 
Fields inherited from interface org.apache.nutch.scoring.ScoringFilter
X_POINT_ID
 
Constructor Summary
ScoringFilters(org.apache.hadoop.conf.Configuration conf)
           
 
Method Summary
 void distributeScoreToOutlinks(String fromUrl, WebPage row, Collection<ScoreDatum> scoreData, int allCount)
          Distribute score value from the current page to all its outlinked pages.
 float generatorSortValue(String url, WebPage row, float initSort)
          Calculate a sort value for Generate.
 Collection<WebPage.Field> getFields()
           
 float indexerScore(String url, NutchDocument doc, WebPage row, float initScore)
          This method calculates a Lucene document boost.
 void initialScore(String url, WebPage row)
          Calculate a new initial score, used when adding newly discovered pages.
 void injectedScore(String url, WebPage row)
          Calculate a new initial score, used when injecting new pages.
 void updateScore(String url, WebPage row, List<ScoreDatum> inlinkedScoreData)
          This method calculates a new score during table update, based on the values contributed by inlinked pages.
 
Methods inherited from class org.apache.hadoop.conf.Configured
getConf, setConf
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 
Methods inherited from interface org.apache.hadoop.conf.Configurable
getConf, setConf
 

Constructor Detail

ScoringFilters

public ScoringFilters(org.apache.hadoop.conf.Configuration conf)
Method Detail

generatorSortValue

public float generatorSortValue(String url,
                                WebPage row,
                                float initSort)
                         throws ScoringFilterException
Calculate a sort value for Generate.

Specified by:
generatorSortValue in interface ScoringFilter
Parameters:
url - url of the page
initSort - initial sort value, or a value from previous filters in chain
Throws:
ScoringFilterException
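The initSort parameter is either the initial value or the output of earlier filters in the chain, so the composite presumably threads one value through its filters in order. The sketch below models that with hypothetical types (SortRow, SortValueFilter) and two made-up filters. It also shows that filter order can change the result.

```java
import java.util.List;

// Hypothetical stand-in for a WebPage row: only the stored score.
final class SortRow {
    final float score;
    SortRow(float score) { this.score = score; }
}

// Hypothetical single-method shape mirroring generatorSortValue.
interface SortValueFilter {
    float generatorSortValue(String url, SortRow row, float initSort);
}

final class SortChain {
    // Runs the filters in order; each one receives the previous filter's output.
    static float generatorSortValue(List<SortValueFilter> filters,
                                    String url, SortRow row, float initSort) {
        float value = initSort;
        for (SortValueFilter f : filters) {
            value = f.generatorSortValue(url, row, value);
        }
        return value;
    }

    // Hypothetical filter: scale the incoming value by the page score.
    static final SortValueFilter BY_SCORE = (u, r, init) -> init * r.score;

    // Hypothetical filter: add a fixed bonus for https URLs.
    static final SortValueFilter HTTPS_BONUS =
        (u, r, init) -> u.startsWith("https://") ? init + 0.5f : init;

    public static void main(String[] args) {
        SortRow row = new SortRow(2.0f);
        String url = "https://example.org/";
        System.out.println(generatorSortValue(List.of(BY_SCORE, HTTPS_BONUS), url, row, 1.0f)); // 2.5
        System.out.println(generatorSortValue(List.of(HTTPS_BONUS, BY_SCORE), url, row, 1.0f)); // 3.0
    }
}
```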

initialScore

public void initialScore(String url,
                         WebPage row)
                  throws ScoringFilterException
Calculate a new initial score, used when adding newly discovered pages.

Specified by:
initialScore in interface ScoringFilter
Parameters:
url - url of the page
Throws:
ScoringFilterException

injectedScore

public void injectedScore(String url,
                          WebPage row)
                   throws ScoringFilterException
Calculate a new initial score, used when injecting new pages.

Specified by:
injectedScore in interface ScoringFilter
Parameters:
url - url of the page
row - new page. Filters will modify it in-place.
Throws:
ScoringFilterException
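Both initialScore and injectedScore return void, and the row of injectedScore is documented as modified in place. The composite therefore presumably calls each filter on the same row object. The sketch below uses a hypothetical MutableRow and two made-up filters; the starting scores 1.0 and 0.0 are illustrative only, not Nutch defaults.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical mutable stand-in for a WebPage row.
final class MutableRow {
    float score;
    final Map<String, String> metadata = new HashMap<>();
}

// Hypothetical interface mirroring the two void initialisation methods.
interface InitScoreFilter {
    void injectedScore(String url, MutableRow row);
    void initialScore(String url, MutableRow row);
}

final class InPlaceChain {
    // Hypothetical filter: fixed starting scores, chosen only for illustration.
    static final InitScoreFilter FIXED = new InitScoreFilter() {
        public void injectedScore(String url, MutableRow row) { row.score = 1.0f; }
        public void initialScore(String url, MutableRow row) { row.score = 0.0f; }
    };

    // Hypothetical filter: records how the page entered the table.
    static final InitScoreFilter TAGGER = new InitScoreFilter() {
        public void injectedScore(String url, MutableRow row) { row.metadata.put("origin", "injected"); }
        public void initialScore(String url, MutableRow row) { row.metadata.put("origin", "discovered"); }
    };

    // Nothing is returned: every filter writes its result into the shared row.
    static void injectedScore(List<InitScoreFilter> filters, String url, MutableRow row) {
        for (InitScoreFilter f : filters) f.injectedScore(url, row);
    }

    static void initialScore(List<InitScoreFilter> filters, String url, MutableRow row) {
        for (InitScoreFilter f : filters) f.initialScore(url, row);
    }

    public static void main(String[] args) {
        MutableRow row = new MutableRow();
        injectedScore(List.of(FIXED, TAGGER), "http://example.org/", row);
        System.out.println(row.score + " " + row.metadata.get("origin")); // 1.0 injected
    }
}
```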

distributeScoreToOutlinks

public void distributeScoreToOutlinks(String fromUrl,
                                      WebPage row,
                                      Collection<ScoreDatum> scoreData,
                                      int allCount)
                               throws ScoringFilterException
Description copied from interface: ScoringFilter
Distribute score value from the current page to all its outlinked pages.

Specified by:
distributeScoreToOutlinks in interface ScoringFilter
Parameters:
fromUrl - url of the source page
scoreData - A collection of ScoreDatum objects, one for every outlink. These will be passed to updateScore(String, WebPage, List) for every outlinked URL.
allCount - number of all collected outlinks from the source page
Throws:
ScoringFilterException
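One simple strategy a filter could use here is an even split of the page's score across its outlinks, in the spirit of OPIC-style scoring. The sketch assumes exactly that. OutlinkDatum is a hypothetical stand-in for ScoreDatum, and the page score is passed in explicitly rather than read from a WebPage row.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

// Hypothetical stand-in for ScoreDatum: a target URL and the score it receives.
final class OutlinkDatum {
    final String url;
    float score;
    OutlinkDatum(String url) { this.url = url; }
}

final class EvenSplit {
    // Gives each outlink an equal share of pageScore. Dividing by allCount rather
    // than scoreData.size() means shares for outlinks that were collected but not
    // passed in here are simply not handed out.
    static void distributeScoreToOutlinks(float pageScore,
                                          Collection<OutlinkDatum> scoreData,
                                          int allCount) {
        if (allCount <= 0) {
            return; // no outlinks, nothing to distribute
        }
        float share = pageScore / allCount;
        for (OutlinkDatum d : scoreData) {
            d.score = share;
        }
    }

    public static void main(String[] args) {
        List<OutlinkDatum> outlinks = new ArrayList<>();
        outlinks.add(new OutlinkDatum("http://example.org/a"));
        outlinks.add(new OutlinkDatum("http://example.org/b"));
        // Four outlinks were collected, but only two are passed in.
        distributeScoreToOutlinks(2.0f, outlinks, 4);
        for (OutlinkDatum d : outlinks) {
            System.out.println(d.url + " " + d.score); // 0.5 each
        }
    }
}
```

Whether to divide by allCount or by the number of outlinks actually passed in is a policy choice; dividing by allCount here illustrates one reason the method might receive both values.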

updateScore

public void updateScore(String url,
                        WebPage row,
                        List<ScoreDatum> inlinkedScoreData)
                 throws ScoringFilterException
Description copied from interface: ScoringFilter
This method calculates a new score during table update, based on the values contributed by inlinked pages.

Specified by:
updateScore in interface ScoringFilter
Parameters:
url - url of the page
Throws:
ScoringFilterException
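updateScore receives the ScoreDatum contributions that distributeScoreToOutlinks produced on the pages linking here. A minimal sketch, assuming an additive policy in which the contributions are summed into the page's existing score. UpdateRow and InlinkDatum are hypothetical stand-ins for WebPage and ScoreDatum.

```java
import java.util.List;

// Hypothetical stand-in for the page row being updated.
final class UpdateRow {
    float score;
    UpdateRow(float score) { this.score = score; }
}

// Hypothetical stand-in for an inlink's ScoreDatum.
final class InlinkDatum {
    final String fromUrl;
    final float score;
    InlinkDatum(String fromUrl, float score) {
        this.fromUrl = fromUrl;
        this.score = score;
    }
}

final class InlinkSum {
    // Adds every inlinked contribution to the page's current score.
    static void updateScore(String url, UpdateRow row, List<InlinkDatum> inlinkedScoreData) {
        float sum = 0.0f;
        for (InlinkDatum d : inlinkedScoreData) {
            sum += d.score;
        }
        row.score += sum;
    }

    public static void main(String[] args) {
        UpdateRow row = new UpdateRow(1.0f);
        List<InlinkDatum> inlinks = List.of(
            new InlinkDatum("http://example.org/a", 0.5f),
            new InlinkDatum("http://example.org/b", 0.25f));
        updateScore("http://example.org/", row, inlinks);
        System.out.println(row.score); // 1.75
    }
}
```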

indexerScore

public float indexerScore(String url,
                          NutchDocument doc,
                          WebPage row,
                          float initScore)
                   throws ScoringFilterException
Description copied from interface: ScoringFilter
This method calculates a Lucene document boost.

Specified by:
indexerScore in interface ScoringFilter
Parameters:
url - url of the page
doc - document. NOTE: this already contains all information collected by indexing filters. Implementations may modify this instance to store or remove information.
initScore - initial boost value for the Lucene document.
Returns:
boost value for the Lucene document. This value is passed as an argument to the next scoring filter in the chain. NOTE: implementations may also express other scoring strategies by modifying the document directly.
Throws:
ScoringFilterException
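The return value of indexerScore becomes the initScore of the next filter in the chain, and filters may also write into the document. The sketch below models both behaviours. IndexRow and BoostFilter are hypothetical, the document is modelled as a plain field map rather than a NutchDocument, and the "pagescore" field name is made up.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the WebPage row: only the stored score.
final class IndexRow {
    final float score;
    IndexRow(float score) { this.score = score; }
}

// Hypothetical shape of indexerScore, with the document modelled as a field map.
interface BoostFilter {
    float indexerScore(String url, Map<String, Object> doc, IndexRow row, float initScore);
}

final class BoostChain {
    // Each filter's returned boost becomes the next filter's initScore.
    static float indexerScore(List<BoostFilter> filters, String url,
                              Map<String, Object> doc, IndexRow row, float initScore) {
        float boost = initScore;
        for (BoostFilter f : filters) {
            boost = f.indexerScore(url, doc, row, boost);
        }
        return boost;
    }

    // Hypothetical filter: scale the boost by the page score.
    static final BoostFilter SCALE = (u, d, r, init) -> init * r.score;

    // Hypothetical filter: keep the boost but write the score into the document.
    static final BoostFilter ANNOTATE = (u, d, r, init) -> {
        d.put("pagescore", r.score);
        return init;
    };

    public static void main(String[] args) {
        Map<String, Object> doc = new HashMap<>();
        float boost = indexerScore(List.of(SCALE, ANNOTATE), "http://example.org/",
                                   doc, new IndexRow(3.0f), 1.0f);
        System.out.println(boost + " " + doc); // 3.0 {pagescore=3.0}
    }
}
```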

getFields

public Collection<WebPage.Field> getFields()
Specified by:
getFields in interface FieldPluggable
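getFields carries no description here. Because ScoringFilters wraps several plugins, a reasonable assumption is that it reports every field any of them needs, presumably so those columns can be loaded for the page. A sketch of that union under hypothetical types (PageField, FieldRequester); the real method may differ.

```java
import java.util.Collection;
import java.util.EnumSet;
import java.util.List;
import java.util.Set;

// Hypothetical subset of page columns; not the real WebPage.Field enum.
enum PageField { SCORE, METADATA, INLINKS, OUTLINKS }

// Hypothetical stand-in for FieldPluggable.
interface FieldRequester {
    Collection<PageField> getFields();
}

final class FieldUnion {
    // Merges the fields every filter asks for; a null answer is treated as "none".
    static Set<PageField> getFields(List<FieldRequester> filters) {
        Set<PageField> fields = EnumSet.noneOf(PageField.class);
        for (FieldRequester f : filters) {
            Collection<PageField> requested = f.getFields();
            if (requested != null) {
                fields.addAll(requested);
            }
        }
        return fields;
    }

    public static void main(String[] args) {
        FieldRequester a = () -> List.of(PageField.SCORE, PageField.METADATA);
        FieldRequester b = () -> List.of(PageField.SCORE, PageField.INLINKS);
        FieldRequester none = () -> null;
        System.out.println(getFields(List.of(a, b, none))); // [SCORE, METADATA, INLINKS]
    }
}
```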


Copyright © 2013 The Apache Software Foundation