public interface ScoringFilter
A contract defining behavior of scoring plugins. A scoring filter will manipulate scoring variables in CrawlDatum and in resulting search indexes. Filters can be chained in a specific order, to provide multi-stage scoring adjustments.
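As a rough illustration of chaining, the sketch below passes a value through several stages in order, each stage receiving the result of the previous one. `SortStage` and `ChainSketch` are hypothetical stand-ins written for this example; they are not part of Nutch, and the real interface also takes a `WebPage` argument and can throw `ScoringFilterException`.

```java
import java.util.List;

// Hypothetical stand-in for one scoring stage; NOT the Nutch ScoringFilter interface.
interface SortStage {
    float generatorSortValue(String url, float initSort);
}

final class ChainSketch {
    // Feeds each stage the value returned by the previous stage, in list order.
    static float runChain(List<SortStage> stages, String url, float initSort) {
        float value = initSort;
        for (SortStage stage : stages) {
            value = stage.generatorSortValue(url, value);
        }
        return value;
    }
}
```

With a doubling stage followed by a stage that adds 1, an initial value of 1.0 becomes 3.0; with the order reversed it becomes 4.0, which is why the order of the chain matters.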
Field Summary | |
---|---|
static String | X_POINT_ID: The name of the extension point. |
Method Summary | |
---|---|
void | distributeScoreToOutlinks(String fromUrl, WebPage page, Collection<ScoreDatum> scoreData, int allCount): Distribute score value from the current page to all its outlinked pages. |
float | generatorSortValue(String url, WebPage page, float initSort): This method prepares a sort value for the purpose of sorting and selecting top N scoring pages during fetchlist generation. |
float | indexerScore(String url, NutchDocument doc, WebPage page, float initScore): This method calculates a Lucene document boost. |
void | initialScore(String url, WebPage page): Set an initial score for newly discovered pages. |
void | injectedScore(String url, WebPage page): Set an initial score for newly injected pages. |
void | updateScore(String url, WebPage page, List<ScoreDatum> inlinkedScoreData): This method calculates a new score during table update, based on the values contributed by inlinked pages. |
Methods inherited from interface org.apache.hadoop.conf.Configurable |
---|
getConf, setConf |
Methods inherited from interface org.apache.nutch.plugin.FieldPluggable |
---|
getFields |
Field Detail |
---|
static final String X_POINT_ID
Method Detail |
---|
void injectedScore(String url, WebPage page) throws ScoringFilterException
Parameters:
url - url of the page
page - new page. Filters will modify it in place.
Throws:
ScoringFilterException
void initialScore(String url, WebPage page) throws ScoringFilterException
Parameters:
url - url of the page
page - newly discovered page
Throws:
ScoringFilterException
float generatorSortValue(String url, WebPage page, float initSort) throws ScoringFilterException
Parameters:
url - url of the page
page - page row. Modifications will be persisted.
initSort - initial sort value, or the value returned by the previous filter in the chain
Throws:
ScoringFilterException
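To show what "sorting and selecting top N" can look like once sort values exist, here is a minimal sketch that picks the N URLs with the highest values. `TopNSketch` and its map-based input are assumptions made for this example; how the generator actually stores and selects candidates is not described on this page.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper; NOT part of Nutch.
final class TopNSketch {
    // Returns up to n URLs, ordered from highest to lowest sort value.
    static List<String> topN(Map<String, Float> sortValues, int n) {
        List<Map.Entry<String, Float>> entries = new ArrayList<>(sortValues.entrySet());
        // Descending order: compare b against a.
        entries.sort((a, b) -> Float.compare(b.getValue(), a.getValue()));
        List<String> urls = new ArrayList<>();
        for (int i = 0; i < Math.min(n, entries.size()); i++) {
            urls.add(entries.get(i).getKey());
        }
        return urls;
    }
}
```

A filter that wants a page fetched sooner would return a larger sort value for it; a filter with no opinion can return `initSort` unchanged.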
void distributeScoreToOutlinks(String fromUrl, WebPage page, Collection<ScoreDatum> scoreData, int allCount) throws ScoringFilterException
Parameters:
fromUrl - url of the source page
page - page row
scoreData - a collection of ScoreDatum objects, one for every outlink. These ScoreDatum objects will be passed to updateScore(String, WebPage, List) for every outlinked URL.
allCount - number of all collected outlinks from the source page
Throws:
ScoringFilterException
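One possible distribution policy is an equal split of the source page's score across all of its outlinks. The sketch below illustrates only that policy; `OutlinkScore` is a hypothetical stand-in for `ScoreDatum`, and passing the page score as a plain `float` is a simplification made for this example, not how the interface receives it.

```java
import java.util.Collection;

// Hypothetical stand-in for ScoreDatum; the real class is not described on this page.
final class OutlinkScore {
    final String url;
    float score;

    OutlinkScore(String url) {
        this.url = url;
    }
}

final class DistributeSketch {
    // Gives every outlink an equal share of the source page's score.
    // allCount counts all outlinks collected from the page, so it may be
    // larger than scoreData.size().
    static void distributeEqually(float pageScore, Collection<OutlinkScore> scoreData, int allCount) {
        if (allCount <= 0) {
            return; // no outlinks, nothing to distribute
        }
        float share = pageScore / allCount;
        for (OutlinkScore datum : scoreData) {
            datum.score = share;
        }
    }
}
```

Dividing by `allCount` rather than by the size of the collection means outlinks that were collected but not passed on still count toward the split.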
void updateScore(String url, WebPage page, List<ScoreDatum> inlinkedScoreData) throws ScoringFilterException
Parameters:
url - url of the page
page - page row
inlinkedScoreData - list of ScoreDatum objects for all inlinks pointing to this URL.
Throws:
ScoringFilterException
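A simple way to combine inlink contributions is to add their sum to the page's current score. The sketch below assumes that additive policy; `InlinkScore` is a hypothetical stand-in for `ScoreDatum`, and the method returns the new score instead of modifying a `WebPage` in place as a real implementation would.

```java
import java.util.List;

// Hypothetical stand-in for ScoreDatum carrying one inlink's contribution.
final class InlinkScore {
    final float score;

    InlinkScore(float score) {
        this.score = score;
    }
}

final class UpdateSketch {
    // Adds the sum of all inlink contributions to the current score.
    static float updatedScore(float currentScore, List<InlinkScore> inlinked) {
        float contributed = 0f;
        for (InlinkScore datum : inlinked) {
            contributed += datum.score;
        }
        return currentScore + contributed;
    }
}
```

For a page with a current score of 1.0 and two inlinks contributing 0.25 and 0.5, this returns 1.75.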
float indexerScore(String url, NutchDocument doc, WebPage page, float initScore) throws ScoringFilterException
Parameters:
url - url of the page
doc - document. NOTE: this already contains all information collected by indexing filters. Implementations may modify this instance in order to store or remove some information.
page - page row
initScore - initial boost value for the Lucene document.
Throws:
ScoringFilterException
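As one example of how a boost could be derived, the sketch below scales the incoming boost by the page score raised to a configurable power. The formula, the `scorePower` parameter, and the `BoostSketch` class are assumptions chosen for illustration; this page does not specify how any particular filter computes its boost.

```java
// Hypothetical helper; NOT part of Nutch.
final class BoostSketch {
    // boost = pageScore^scorePower * initScore
    // A scorePower below 1 dampens the effect of very large page scores.
    static float indexerScore(float pageScore, float initScore, float scorePower) {
        return (float) Math.pow(pageScore, scorePower) * initScore;
    }
}
```

With a page score of 4.0, an initial boost of 2.0, and a power of 0.5, the result is 4.0. A power of 0 leaves `initScore` unchanged, which is one way for a filter to stay neutral in a chain.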