org.apache.nutch.scoring.webgraph
Class ScoreUpdater

java.lang.Object
  extended by org.apache.hadoop.conf.Configured
      extended by org.apache.nutch.scoring.webgraph.ScoreUpdater
All Implemented Interfaces:
Closeable, Configurable, JobConfigurable, Mapper<Text,Writable,Text,ObjectWritable>, Reducer<Text,ObjectWritable,Text,CrawlDatum>, Tool

public class ScoreUpdater
extends Configured
implements Tool, Mapper<Text,Writable,Text,ObjectWritable>, Reducer<Text,ObjectWritable,Text,CrawlDatum>

Updates the crawl database with scores from the WebGraph node database. Each URL in the crawl database takes the inlink score of its node in the node database. Any URL that has no entry in the node database has its score set to the clear score.
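The update rule above can be illustrated as an in-memory join. This is a minimal sketch, not the Nutch implementation. Plain `Map`s stand in for the crawl database and the node database. The class `ScoreJoinSketch` and its method are hypothetical names used only for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory sketch of the ScoreUpdater join (not Nutch code):
// crawlDbScores maps URL -> current score, nodeDbInlinkScores maps
// URL -> inlink score from the WebGraph node database.
class ScoreJoinSketch {

  static Map<String, Float> updateScores(Map<String, Float> crawlDbScores,
                                         Map<String, Float> nodeDbInlinkScores,
                                         float clearScore) {
    Map<String, Float> updated = new HashMap<>();
    for (String url : crawlDbScores.keySet()) {
      Float inlinkScore = nodeDbInlinkScores.get(url);
      // Take the node's inlink score if present, otherwise the clear score.
      updated.put(url, inlinkScore != null ? inlinkScore : clearScore);
    }
    // URLs that appear only in the node database are not added,
    // because they have no crawl database entry to update.
    return updated;
  }
}
```

The loop is driven by the crawl database's keys, so the output always has exactly one entry per existing crawl database URL.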


Field Summary
static org.slf4j.Logger LOG
           
 
Constructor Summary
ScoreUpdater()
           
 
Method Summary
 void close()
           
 void configure(JobConf conf)
           
static void main(String[] args)
           
 void map(Text key, Writable value, OutputCollector<Text,ObjectWritable> output, Reporter reporter)
          Wraps each input value in an ObjectWritable so that crawl database and node database records can reach the same reducer.
 void reduce(Text key, Iterator<ObjectWritable> values, OutputCollector<Text,CrawlDatum> output, Reporter reporter)
          Outputs a CrawlDatum for each URL with its score set to the inlink score from the NodeDb, or to the clear score if the URL is not in the NodeDb.
 int run(String[] args)
          Runs the ScoreUpdater tool.
 void update(Path crawlDb, Path webGraphDb)
          Copies the inlink scores from the WebGraph node database into the crawl database.
 
Methods inherited from class org.apache.hadoop.conf.Configured
getConf, setConf
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 
Methods inherited from interface org.apache.hadoop.conf.Configurable
getConf, setConf
 

Field Detail

LOG

public static final org.slf4j.Logger LOG
Constructor Detail

ScoreUpdater

public ScoreUpdater()
Method Detail

configure

public void configure(JobConf conf)
Specified by:
configure in interface JobConfigurable
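configure(JobConf) is the natural place to read the clear score once per task. The sketch below shows that pattern only as an illustration. `java.util.Properties` stands in for JobConf. The key `link.score.updater.clear.score` and its 0.0 default are assumptions based on nutch-default.xml and should be checked against your Nutch version. `ClearScoreConfig` is a hypothetical name.

```java
import java.util.Properties;

// Hypothetical sketch: read the clear score from configuration once.
// Properties stands in for JobConf; the key name and the 0.0 default
// are assumptions, not confirmed by this page.
class ClearScoreConfig {
  static final String CLEAR_SCORE_KEY = "link.score.updater.clear.score";

  static float readClearScore(Properties conf) {
    String raw = conf.getProperty(CLEAR_SCORE_KEY);
    if (raw == null) {
      return 0.0f; // assumed default when the key is not set
    }
    return Float.parseFloat(raw.trim());
  }
}
```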

map

public void map(Text key,
                Writable value,
                OutputCollector<Text,ObjectWritable> output,
                Reporter reporter)
         throws IOException
Wraps each input value in an ObjectWritable so that crawl database and node database records can reach the same reducer.

Specified by:
map in interface Mapper<Text,Writable,Text,ObjectWritable>
Throws:
IOException
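The map step has to pass two value types under the same URL key. A CrawlDatum comes from the crawl database and a Node comes from the node database. The sketch below shows the wrapping idea. `WrappedValue` is a hypothetical stand-in for Hadoop's ObjectWritable, and a `List` stands in for the OutputCollector.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.List;
import java.util.Map;

class MapStepSketch {
  // Hypothetical stand-in for ObjectWritable: one container type
  // that can carry a value of any class.
  static final class WrappedValue {
    private final Object value;

    WrappedValue(Object value) {
      this.value = value;
    }

    Object get() {
      return value;
    }
  }

  // Emits (url, wrapped value); the list stands in for OutputCollector.
  static void map(String url, Object value, List<Map.Entry<String, WrappedValue>> output) {
    output.add(new SimpleEntry<>(url, new WrappedValue(value)));
  }
}
```

The mapper does no filtering, so every record from both inputs reaches the reducer wrapped and keyed by its URL.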

reduce

public void reduce(Text key,
                   Iterator<ObjectWritable> values,
                   OutputCollector<Text,CrawlDatum> output,
                   Reporter reporter)
            throws IOException
Outputs a CrawlDatum for each URL with its score set to the inlink score from the NodeDb, or to the clear score if the URL is not in the NodeDb.

Specified by:
reduce in interface Reducer<Text,ObjectWritable,Text,CrawlDatum>
Throws:
IOException
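For a single URL, the reducer unwraps the values by runtime type and chooses the score. This is a minimal sketch under stated assumptions. `Node` and `CrawlDatum` here are hypothetical stand-ins that model only the fields this step needs. Returning `null` represents the case where no CrawlDatum arrived, so nothing is emitted for that URL.

```java
import java.util.Iterator;

class ReduceStepSketch {
  // Hypothetical stand-in for the WebGraph Node: only the inlink score.
  static final class Node {
    final float inlinkScore;
    Node(float inlinkScore) { this.inlinkScore = inlinkScore; }
  }

  // Hypothetical stand-in for CrawlDatum: only the score.
  static final class CrawlDatum {
    float score;
    CrawlDatum(float score) { this.score = score; }
  }

  static CrawlDatum reduce(Iterator<Object> values, float clearScore) {
    Node node = null;
    CrawlDatum datum = null;
    // Expect at most one value of each type per URL; dispatch on runtime type.
    while (values.hasNext()) {
      Object value = values.next();
      if (value instanceof Node) {
        node = (Node) value;
      } else if (value instanceof CrawlDatum) {
        datum = (CrawlDatum) value;
      }
    }
    if (datum == null) {
      return null; // URL only in the node database: nothing to update
    }
    datum.score = (node != null) ? node.inlinkScore : clearScore;
    return datum;
  }
}
```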

close

public void close()
Specified by:
close in interface Closeable

update

public void update(Path crawlDb,
                   Path webGraphDb)
            throws IOException
Copies the inlink scores from the WebGraph node database into the crawl database.

Parameters:
crawlDb - The crawl database to update.
webGraphDb - The webgraph database to use.
Throws:
IOException - If an error occurs while updating the scores.
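Updating a crawl database safely usually means writing the job output to a temporary directory and swapping it in only after the job succeeds. This page does not describe how update() installs its output. The sketch below is an assumed illustration of that write-then-swap pattern, not the method's actual code. Local `java.nio` paths stand in for HDFS, and the directory names `current` and `old` and the class `InstallSketch` are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.stream.Stream;

// Hypothetical write-then-swap sketch: newData was written by the job;
// it replaces crawlDb/current, and the previous version is discarded.
class InstallSketch {
  static void install(Path crawlDb, Path newData) throws IOException {
    Path current = crawlDb.resolve("current");
    Path old = crawlDb.resolve("old");
    deleteRecursively(old);           // clear any stale backup so the move can succeed
    if (Files.exists(current)) {
      Files.move(current, old);       // keep the previous version until the swap is done
    }
    Files.move(newData, current);     // promote the freshly written data
    deleteRecursively(old);           // swap succeeded, so drop the backup
  }

  static void deleteRecursively(Path dir) throws IOException {
    if (!Files.exists(dir)) {
      return;
    }
    try (Stream<Path> walk = Files.walk(dir)) {
      // Delete children before parents by visiting paths in reverse order.
      walk.sorted(Comparator.reverseOrder()).forEach(p -> {
        try {
          Files.delete(p);
        } catch (IOException e) {
          throw new UncheckedIOException(e);
        }
      });
    }
  }
}
```

If the final move fails, the previous data is still present under `old`, which is the point of keeping it until the swap completes.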

main

public static void main(String[] args)
                 throws Exception
Throws:
Exception

run

public int run(String[] args)
        throws Exception
Runs the ScoreUpdater tool.

Specified by:
run in interface Tool
Throws:
Exception
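run(String[]) has to turn command-line arguments into the two paths that update(Path, Path) expects. The sketch below shows that step under explicit assumptions. The option names `-crawldb` and `-webgraphdb` and the exit codes (0 on success, -1 on bad usage) are illustrative choices, not confirmed by this page. `RunArgsSketch` is a hypothetical name.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical argument handling for a run(String[]) method.
// Option names and exit codes are assumptions for illustration.
class RunArgsSketch {
  // Reads "-name value" pairs; a trailing unpaired argument is ignored.
  static Map<String, String> parse(String[] args) {
    Map<String, String> opts = new HashMap<>();
    for (int i = 0; i + 1 < args.length; i += 2) {
      opts.put(args[i], args[i + 1]);
    }
    return opts;
  }

  static int run(String[] args) {
    Map<String, String> opts = parse(args);
    String crawlDb = opts.get("-crawldb");
    String webGraphDb = opts.get("-webgraphdb");
    if (crawlDb == null || webGraphDb == null) {
      System.err.println("Usage: ScoreUpdater -crawldb <path> -webgraphdb <path>");
      return -1;
    }
    // The real tool would call update(new Path(crawlDb), new Path(webGraphDb)) here.
    return 0;
  }
}
```

Returning an int instead of calling System.exit lets a Tool-style main method decide how to exit, which matches run's `int` return type above.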


Copyright © 2011 The Apache Software Foundation