org.apache.nutch.crawl
Class Generator.CrawlDbUpdater

java.lang.Object
  extended by org.apache.hadoop.mapred.MapReduceBase
      extended by org.apache.nutch.crawl.Generator.CrawlDbUpdater
All Implemented Interfaces:
Closeable, JobConfigurable, Mapper<Text,CrawlDatum,Text,CrawlDatum>, Reducer<Text,CrawlDatum,Text,CrawlDatum>
Enclosing class:
Generator

public static class Generator.CrawlDbUpdater
extends MapReduceBase
implements Mapper<Text,CrawlDatum,Text,CrawlDatum>, Reducer<Text,CrawlDatum,Text,CrawlDatum>

Updates the CrawlDb so that the next generate run does not select the same URLs again.
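
The idea behind this class can be illustrated with a small, dependency-free sketch. It is not the Nutch implementation: `DatumSketch`, `CrawlDbUpdaterSketch`, the `generateTime` field, and the rule "a non-zero stamp means already selected" are all hypothetical stand-ins chosen for illustration, and plain Java collections replace the Hadoop `Mapper`/`Reducer` machinery. The sketch groups records by URL (the map side), merges each group into one record stamped with the current run's time if that run selected the URL (the reduce side), and then shows a later selection skipping stamped URLs.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical, simplified stand-in for a CrawlDb record (not the real CrawlDatum). */
final class DatumSketch {
    final String status;
    final long generateTime; // assumption: 0 means "not claimed by any generate run"

    DatumSketch(String status, long generateTime) {
        this.status = status;
        this.generateTime = generateTime;
    }
}

final class CrawlDbUpdaterSketch {

    /** Map side: pass every record through unchanged, grouped by its URL key. */
    static Map<String, List<DatumSketch>> group(Map<String, DatumSketch> crawlDb,
                                                Map<String, DatumSketch> generated) {
        Map<String, List<DatumSketch>> grouped = new LinkedHashMap<>();
        for (Map.Entry<String, DatumSketch> e : crawlDb.entrySet()) {
            grouped.computeIfAbsent(e.getKey(), k -> new ArrayList<>()).add(e.getValue());
        }
        for (Map.Entry<String, DatumSketch> e : generated.entrySet()) {
            grouped.computeIfAbsent(e.getKey(), k -> new ArrayList<>()).add(e.getValue());
        }
        return grouped;
    }

    /** Reduce side: keep the original record, stamped if this run selected the URL. */
    static DatumSketch reduce(List<DatumSketch> values, long runTime) {
        DatumSketch original = null;
        String fallbackStatus = null;
        boolean selected = false;
        for (DatumSketch d : values) {
            if (runTime != 0L && d.generateTime == runTime) {
                selected = true;          // this record came from the current generate run
                fallbackStatus = d.status;
            } else {
                original = d;             // this record came from the existing CrawlDb
            }
        }
        String status = original != null ? original.status : fallbackStatus;
        long previous = original != null ? original.generateTime : 0L;
        return new DatumSketch(status, selected ? runTime : previous);
    }

    /** Runs the grouping and merging steps over the whole (in-memory) database. */
    static Map<String, DatumSketch> update(Map<String, DatumSketch> crawlDb,
                                           Map<String, DatumSketch> generated,
                                           long runTime) {
        Map<String, DatumSketch> out = new LinkedHashMap<>();
        for (Map.Entry<String, List<DatumSketch>> e : group(crawlDb, generated).entrySet()) {
            out.put(e.getKey(), reduce(e.getValue(), runTime));
        }
        return out;
    }

    /** Assumed selection rule for a later generate: skip URLs that carry a stamp. */
    static List<String> eligible(Map<String, DatumSketch> crawlDb) {
        List<String> urls = new ArrayList<>();
        for (Map.Entry<String, DatumSketch> e : crawlDb.entrySet()) {
            if (e.getValue().generateTime == 0L) {
                urls.add(e.getKey());
            }
        }
        return urls;
    }

    public static void main(String[] args) {
        Map<String, DatumSketch> db = new LinkedHashMap<>();
        db.put("http://a.example/", new DatumSketch("unfetched", 0L));
        db.put("http://b.example/", new DatumSketch("unfetched", 0L));

        // The current run (time 1000) selected only a.example.
        Map<String, DatumSketch> generated = new LinkedHashMap<>();
        generated.put("http://a.example/", new DatumSketch("unfetched", 1000L));

        Map<String, DatumSketch> updated = update(db, generated, 1000L);
        System.out.println(eligible(updated)); // prints [http://b.example/]
    }
}
```

In this sketch the stamp lives on the record itself, so a later selection pass needs no extra bookkeeping beyond reading the updated database.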


Constructor Summary
Generator.CrawlDbUpdater()

Method Summary
void configure(JobConf job)
void map(Text key, CrawlDatum value, OutputCollector<Text,CrawlDatum> output, Reporter reporter)
void reduce(Text key, Iterator<CrawlDatum> values, OutputCollector<Text,CrawlDatum> output, Reporter reporter)

Methods inherited from class org.apache.hadoop.mapred.MapReduceBase
close
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 
Methods inherited from interface java.io.Closeable
close
 

Constructor Detail

Generator.CrawlDbUpdater

public Generator.CrawlDbUpdater()

Method Detail

configure

public void configure(JobConf job)
Specified by:
configure in interface JobConfigurable
Overrides:
configure in class MapReduceBase

map

public void map(Text key,
                CrawlDatum value,
                OutputCollector<Text,CrawlDatum> output,
                Reporter reporter)
         throws IOException
Specified by:
map in interface Mapper<Text,CrawlDatum,Text,CrawlDatum>
Throws:
IOException

reduce

public void reduce(Text key,
                   Iterator<CrawlDatum> values,
                   OutputCollector<Text,CrawlDatum> output,
                   Reporter reporter)
            throws IOException
Specified by:
reduce in interface Reducer<Text,CrawlDatum,Text,CrawlDatum>
Throws:
IOException


Copyright © 2006 The Apache Software Foundation