org.apache.nutch.fetcher
Class FetcherJob.FetcherMapper
java.lang.Object
org.apache.hadoop.mapreduce.Mapper<K1,V1,K2,V2>
org.apache.gora.mapreduce.GoraMapper<String,WebPage,IntWritable,FetchEntry>
org.apache.nutch.fetcher.FetcherJob.FetcherMapper
- Enclosing class:
- FetcherJob
public static class FetcherJob.FetcherMapper
- extends org.apache.gora.mapreduce.GoraMapper<String,WebPage,IntWritable,FetchEntry>
Mapper class for Fetcher.
This class reads the random integer written by GeneratorJob as its key while outputting the actual key and value arguments through a FetchEntry instance.
This approach (combined with the use of PartitionUrlByHost) makes sure that Fetcher is still polite while also randomizing the key order. If one host has a huge number of URLs in your table while other hosts do not, FetcherReducer will not be stuck on one host but will process URLs from other hosts as well.
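The effect described above can be illustrated with a minimal, self-contained sketch. It does not use the Hadoop, Gora, or Nutch APIs: the class FetchOrderSketch, its Entry type (a stand-in for FetchEntry), and the methods map, partitionByHost, and reducerInput are all hypothetical names chosen for illustration, and the key range 65536 in main is an arbitrary assumption. The sketch only shows why sorting by a random integer key, while partitioning by host, interleaves URLs from different hosts within one reducer's input.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Random;

public class FetchOrderSketch {

    /** Hypothetical stand-in for FetchEntry: carries the original key (the URL). */
    static final class Entry {
        final int randomKey;
        final String url;

        Entry(int randomKey, String url) {
            this.randomKey = randomKey;
            this.url = url;
        }
    }

    /** Map step: the random integer becomes the output key; the URL travels in the value. */
    static Entry map(int randomKey, String url) {
        return new Entry(randomKey, url);
    }

    /** Partition step (assumed behaviour): every URL of one host lands in the same partition. */
    static int partitionByHost(String url, int numPartitions) {
        String host = URI.create(url).getHost();
        return (host.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }

    /** Simulates what one reducer would see: its own entries, sorted by the random key. */
    static List<String> reducerInput(List<Entry> entries, int partition, int numPartitions) {
        List<Entry> mine = new ArrayList<>();
        for (Entry e : entries) {
            if (partitionByHost(e.url, numPartitions) == partition) {
                mine.add(e);
            }
        }
        mine.sort(Comparator.comparingInt((Entry e) -> e.randomKey));
        List<String> urls = new ArrayList<>();
        for (Entry e : mine) {
            urls.add(e.url);
        }
        return urls;
    }

    public static void main(String[] args) {
        Random random = new Random(42L);
        List<Entry> entries = new ArrayList<>();
        // One host with many URLs, one host with a single URL.
        for (int i = 0; i < 6; i++) {
            entries.add(map(random.nextInt(65536), "http://big.example/page" + i));
        }
        entries.add(map(random.nextInt(65536), "http://small.example/index"));
        // With a single partition, the small host's URL is placed among the
        // big host's URLs according to its random key, not after all of them.
        for (String url : reducerInput(entries, 0, 1)) {
            System.out.println(url);
        }
    }
}
```

Because the sort key is unrelated to the URL, a host with few URLs is not queued behind a host with many, while partitioning by host still keeps each host's URLs in one place so that per-host politeness can be enforced there.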
Methods inherited from class org.apache.gora.mapreduce.GoraMapper
initMapperJob, initMapperJob, initMapperJob, initMapperJob, initMapperJob
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
FetcherJob.FetcherMapper
public FetcherJob.FetcherMapper()
setup
protected void setup(Mapper.Context context)
- Overrides:
setup in class Mapper<String,WebPage,IntWritable,FetchEntry>
map
protected void map(String key, WebPage page, Mapper.Context context) throws IOException, InterruptedException
- Overrides:
map in class Mapper<String,WebPage,IntWritable,FetchEntry>
- Throws:
IOException
InterruptedException
Copyright © 2012 The Apache Software Foundation