org.apache.nutch.fetcher
Class FetcherJob.FetcherMapper
java.lang.Object
org.apache.hadoop.mapreduce.Mapper<K1,V1,K2,V2>
org.apache.gora.mapreduce.GoraMapper<String,WebPage,org.apache.hadoop.io.IntWritable,FetchEntry>
org.apache.nutch.fetcher.FetcherJob.FetcherMapper
- Enclosing class:
- FetcherJob
public static class FetcherJob.FetcherMapper
- extends org.apache.gora.mapreduce.GoraMapper<String,WebPage,org.apache.hadoop.io.IntWritable,FetchEntry>
Mapper class for Fetcher.
This class reads the random integer written by GeneratorJob as its key, while outputting the actual key and value arguments through a FetchEntry instance.
This approach (combined with the use of PartitionUrlByHost) ensures that Fetcher stays polite while also randomizing the key order. If one host has a huge number of URLs in your table while other hosts have only a few, FetcherReducer will not be stuck on that one host but will process URLs from other hosts as well.
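The effect described above can be illustrated with a small, self-contained sketch that does not depend on Hadoop, Gora, or Nutch. It is not the Nutch implementation: the class FetchOrderSketch, the nested Entry type (a stand-in for FetchEntry), the key range of 65536, and the hash-based partitionByHost method are all assumptions made for illustration. The sketch pairs each URL with a random integer key, assigns a reducer from the host alone, and sorts each reducer's input by the random key, so URLs from a large host are interleaved with URLs from other hosts in the same partition.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Random;

class FetchOrderSketch {

    /** Hypothetical stand-in for FetchEntry: carries the real key (the URL). */
    static final class Entry {
        final int randomKey;
        final String url;

        Entry(int randomKey, String url) {
            this.randomKey = randomKey;
            this.url = url;
        }
    }

    /** Map step: emit each URL under a random integer key (range is an assumption). */
    static List<Entry> mapPhase(List<String> urls, Random random) {
        List<Entry> out = new ArrayList<>();
        for (String url : urls) {
            out.add(new Entry(random.nextInt(65536), url));
        }
        return out;
    }

    /** Partition step: pick a reducer from the host only, so a host never spans reducers. */
    static int partitionByHost(String url, int numReducers) {
        // Assumes a well-formed URL with a host component.
        String host = URI.create(url).getHost();
        return (host.hashCode() & Integer.MAX_VALUE) % numReducers;
    }

    /** Sort step: one reducer's input, ordered by random key as the shuffle would deliver it. */
    static List<Entry> reducerInput(List<Entry> entries, int partition, int numReducers) {
        List<Entry> mine = new ArrayList<>();
        for (Entry e : entries) {
            if (partitionByHost(e.url, numReducers) == partition) {
                mine.add(e);
            }
        }
        mine.sort(Comparator.comparingInt((Entry e) -> e.randomKey));
        return mine;
    }

    public static void main(String[] args) {
        List<String> urls = new ArrayList<>();
        // One large host and two small ones.
        for (int i = 0; i < 6; i++) {
            urls.add("http://big.example.org/page" + i);
        }
        urls.addAll(Arrays.asList("http://small-a.example.org/", "http://small-b.example.org/"));

        List<Entry> entries = mapPhase(urls, new Random(42));
        int numReducers = 1; // a single reducer makes the interleaving easy to see
        for (Entry e : reducerInput(entries, 0, numReducers)) {
            System.out.println(e.randomKey + "\t" + e.url);
        }
    }
}
```

Because the partition depends only on the host, per-host politeness can still be enforced inside a single reducer, while the random key decides the order in which that reducer sees its URLs.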
Nested classes/interfaces inherited from class org.apache.hadoop.mapreduce.Mapper
org.apache.hadoop.mapreduce.Mapper.Context
Method Summary
protected void map(String key, WebPage page, org.apache.hadoop.mapreduce.Mapper.Context context)
protected void setup(org.apache.hadoop.mapreduce.Mapper.Context context)
Methods inherited from class org.apache.gora.mapreduce.GoraMapper
initMapperJob, initMapperJob, initMapperJob, initMapperJob, initMapperJob
Methods inherited from class org.apache.hadoop.mapreduce.Mapper
cleanup, run
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
FetcherJob.FetcherMapper
public FetcherJob.FetcherMapper()
setup
protected void setup(org.apache.hadoop.mapreduce.Mapper.Context context)
- Overrides:
setup
in class org.apache.hadoop.mapreduce.Mapper<String,WebPage,org.apache.hadoop.io.IntWritable,FetchEntry>
map
protected void map(String key, WebPage page, org.apache.hadoop.mapreduce.Mapper.Context context) throws IOException, InterruptedException
- Overrides:
map
in class org.apache.hadoop.mapreduce.Mapper<String,WebPage,org.apache.hadoop.io.IntWritable,FetchEntry>
- Throws:
IOException
InterruptedException
Copyright © 2013 The Apache Software Foundation