org.apache.nutch.fetcher
Class FetcherJob

java.lang.Object
  extended by org.apache.hadoop.conf.Configured
      extended by org.apache.nutch.util.NutchTool
          extended by org.apache.nutch.fetcher.FetcherJob
All Implemented Interfaces:
Configurable, Tool

public class FetcherJob
extends NutchTool
implements Tool

Multi-threaded fetcher.


Nested Class Summary
static class FetcherJob.FetcherMapper
           Mapper class for Fetcher.
 
Field Summary
static org.slf4j.Logger LOG
           
static String PARSE_KEY
           
static int PERM_REFRESH_TIME
           
static String PROTOCOL_REDIR
           
static org.apache.avro.util.Utf8 REDIRECT_DISCOVERED
           
static String RESUME_KEY
           
static String THREADS_KEY
           
 
Fields inherited from class org.apache.nutch.util.NutchTool
currentJob, currentJobNum, numJobs, results, status
 
Constructor Summary
FetcherJob()
           
FetcherJob(Configuration conf)
           
 
Method Summary
 int fetch(String batchId, int threads, boolean shouldResume, int numTasks)
          Run fetcher.
 Collection<WebPage.Field> getFields(Job job)
           
static void main(String[] args)
           
 Map<String,Object> run(Map<String,Object> args)
          Runs the tool, using a map of arguments.
 int run(String[] args)
           
 
Methods inherited from class org.apache.nutch.util.NutchTool
getProgress, getStatus, killJob, stopJob
 
Methods inherited from class org.apache.hadoop.conf.Configured
getConf, setConf
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 
Methods inherited from interface org.apache.hadoop.conf.Configurable
getConf, setConf
 

Field Detail

PROTOCOL_REDIR

public static final String PROTOCOL_REDIR
See Also:
Constant Field Values

PERM_REFRESH_TIME

public static final int PERM_REFRESH_TIME
See Also:
Constant Field Values

REDIRECT_DISCOVERED

public static final org.apache.avro.util.Utf8 REDIRECT_DISCOVERED

RESUME_KEY

public static final String RESUME_KEY
See Also:
Constant Field Values

PARSE_KEY

public static final String PARSE_KEY
See Also:
Constant Field Values

THREADS_KEY

public static final String THREADS_KEY
See Also:
Constant Field Values

LOG

public static final org.slf4j.Logger LOG

Constructor Detail

FetcherJob

public FetcherJob()

FetcherJob

public FetcherJob(Configuration conf)

Method Detail

getFields

public Collection<WebPage.Field> getFields(Job job)

run

public Map<String,Object> run(Map<String,Object> args)
                       throws Exception
Description copied from class: NutchTool
Runs the tool, using a map of arguments. May return results, or null.

Specified by:
run in class NutchTool
Throws:
Exception
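
Because run(Map) may return null, callers may want to normalize the result before reading from it. The following is a minimal sketch of that pattern only. MapTool is a hypothetical stand-in for the run(Map) contract (it is not a Nutch type), and "example.key" is an invented key, not a documented argument.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

/** Illustrative only: null-safe handling of a run(Map) style result. */
final class RunResultSketch {

    /** Hypothetical stand-in for the run(Map) contract; not a Nutch type. */
    interface MapTool {
        Map<String, Object> run(Map<String, Object> args) throws Exception;
    }

    /** Runs the tool and turns a null result into an empty, read-only map. */
    static Map<String, Object> runOrEmpty(MapTool tool, Map<String, Object> args)
            throws Exception {
        Map<String, Object> results = tool.run(args);
        return results == null ? Collections.<String, Object>emptyMap() : results;
    }

    public static void main(String[] argv) throws Exception {
        MapTool returnsNull = in -> null;               // tool with no results
        MapTool echoesArgs = in -> new HashMap<>(in);   // tool that returns a copy

        Map<String, Object> toolArgs = new HashMap<>();
        toolArgs.put("example.key", 10); // hypothetical key, for illustration only

        System.out.println(runOrEmpty(returnsNull, toolArgs).isEmpty());
        System.out.println(runOrEmpty(echoesArgs, toolArgs).get("example.key"));
    }
}
```

With a real FetcherJob the same check would apply to the map returned by run(Map); which keys it contains is not stated on this page.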

fetch

public int fetch(String batchId,
                 int threads,
                 boolean shouldResume,
                 int numTasks)
          throws Exception
Run fetcher.

Parameters:
batchId - batchId (obtained from Generator) or null to fetch all generated fetchlists
threads - number of threads per map task
shouldResume - whether to resume a previously interrupted fetch
numTasks - number of fetching tasks (reducers). If less than 1, the default is used, which is the value of mapred.map.tasks.
Returns:
0 on success
Throws:
Exception
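
The numTasks fallback described above can be sketched as follows. This is an illustration of the documented rule, not the FetcherJob source: a plain Map stands in for the Hadoop Configuration, and the fallbackIfUnset argument is a hypothetical last resort for the case where mapred.map.tasks is not set, which this page does not describe.

```java
import java.util.HashMap;
import java.util.Map;

/** Illustrative only: mirrors the documented numTasks fallback rule. */
final class NumTasksSketch {

    /** Property name taken from the numTasks parameter description. */
    static final String MAP_TASKS_KEY = "mapred.map.tasks";

    /**
     * Returns the requested task count when it is at least 1; otherwise falls
     * back to the configured mapred.map.tasks value. fallbackIfUnset is a
     * hypothetical default used only when the property is absent.
     */
    static int resolveNumTasks(int requested, Map<String, String> conf,
                               int fallbackIfUnset) {
        if (requested >= 1) {
            return requested;
        }
        String configured = conf.get(MAP_TASKS_KEY);
        return configured == null
                ? fallbackIfUnset
                : Integer.parseInt(configured.trim());
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put(MAP_TASKS_KEY, "8");

        System.out.println(resolveNumTasks(4, conf, 1));   // explicit value is used
        System.out.println(resolveNumTasks(0, conf, 1));   // falls back to the property
        System.out.println(resolveNumTasks(-1, new HashMap<>(), 1)); // property unset
    }
}
```

Passing 0 or a negative value is therefore a way to defer to the cluster configuration rather than fixing the task count in the caller.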

run

public int run(String[] args)
        throws Exception
Specified by:
run in interface Tool
Throws:
Exception

main

public static void main(String[] args)
                 throws Exception
Throws:
Exception


Copyright © 2012 The Apache Software Foundation