org.apache.nutch.fetcher
Class FetcherJob
java.lang.Object
org.apache.hadoop.conf.Configured
org.apache.nutch.util.NutchTool
org.apache.nutch.fetcher.FetcherJob
- All Implemented Interfaces:
- Configurable, Tool
public class FetcherJob
- extends NutchTool
- implements Tool
Multi-threaded fetcher.
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
PROTOCOL_REDIR
public static final String PROTOCOL_REDIR
- See Also:
- Constant Field Values
PERM_REFRESH_TIME
public static final int PERM_REFRESH_TIME
- See Also:
- Constant Field Values
REDIRECT_DISCOVERED
public static final org.apache.avro.util.Utf8 REDIRECT_DISCOVERED
RESUME_KEY
public static final String RESUME_KEY
- See Also:
- Constant Field Values
PARSE_KEY
public static final String PARSE_KEY
- See Also:
- Constant Field Values
THREADS_KEY
public static final String THREADS_KEY
- See Also:
- Constant Field Values
LOG
public static final org.slf4j.Logger LOG
FetcherJob
public FetcherJob()
FetcherJob
public FetcherJob(Configuration conf)
getFields
public Collection<WebPage.Field> getFields(Job job)
run
public Map<String,Object> run(Map<String,Object> args)
throws Exception
- Description copied from class:
NutchTool
- Runs the tool, using a map of arguments.
May return results, or null.
- Specified by:
run
in class NutchTool
- Throws:
Exception
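Because run(Map) may return results or null, callers may want to normalize the result before reading from it. The following is a minimal sketch of that pattern only. The `MapTool` interface is a hypothetical stand-in that mirrors the method signature documented here, and `ToolResultHandling` and `runSafely` are illustrative names, not part of Nutch.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

/** Illustrative helper for the "may return results, or null" contract. */
public class ToolResultHandling {

    /** Hypothetical stand-in with the same shape as the documented run(Map). */
    public interface MapTool {
        Map<String, Object> run(Map<String, Object> args) throws Exception;
    }

    /** Runs the tool and replaces a null result with an empty map. */
    public static Map<String, Object> runSafely(MapTool tool, Map<String, Object> args)
            throws Exception {
        Map<String, Object> results = tool.run(args);
        if (results == null) {
            return Collections.<String, Object>emptyMap();
        }
        return results;
    }

    public static void main(String[] unused) throws Exception {
        // A tool that returns no results at all.
        MapTool silentTool = args -> null;
        // A tool that echoes its arguments back as results.
        MapTool echoTool = args -> new HashMap<>(args);

        Map<String, Object> args = new HashMap<>();
        args.put("example", 1); // hypothetical key, not a real Nutch argument name

        System.out.println(runSafely(silentTool, args).isEmpty()); // prints true
        System.out.println(runSafely(echoTool, args).get("example")); // prints 1
    }
}
```

With this wrapper, code that inspects the result map does not need a separate null check at each call site.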
fetch
public int fetch(String batchId,
int threads,
boolean shouldResume,
int numTasks)
throws Exception
- Run fetcher.
- Parameters:
batchId
- batchId (obtained from Generator) or null to fetch all generated fetchlists
threads
- number of threads per map task
shouldResume
numTasks
- number of fetching tasks (reducers). If set to < 1 then use the default,
which is mapred.map.tasks.
- Returns:
- 0 on success
- Throws:
Exception
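The numTasks rule documented for fetch can be read as a simple fallback. The following is a minimal sketch of that rule only, not FetcherJob's actual code. The class `NumTasksDefault`, its `resolve` method, and the plain `Map` used in place of a Hadoop `Configuration` are assumptions made for illustration, as is the final fallback of 1 when the property is unset.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper illustrating the documented numTasks fallback. */
public class NumTasksDefault {

    /** Property named in the fetch documentation as the source of the default. */
    public static final String DEFAULT_TASKS_PROPERTY = "mapred.map.tasks";

    /**
     * Returns numTasks when it is at least 1. Otherwise falls back to the
     * value of mapred.map.tasks in the given settings, which stand in for a
     * Hadoop Configuration. Falling back to 1 when the property is missing
     * is an assumption of this sketch.
     */
    public static int resolve(int numTasks, Map<String, String> settings) {
        if (numTasks >= 1) {
            return numTasks;
        }
        String configured = settings.get(DEFAULT_TASKS_PROPERTY);
        if (configured == null) {
            return 1;
        }
        return Integer.parseInt(configured.trim());
    }

    public static void main(String[] args) {
        Map<String, String> settings = new HashMap<>();
        settings.put(DEFAULT_TASKS_PROPERTY, "4");

        System.out.println(resolve(8, settings)); // explicit value wins: prints 8
        System.out.println(resolve(0, settings)); // below 1, use the default: prints 4
    }
}
```

Under this reading, passing 0 or a negative numTasks is a way of deferring to the job configuration rather than an error.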
run
public int run(String[] args)
throws Exception
- Specified by:
run
in interface Tool
- Throws:
Exception
main
public static void main(String[] args)
throws Exception
- Throws:
Exception
Copyright © 2012 The Apache Software Foundation