InjectorJob (apache-nutch 2.1 API)

org.apache.nutch.crawl
Class InjectorJob

java.lang.Object
  org.apache.hadoop.conf.Configured
      org.apache.nutch.util.NutchTool
          org.apache.nutch.crawl.InjectorJob

All Implemented Interfaces: Configurable, Tool

public class InjectorJob
extends NutchTool
implements Tool

This class takes a flat file of URLs and adds them to the set of pages to be crawled. It is useful for bootstrapping the system. Each URL file contains one URL per line, optionally followed by custom metadata. Metadata fields are separated by tabs, and within each field the key is separated from its value by '='.
Note that some metadata keys are reserved:
- nutch.score: sets a custom score for a specific URL
- nutch.fetchInterval: sets a custom fetch interval for a specific URL
For example: http://www.nutch.org/ \t nutch.score=10 \t nutch.fetchInterval=2592000 \t userType=open_source
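
The line format described above can be illustrated with a small standalone parser. This is a minimal sketch using only the Java standard library, not the code InjectorJob itself runs: the class SeedLineSketch, its nested Seed type, and the choice to skip fields that lack an '=' are all assumptions made for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper, not part of Nutch: splits one seed line into URL and metadata. */
public class SeedLineSketch {

    /** Result of parsing one line: the URL plus its key=value metadata, in file order. */
    public static final class Seed {
        public final String url;
        public final Map<String, String> metadata;

        Seed(String url, Map<String, String> metadata) {
            this.url = url;
            this.metadata = metadata;
        }
    }

    /** Parses "url TAB key=value TAB key=value ..." into a Seed. */
    public static Seed parse(String line) {
        String[] fields = line.split("\t");
        String url = fields[0].trim();
        Map<String, String> metadata = new LinkedHashMap<>();
        for (int i = 1; i < fields.length; i++) {
            String field = fields[i].trim();
            int eq = field.indexOf('=');
            if (eq <= 0) {
                continue; // assumption: ignore fields that are not key=value
            }
            String key = field.substring(0, eq).trim();
            String value = field.substring(eq + 1).trim();
            metadata.put(key, value);
        }
        return new Seed(url, metadata);
    }

    public static void main(String[] args) {
        Seed seed = parse("http://www.nutch.org/\tnutch.score=10"
                + "\tnutch.fetchInterval=2592000\tuserType=open_source");
        System.out.println(seed.url);
        System.out.println(seed.metadata);
    }
}
```

In this sketch the reserved keys nutch.score and nutch.fetchInterval land in the same map as custom keys such as userType; a caller would look them up by name and treat the remaining entries as custom metadata.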

Nested Class Summary
`static class`	`InjectorJob.UrlMapper`

Field Summary
`static org.slf4j.Logger`	`LOG`
`static String`	`nutchFetchIntervalMDName` metadata key reserved for setting a custom fetchInterval for a specific URL
`static String`	`nutchScoreMDName` metadata key reserved for setting a custom score for a specific URL

Fields inherited from class org.apache.nutch.util.NutchTool
`currentJob, currentJobNum, numJobs, results, status`

Constructor Summary
`InjectorJob()`
`InjectorJob(Configuration conf)`

Method Summary
`void`	`inject(Path urlDir)`
`static void`	`main(String[] args)`
`Map<String,Object>`	`run(Map<String,Object> args)` Runs the tool, using a map of arguments.
`int`	`run(String[] args)`

Methods inherited from class org.apache.nutch.util.NutchTool
`getProgress, getStatus, killJob, stopJob`

Methods inherited from class org.apache.hadoop.conf.Configured
`getConf, setConf`

Methods inherited from class java.lang.Object
`clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait`

Methods inherited from interface org.apache.hadoop.conf.Configurable
`getConf, setConf`

Field Detail

LOG

public static final org.slf4j.Logger LOG

nutchScoreMDName

public static String nutchScoreMDName

metadata key reserved for setting a custom score for a specific URL

nutchFetchIntervalMDName

public static String nutchFetchIntervalMDName

metadata key reserved for setting a custom fetchInterval for a specific URL

Constructor Detail

InjectorJob

public InjectorJob()

InjectorJob

public InjectorJob(Configuration conf)

Method Detail

run

public Map<String,Object> run(Map<String,Object> args)
                       throws Exception

Description copied from class: NutchTool

Runs the tool, using a map of arguments. May return results, or null.

Specified by: run in class NutchTool

Throws: Exception

inject

public void inject(Path urlDir)
            throws Exception

Throws: Exception
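
The urlDir argument names a directory of URL files in the format given in the class description. Below is a minimal sketch of producing such a file with the Java standard library. The class SeedFileSketch, the directory name urls, and the file name seed.txt are assumptions for illustration. Note that Path here is java.nio.file.Path, not the Hadoop Path type that inject accepts, and the sketch does not call InjectorJob.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper, not part of Nutch: writes a seed file in the tab-separated format. */
public class SeedFileSketch {

    /** Builds one line: the URL, then one tab-separated key=value field per metadata entry. */
    public static String formatLine(String url, Map<String, String> metadata) {
        StringBuilder sb = new StringBuilder(url);
        for (Map.Entry<String, String> e : metadata.entrySet()) {
            sb.append('\t').append(e.getKey()).append('=').append(e.getValue());
        }
        return sb.toString();
    }

    /** Creates the directory if needed and writes the lines, one URL per line, as UTF-8. */
    public static Path writeSeedFile(Path dir, String fileName, List<String> lines)
            throws IOException {
        Files.createDirectories(dir);
        return Files.write(dir.resolve(fileName), lines, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        Map<String, String> metadata = new LinkedHashMap<>();
        metadata.put("nutch.score", "10");
        metadata.put("userType", "open_source");
        String line = formatLine("http://www.nutch.org/", metadata);
        // "urls" and "seed.txt" are illustrative names, not values required by Nutch.
        Path file = writeSeedFile(Paths.get("urls"), "seed.txt",
                Collections.singletonList(line));
        System.out.println("Wrote " + file);
    }
}
```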

run

public int run(String[] args)
        throws Exception

Specified by: run in interface Tool

Throws: Exception

main

public static void main(String[] args)
                 throws Exception

Throws: Exception
