org.apache.nutch.scoring.webgraph
Class WebGraph.OutlinkDb

java.lang.Object
  extended by org.apache.hadoop.conf.Configured
      extended by org.apache.nutch.scoring.webgraph.WebGraph.OutlinkDb
All Implemented Interfaces:
Closeable, Configurable, JobConfigurable, Mapper<Text,Writable,Text,NutchWritable>, Reducer<Text,NutchWritable,Text,LinkDatum>
Enclosing class:
WebGraph

public static class WebGraph.OutlinkDb
extends Configured
implements Mapper<Text,Writable,Text,NutchWritable>, Reducer<Text,NutchWritable,Text,LinkDatum>

The OutlinkDb creates a database of all outlinks. Outlinks to internal URLs, whether internal by domain or by host, can be ignored. The number of outlinks to a given page or domain can also be limited.
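The two options described above (ignoring internal links and limiting links per target) can be illustrated with a minimal plain-Java sketch. It is not the Nutch implementation: the class OutlinkFilterSketch, its methods, and its parameters are hypothetical names chosen for illustration, and it compares hosts only, as a simplified stand-in for the host and domain checks.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration only; not part of Nutch. */
class OutlinkFilterSketch {

    /** Returns the lower-cased host of a URL, or "" if it cannot be determined. */
    static String hostOf(String url) {
        try {
            String host = URI.create(url).getHost();
            return host == null ? "" : host.toLowerCase();
        } catch (IllegalArgumentException e) {
            return "";
        }
    }

    /**
     * Filters the outlinks of one source page.
     *
     * @param ignoreInternalHost drop links whose host equals the source host
     * @param maxPerTargetHost   keep at most this many links per target host;
     *                           a value of 0 or less means no limit
     */
    static List<String> filter(String fromUrl, List<String> outlinks,
                               boolean ignoreInternalHost, int maxPerTargetHost) {
        String fromHost = hostOf(fromUrl);
        Map<String, Integer> seenPerHost = new HashMap<>();
        List<String> kept = new ArrayList<>();
        for (String to : outlinks) {
            String toHost = hostOf(to);
            if (ignoreInternalHost && toHost.equals(fromHost)) {
                continue; // internal link: same host as the source page
            }
            int count = seenPerHost.merge(toHost, 1, Integer::sum);
            if (maxPerTargetHost > 0 && count > maxPerTargetHost) {
                continue; // limit already reached for this target host
            }
            kept.add(to);
        }
        return kept;
    }
}
```

Calling filter with ignoreInternalHost set to true and maxPerTargetHost set to 1 keeps at most one link per external host and drops every link that points back to the source host.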


Field Summary
static String URL_FILTERING
           
static String URL_NORMALIZING
           
 
Constructor Summary
WebGraph.OutlinkDb()
          Default constructor.
WebGraph.OutlinkDb(Configuration conf)
          Configurable constructor.
 
Method Summary
 void close()
           
 void configure(JobConf conf)
          Configures the OutlinkDb job.
 void map(Text key, Writable value, OutputCollector<Text,NutchWritable> output, Reporter reporter)
          Passes through existing LinkDatum objects from an existing OutlinkDb and maps out new LinkDatum objects from the ParseData of new crawls.
 void reduce(Text key, Iterator<NutchWritable> values, OutputCollector<Text,LinkDatum> output, Reporter reporter)
           
 
Methods inherited from class org.apache.hadoop.conf.Configured
getConf, setConf
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

URL_NORMALIZING

public static final String URL_NORMALIZING
See Also:
Constant Field Values

URL_FILTERING

public static final String URL_FILTERING
See Also:
Constant Field Values
Constructor Detail

WebGraph.OutlinkDb

public WebGraph.OutlinkDb()
Default constructor.


WebGraph.OutlinkDb

public WebGraph.OutlinkDb(Configuration conf)
Configurable constructor.

Method Detail

configure

public void configure(JobConf conf)
Configures the OutlinkDb job. Sets up the handling of internal links and the link limits.

Specified by:
configure in interface JobConfigurable

map

public void map(Text key,
                Writable value,
                OutputCollector<Text,NutchWritable> output,
                Reporter reporter)
         throws IOException
Passes through existing LinkDatum objects from an existing OutlinkDb and maps out new LinkDatum objects from the ParseData of new crawls.

Specified by:
map in interface Mapper<Text,Writable,Text,NutchWritable>
Throws:
IOException
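
The two cases that map handles, as described above, can be sketched in plain Java without Hadoop or Nutch on the classpath. This is an illustration under stated assumptions, not the actual method body: Link and Parsed are hypothetical stand-ins for LinkDatum and ParseData, and the real method writes to an OutputCollector rather than returning a list.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration only; not part of Nutch. */
class OutlinkMapSketch {

    /** Stand-in for a stored link record (plays the role of LinkDatum). */
    static final class Link {
        final String toUrl;
        Link(String toUrl) { this.toUrl = toUrl; }
    }

    /** Stand-in for the parse output of a newly crawled page (plays the role of ParseData). */
    static final class Parsed {
        final List<String> outlinks;
        Parsed(List<String> outlinks) { this.outlinks = outlinks; }
    }

    /**
     * Returns the records to emit for one input value:
     * an existing Link is passed through unchanged, while a Parsed value
     * yields one new Link per outlink. Any other value yields nothing.
     */
    static List<Link> map(Object value) {
        List<Link> emitted = new ArrayList<>();
        if (value instanceof Link) {
            emitted.add((Link) value); // pass-through of an existing record
        } else if (value instanceof Parsed) {
            for (String to : ((Parsed) value).outlinks) {
                emitted.add(new Link(to)); // new record from fresh parse data
            }
        }
        return emitted;
    }
}
```

Dispatching on the runtime type of the value lets a single map step read an existing database and new parse output in the same job, which matches the Writable input type in the method signature.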

reduce

public void reduce(Text key,
                   Iterator<NutchWritable> values,
                   OutputCollector<Text,LinkDatum> output,
                   Reporter reporter)
            throws IOException
Specified by:
reduce in interface Reducer<Text,NutchWritable,Text,LinkDatum>
Throws:
IOException

close

public void close()
Specified by:
close in interface Closeable


Copyright © 2012 The Apache Software Foundation