org.apache.nutch.scoring.webgraph
Class WebGraph.OutlinkDb
java.lang.Object
org.apache.hadoop.conf.Configured
org.apache.nutch.scoring.webgraph.WebGraph.OutlinkDb
- All Implemented Interfaces:
- Closeable, Configurable, JobConfigurable, Mapper<Text,Writable,Text,NutchWritable>, Reducer<Text,NutchWritable,Text,LinkDatum>
- Enclosing class:
- WebGraph
public static class WebGraph.OutlinkDb
- extends Configured
- implements Mapper<Text,Writable,Text,NutchWritable>, Reducer<Text,NutchWritable,Text,LinkDatum>
The OutlinkDb creates a database of all outlinks. Outlinks to internal URLs,
whether internal by domain or by host, can be ignored. The number of outlinks
to a given page or domain can also be limited.
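The filtering described above can be illustrated with a minimal, plain-Java sketch. It is not the Nutch implementation: the class `OutlinkFilterSketch`, the method names, and the `ignoreHost` and `maxPerHost` parameters are all hypothetical, and only the host-based case is shown (a domain-based check would compare domain names instead of hosts).

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; none of these names exist in Nutch.
class OutlinkFilterSketch {

    /** Returns the lowercase host of a URL, or null if it cannot be determined. */
    static String hostOf(String url) {
        try {
            String host = URI.create(url).getHost();
            return host == null ? null : host.toLowerCase();
        } catch (IllegalArgumentException e) {
            return null;
        }
    }

    /**
     * Keeps the outlinks of fromUrl that survive two checks:
     * links to the same host are dropped when ignoreHost is true, and at most
     * maxPerHost links to any one target host are kept (0 or less means no limit).
     */
    static List<String> filter(String fromUrl, List<String> outlinks,
                               boolean ignoreHost, int maxPerHost) {
        String fromHost = hostOf(fromUrl);
        Map<String, Integer> seenPerHost = new HashMap<>();
        List<String> kept = new ArrayList<>();
        for (String to : outlinks) {
            String toHost = hostOf(to);
            if (toHost == null) {
                continue; // unparsable target, skip it
            }
            if (ignoreHost && toHost.equals(fromHost)) {
                continue; // internal link by host
            }
            int count = seenPerHost.merge(toHost, 1, Integer::sum);
            if (maxPerHost > 0 && count > maxPerHost) {
                continue; // over the per-host limit
            }
            kept.add(to);
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> links = List.of(
            "http://a.example/about",
            "http://b.example/1",
            "http://b.example/2",
            "http://c.example/");
        // prints [http://b.example/1, http://c.example/]
        System.out.println(filter("http://a.example/", links, true, 1));
    }
}
```

In this sketch the internal link to `a.example` is dropped and only the first link to `b.example` is kept, which mirrors the two behaviours the class description mentions.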
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
URL_NORMALIZING
public static final String URL_NORMALIZING
- See Also:
- Constant Field Values
URL_FILTERING
public static final String URL_FILTERING
- See Also:
- Constant Field Values
WebGraph.OutlinkDb
public WebGraph.OutlinkDb()
- Default constructor.
WebGraph.OutlinkDb
public WebGraph.OutlinkDb(Configuration conf)
- Configurable constructor.
configure
public void configure(JobConf conf)
- Configures the OutlinkDb job. Sets up the handling of internal links and link limiting.
- Specified by:
configure
in interface JobConfigurable
map
public void map(Text key,
Writable value,
OutputCollector<Text,NutchWritable> output,
Reporter reporter)
throws IOException
- Passes through existing LinkDatum objects from an existing OutlinkDb and
maps out new LinkDatum objects from the ParseData of new crawls.
- Specified by:
map
in interface Mapper<Text,Writable,Text,NutchWritable>
- Throws:
IOException
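The two branches of map described above can be sketched in plain Java without the Hadoop or Nutch types. Everything here is an assumption made for illustration: `Link` and `Parsed` are hypothetical stand-ins for LinkDatum and ParseData, and the returned list stands in for the OutputCollector.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; these are not the Nutch classes.
class OutlinkDbMapSketch {

    /** Stand-in for a LinkDatum: a target URL and a timestamp. */
    static final class Link {
        final String toUrl;
        final long timestamp;
        Link(String toUrl, long timestamp) {
            this.toUrl = toUrl;
            this.timestamp = timestamp;
        }
    }

    /** Stand-in for ParseData: the outlinks found on a newly crawled page. */
    static final class Parsed {
        final List<String> outlinks;
        Parsed(List<String> outlinks) {
            this.outlinks = outlinks;
        }
    }

    /**
     * Mimics the dispatch on the value type: an existing Link is passed
     * through unchanged, while a Parsed value yields one new Link per outlink.
     * Any other value type produces no output.
     */
    static List<Link> map(Object value, long fetchTime) {
        List<Link> out = new ArrayList<>();
        if (value instanceof Link) {
            out.add((Link) value); // pass through an existing record
        } else if (value instanceof Parsed) {
            for (String to : ((Parsed) value).outlinks) {
                out.add(new Link(to, fetchTime)); // new record from a new crawl
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Parsed page = new Parsed(Arrays.asList("http://b.example/", "http://c.example/"));
        // prints 2
        System.out.println(map(page, 1000L).size());
    }
}
```

The real method receives a Writable and writes NutchWritable-wrapped values to an OutputCollector; the sketch only shows the pass-through versus create distinction.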
reduce
public void reduce(Text key,
Iterator<NutchWritable> values,
OutputCollector<Text,LinkDatum> output,
Reporter reporter)
throws IOException
- Specified by:
reduce
in interface Reducer<Text,NutchWritable,Text,LinkDatum>
- Throws:
IOException
close
public void close()
- Specified by:
close
in interface Closeable
Copyright © 2012 The Apache Software Foundation