org.apache.nutch.indexer.field
Class AnchorFields.Extractor

java.lang.Object
  extended by org.apache.hadoop.conf.Configured
      extended by org.apache.nutch.indexer.field.AnchorFields.Extractor
All Implemented Interfaces:
Closeable, Configurable, JobConfigurable, Mapper<Text,Writable,Text,ObjectWritable>, Reducer<Text,ObjectWritable,Text,LinkDatum>
Enclosing class:
AnchorFields

public static class AnchorFields.Extractor
extends Configured
implements Mapper<Text,Writable,Text,ObjectWritable>, Reducer<Text,ObjectWritable,Text,LinkDatum>

Extracts outlinks so that they can be created as FieldWritable objects. Outlinks whose anchor text is null or empty are ignored.


Constructor Summary
AnchorFields.Extractor()
          Default constructor.
AnchorFields.Extractor(Configuration conf)
          Constructs an extractor with the given Configuration.
 
Method Summary
 void close()
           
 void configure(JobConf conf)
          Configures the job and sets it to ignore empty anchors.
 void map(Text key, Writable value, OutputCollector<Text,ObjectWritable> output, Reporter reporter)
          Wraps values in ObjectWritable.
 void reduce(Text key, Iterator<ObjectWritable> values, OutputCollector<Text,LinkDatum> output, Reporter reporter)
          Extracts and inverts outlinks, ignoring empty anchors.
 
Methods inherited from class org.apache.hadoop.conf.Configured
getConf, setConf
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

AnchorFields.Extractor

public AnchorFields.Extractor()
Default constructor.


AnchorFields.Extractor

public AnchorFields.Extractor(Configuration conf)
Constructs an extractor with the given Configuration.

Method Detail

configure

public void configure(JobConf conf)
Configures the job and sets it to ignore empty anchors.

Specified by:
configure in interface JobConfigurable

map

public void map(Text key,
                Writable value,
                OutputCollector<Text,ObjectWritable> output,
                Reporter reporter)
         throws IOException
Wraps values in ObjectWritable.

Specified by:
map in interface Mapper<Text,Writable,Text,ObjectWritable>
Throws:
IOException
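
The wrapping step described for map can be illustrated with a minimal sketch in plain Java. It does not use the Hadoop or Nutch classes: Wrapped is a hypothetical stand-in for ObjectWritable, String keys stand in for Text, and a List stands in for the OutputCollector. The likely purpose of such wrapping is to let values of different types share one reducer input type, but that reading is an assumption, not something this page states.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class WrapMapSketch {

    /** Hypothetical stand-in for ObjectWritable: keeps the value and its runtime class. */
    static final class Wrapped {
        final Class<?> declaredClass;
        final Object instance;

        Wrapped(Object instance) {
            this.declaredClass = instance.getClass();
            this.instance = instance;
        }
    }

    /** Mirrors the shape of map(): the key passes through, the value is wrapped. */
    static void map(String key, Object value, List<Map.Entry<String, Wrapped>> output) {
        output.add(new SimpleEntry<>(key, new Wrapped(value)));
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Wrapped>> out = new ArrayList<>();
        // Two values of different types share the same key and the same output type.
        map("http://a.example/", "a text value", out);
        map("http://a.example/", Integer.valueOf(7), out);
        for (Map.Entry<String, Wrapped> e : out) {
            System.out.println(e.getKey() + " -> " + e.getValue().declaredClass.getSimpleName());
        }
    }
}
```

A consumer of these records can inspect declaredClass to decide how to handle each wrapped instance.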

reduce

public void reduce(Text key,
                   Iterator<ObjectWritable> values,
                   OutputCollector<Text,LinkDatum> output,
                   Reporter reporter)
            throws IOException
Extracts and inverts outlinks, ignoring empty anchors.

Specified by:
reduce in interface Reducer<Text,ObjectWritable,Text,LinkDatum>
Throws:
IOException
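
As a rough illustration of what "inverting" outlinks while ignoring empty anchors could look like, here is a minimal sketch in plain Java. It is not the Nutch implementation: Link and InLink are hypothetical stand-ins for the outlink and LinkDatum types, the ignoreEmptyAnchors flag is an assumed parameter, and treating whitespace-only anchors as empty is also an assumption.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class InvertOutlinksSketch {

    /** Hypothetical outlink: target URL plus anchor text, which may be null or empty. */
    static final class Link {
        final String toUrl;
        final String anchor;

        Link(String toUrl, String anchor) {
            this.toUrl = toUrl;
            this.anchor = anchor;
        }
    }

    /** Hypothetical inverted record: the page the link came from and its anchor text. */
    static final class InLink {
        final String fromUrl;
        final String anchor;

        InLink(String fromUrl, String anchor) {
            this.fromUrl = fromUrl;
            this.anchor = anchor;
        }
    }

    /** Re-keys the outlinks of one source page by their target URL. */
    static Map<String, List<InLink>> invert(String fromUrl, List<Link> outlinks,
                                            boolean ignoreEmptyAnchors) {
        Map<String, List<InLink>> inverted = new LinkedHashMap<>();
        for (Link link : outlinks) {
            // Normalize null to empty so both cases are handled by one check.
            String anchor = link.anchor == null ? "" : link.anchor.trim();
            if (ignoreEmptyAnchors && anchor.isEmpty()) {
                continue; // skip links that carry no anchor text
            }
            inverted.computeIfAbsent(link.toUrl, k -> new ArrayList<>())
                    .add(new InLink(fromUrl, anchor));
        }
        return inverted;
    }

    public static void main(String[] args) {
        List<Link> links = new ArrayList<>();
        links.add(new Link("http://b.example/", "B page"));
        links.add(new Link("http://c.example/", ""));
        links.add(new Link("http://d.example/", null));
        Map<String, List<InLink>> inverted = invert("http://a.example/", links, true);
        for (Map.Entry<String, List<InLink>> e : inverted.entrySet()) {
            for (InLink in : e.getValue()) {
                System.out.println(e.getKey() + " <- " + in.fromUrl + " [" + in.anchor + "]");
            }
        }
    }
}
```

In this sketch the output is keyed by the link target, so each target URL collects the anchors that point to it.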

close

public void close()
Specified by:
close in interface Closeable


Copyright © 2006 The Apache Software Foundation