org.apache.nutch.indexer.field
Class AnchorFields

java.lang.Object
  extended by org.apache.hadoop.conf.Configured
      extended by org.apache.nutch.indexer.field.AnchorFields
All Implemented Interfaces:
Configurable, Tool

public class AnchorFields
extends Configured
implements Tool

Creates FieldWritable objects for inbound anchor text. These FieldWritable objects are then passed as input to the FieldIndexer, which converts them to Lucene Field objects and indexes them. Empty or null anchor text is ignored. Anchors are sorted in descending order by the score of their parent pages. Settings control the maximum number of anchors to index and whether those anchors are stored and tokenized. Because anchors are sorted in descending order by score and capped at the configured maximum, only the best anchors are indexed, on the assumption that a higher link analysis score indicates a better page and better inbound anchor text.
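
The selection rule described above (drop empty or null anchors, sort by parent page score in descending order, keep at most a configured maximum) can be illustrated with a minimal, standalone sketch. This is not the code of AnchorFields itself: the class AnchorSelectionSketch, the type ScoredAnchor, the method selectAnchors, and the parameter maxAnchors are hypothetical names introduced only for illustration, and the sketch uses plain Java collections rather than Hadoop or Nutch types.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only; all names here are hypothetical and are not part of Nutch.
class AnchorSelectionSketch {

    // Hypothetical stand-in for one inbound anchor plus the score of its parent page.
    static final class ScoredAnchor {
        final String text;
        final float score;

        ScoredAnchor(String text, float score) {
            this.text = text;
            this.score = score;
        }
    }

    // Returns the anchor text of at most maxAnchors anchors, best parent score first.
    static List<String> selectAnchors(List<ScoredAnchor> inbound, int maxAnchors) {
        // Ignore null or empty anchor text, as the class description states.
        List<ScoredAnchor> kept = new ArrayList<>();
        for (ScoredAnchor anchor : inbound) {
            if (anchor.text != null && !anchor.text.trim().isEmpty()) {
                kept.add(anchor);
            }
        }

        // Sort in descending order of parent page score.
        kept.sort((a, b) -> Float.compare(b.score, a.score));

        // Keep only the top maxAnchors entries.
        List<String> selected = new ArrayList<>();
        for (int i = 0; i < kept.size() && i < maxAnchors; i++) {
            selected.add(kept.get(i).text);
        }
        return selected;
    }
}
```

For example, given the anchors ("crawler", 0.4), ("", 2.0), (null, 1.5), ("Apache Nutch", 0.9), and ("search engine", 0.7) with maxAnchors set to 2, selectAnchors returns "Apache Nutch" followed by "search engine": the empty and null anchors are discarded despite their high scores, and the lowest-scoring remaining anchor falls outside the limit.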


Nested Class Summary
static class AnchorFields.Collector
          Collects and creates FieldWritable objects from the inlinks.
static class AnchorFields.Extractor
          Extracts outlinks that are later converted to FieldWritable objects.
 
Field Summary
static org.apache.commons.logging.Log LOG
           
 
Constructor Summary
AnchorFields()
           
 
Method Summary
 void createFields(Path webGraphDb, Path basicFields, Path output)
          Creates the FieldsWritable object from the anchors.
static void main(String[] args)
           
 int run(String[] args)
          Runs the AnchorFields job.
 
Methods inherited from class org.apache.hadoop.conf.Configured
getConf, setConf
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 
Methods inherited from interface org.apache.hadoop.conf.Configurable
getConf, setConf
 

Field Detail

LOG

public static final org.apache.commons.logging.Log LOG
Constructor Detail

AnchorFields

public AnchorFields()
Method Detail

createFields

public void createFields(Path webGraphDb,
                         Path basicFields,
                         Path output)
                  throws IOException
Creates the FieldsWritable object from the anchors.

Parameters:
webGraphDb - The WebGraph from which to pull outlinks.
basicFields - The BasicFields that must be present to avoid orphan anchor fields.
output - The AnchorFields output.
Throws:
IOException - If an error occurs while creating the fields.

main

public static void main(String[] args)
                 throws Exception
Throws:
Exception

run

public int run(String[] args)
        throws Exception
Runs the AnchorFields job.

Specified by:
run in interface Tool
Throws:
Exception


Copyright © 2006 The Apache Software Foundation