org.apache.nutch.indexer.field
Class AnchorFields
java.lang.Object
org.apache.hadoop.conf.Configured
org.apache.nutch.indexer.field.AnchorFields
- All Implemented Interfaces:
- Configurable, Tool
public class AnchorFields
- extends Configured
- implements Tool
Creates FieldWritable objects for inbound anchor text. These FieldWritable
objects are then included in the input to the FieldIndexer to be converted
to Lucene Field objects and indexed.
Empty or null anchor text is ignored. Anchors are sorted in descending order
by the score of their parent pages. Settings control the maximum number of
anchors to index and whether those anchors are stored and tokenized. Sorting
by descending score and capping the number of anchors ensures that only the
best anchors are indexed, assuming that a higher link analysis score indicates
a better page and better inbound text.
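The selection rule described above (skip empty anchors, sort by parent page score in descending order, keep at most a configured maximum) can be illustrated with a minimal standalone sketch. It is not the Nutch implementation: the class AnchorSelectionSketch, the holder ScoredAnchor, and the method selectAnchors are hypothetical names introduced only for this example, and the real job runs as a Hadoop MapReduce task rather than over an in-memory list.

```java
import java.util.ArrayList;
import java.util.List;

public class AnchorSelectionSketch {

    /** Hypothetical holder: one inbound anchor text plus the score of its parent page. */
    static final class ScoredAnchor {
        final String text;
        final float parentScore;

        ScoredAnchor(String text, float parentScore) {
            this.text = text;
            this.parentScore = parentScore;
        }
    }

    /**
     * Returns at most maxAnchors anchor texts, best parent score first.
     * Null and empty (whitespace-only) anchor text is dropped before sorting.
     */
    static List<String> selectAnchors(List<ScoredAnchor> anchors, int maxAnchors) {
        List<ScoredAnchor> kept = new ArrayList<>();
        for (ScoredAnchor a : anchors) {
            if (a.text != null && !a.text.trim().isEmpty()) {
                kept.add(a);
            }
        }

        // Descending order by the score of the parent page.
        kept.sort((a, b) -> Float.compare(b.parentScore, a.parentScore));

        // Apply the cap on the number of anchors to index.
        List<String> result = new ArrayList<>();
        for (int i = 0; i < kept.size() && i < maxAnchors; i++) {
            result.add(kept.get(i).text);
        }
        return result;
    }

    public static void main(String[] args) {
        List<ScoredAnchor> anchors = new ArrayList<>();
        anchors.add(new ScoredAnchor("Apache Nutch", 0.9f));
        anchors.add(new ScoredAnchor("", 1.5f));          // ignored: empty
        anchors.add(new ScoredAnchor(null, 2.0f));        // ignored: null
        anchors.add(new ScoredAnchor("search engine", 0.4f));
        anchors.add(new ScoredAnchor("crawler", 0.7f));

        // prints [Apache Nutch, crawler]
        System.out.println(selectAnchors(anchors, 2));
    }
}
```

Note that the empty and null anchors are discarded even though their parent scores are the highest, so they never consume one of the capped slots.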
Field Summary
static org.apache.commons.logging.Log LOG
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
LOG
public static final org.apache.commons.logging.Log LOG
AnchorFields
public AnchorFields()
createFields
public void createFields(Path webGraphDb,
Path basicFields,
Path output)
throws IOException
- Creates the FieldsWritable object from the anchors.
- Parameters:
webGraphDb - The WebGraph from which to pull outlinks.
basicFields - The BasicFields that must be present to avoid orphan anchor fields.
output - The AnchorFields output.
- Throws:
IOException
- If an error occurs while creating the fields.
main
public static void main(String[] args)
throws Exception
- Throws:
Exception
run
public int run(String[] args)
throws Exception
- Runs the AnchorFields job.
- Specified by:
run in interface Tool
- Throws:
Exception
Copyright © 2006 The Apache Software Foundation