org.apache.nutch.indexer.field
Class BasicFields

java.lang.Object
  extended by org.apache.hadoop.conf.Configured
      extended by org.apache.nutch.indexer.field.BasicFields
All Implemented Interfaces:
Configurable, Tool

public class BasicFields
extends Configured
implements Tool

Creates the basic FieldWritable objects. The basic fields are the main fields used in indexing segments. Many other field jobs rely on the urls being present in the basic fields output to create their own fields for indexing.

Basic fields are extracted from segments. Only urls that were successfully fetched and parsed are converted.

This job also implements a portion of the redirect logic. If a url has an associated redirect or orig url, the url and its orig are compared by their link analysis scores, and the higher-scoring one becomes the url used for display in the index. This ensures that content is indexed under the best, most popular url, which is most often the one users expect.

The BasicFields tool accepts one or more segments to convert to fields. If multiple segments have overlapping content, only the latest successfully fetched content is converted.
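The redirect rule described above can be illustrated with a minimal, self-contained sketch. It is not the Nutch implementation: the class RedirectChoiceSketch, the method chooseDisplayUrl, and the plain Map of scores are hypothetical stand-ins for the link analysis data, and the handling of missing scores and ties is an assumption made for the example.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical illustration of the redirect rule; not part of Nutch. */
final class RedirectChoiceSketch {

  /**
   * Returns whichever of url and origUrl has the higher link analysis score.
   * Assumptions of this sketch: a url missing from the score map counts as
   * 0.0f, and a tie keeps the fetched url.
   */
  static String chooseDisplayUrl(String url, String origUrl,
                                 Map<String, Float> scores) {
    if (origUrl == null) {
      return url; // no redirect involved, nothing to compare
    }
    float urlScore = scores.getOrDefault(url, 0.0f);
    float origScore = scores.getOrDefault(origUrl, 0.0f);
    return origScore > urlScore ? origUrl : url;
  }

  public static void main(String[] args) {
    // Made-up scores for demonstration only.
    Map<String, Float> scores = new HashMap<>();
    scores.put("http://example.com/", 2.5f);
    scores.put("http://www.example.com/home", 0.5f);

    String display = chooseDisplayUrl(
        "http://www.example.com/home", "http://example.com/", scores);
    System.out.println("Display url: " + display);
  }
}
```

In this sketch the content fetched at the redirect target would be shown under the orig url, because the orig url carries the higher score.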


Nested Class Summary
static class BasicFields.Flipper
          Runs the first part of redirect logic.
static class BasicFields.Merger
          Merges output of all segments fields collecting only the most recent set of fields for any given url.
static class BasicFields.Scorer
          The Scorer job sets the boost field from the NodeDb score.
 
Field Summary
static org.apache.commons.logging.Log LOG
           
 
Constructor Summary
BasicFields()
           
 
Method Summary
 void createFields(Path nodeDb, Path[] segments, Path output)
          Runs the BasicFields jobs for every segment, then aggregates and filters the output to create a final database of FieldWritable objects.
static void main(String[] args)
           
 int run(String[] args)
          Runs the BasicFields tool.
 
Methods inherited from class org.apache.hadoop.conf.Configured
getConf, setConf
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 
Methods inherited from interface org.apache.hadoop.conf.Configurable
getConf, setConf
 

Field Detail

LOG

public static final org.apache.commons.logging.Log LOG
Constructor Detail

BasicFields

public BasicFields()
Method Detail

createFields

public void createFields(Path nodeDb,
                         Path[] segments,
                         Path output)
                  throws IOException
Runs the BasicFields jobs for every segment, then aggregates and filters the output to create a final database of FieldWritable objects.

Parameters:
nodeDb - The node database.
segments - The array of segments to process.
output - The BasicFields output.
Throws:
IOException - If an error occurs while processing the segments.
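
The aggregation and filtering step keeps only the most recent successfully fetched content when segments overlap. The following is a minimal sketch of that idea under stated assumptions, not the actual Merger code: the class LatestFetchSketch, the record type FetchedDoc, and the use of a numeric fetchTime to decide which copy is newer are all hypothetical.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration of latest-wins merging; not part of Nutch. */
final class LatestFetchSketch {

  /** One successfully fetched document from one segment (assumed shape). */
  static final class FetchedDoc {
    final String url;
    final String segment;
    final long fetchTime;

    FetchedDoc(String url, String segment, long fetchTime) {
      this.url = url;
      this.segment = segment;
      this.fetchTime = fetchTime;
    }
  }

  /** Keeps, for each url, only the document with the largest fetchTime. */
  static Map<String, FetchedDoc> keepLatest(List<FetchedDoc> docs) {
    Map<String, FetchedDoc> latest = new LinkedHashMap<>();
    for (FetchedDoc doc : docs) {
      FetchedDoc seen = latest.get(doc.url);
      if (seen == null || doc.fetchTime > seen.fetchTime) {
        latest.put(doc.url, doc);
      }
    }
    return latest;
  }

  public static void main(String[] args) {
    // Made-up segment names and timestamps for demonstration only.
    List<FetchedDoc> docs = Arrays.asList(
        new FetchedDoc("http://example.com/", "segment-a", 100L),
        new FetchedDoc("http://example.com/", "segment-b", 200L),
        new FetchedDoc("http://example.org/", "segment-a", 150L));

    for (FetchedDoc doc : keepLatest(docs).values()) {
      System.out.println(doc.url + " from " + doc.segment);
    }
  }
}
```

In the real job this selection runs as part of the Hadoop jobs over all segment output; the in-memory map here only stands in for that grouping by url.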

main

public static void main(String[] args)
                 throws Exception
Throws:
Exception

run

public int run(String[] args)
        throws Exception
Runs the BasicFields tool.

Specified by:
run in interface Tool
Throws:
Exception


Copyright © 2006 The Apache Software Foundation