org.apache.nutch.indexer.field
Class BasicFields
java.lang.Object
org.apache.hadoop.conf.Configured
org.apache.nutch.indexer.field.BasicFields
- All Implemented Interfaces:
- Configurable, Tool
public class BasicFields
- extends Configured
- implements Tool
Creates the basic FieldWritable objects. The basic fields are the main
fields used in indexing segments. Many other field jobs rely on the urls
being present in the basic fields output to create their own fields for
indexing.
Basic fields are extracted from segments. Only urls that were successfully
fetched and parsed are converted. This job also implements a portion of the
redirect logic. If a url has an associated redirect or orig url, both the url
and its orig are compared by their link analysis scores, and the higher
scoring one is the url used for display in the index. This ensures that
content is indexed under the best, most popular url, which is most often the
one users expect.
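The score comparison described above can be sketched in plain Java. This is an illustration only, not the Nutch implementation: the class name RedirectChoiceSketch, the method chooseDisplayUrl, the in-memory score map, the default score of 0 for unknown urls, and the tie-breaking rule are all assumptions made for the example. The real tool takes a node database as input rather than a map.

```java
import java.util.Map;

// Hypothetical sketch; none of these names come from Nutch.
final class RedirectChoiceSketch {

    /**
     * Returns whichever of url / origUrl has the higher link analysis score.
     * Urls missing from the map are treated as scoring 0 (an assumption).
     */
    static String chooseDisplayUrl(String url, String origUrl, Map<String, Float> scores) {
        float urlScore = scores.getOrDefault(url, 0.0f);
        float origScore = scores.getOrDefault(origUrl, 0.0f);
        // Tie-breaking is not specified by the source; this sketch keeps url on a tie.
        return origScore > urlScore ? origUrl : url;
    }

    public static void main(String[] args) {
        Map<String, Float> scores = Map.of(
            "http://example.com/", 4.2f,
            "http://www.example.com/index.html", 1.3f);
        // prints "http://example.com/" because the orig url scores higher
        System.out.println(chooseDisplayUrl(
            "http://www.example.com/index.html", "http://example.com/", scores));
    }
}
```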
The BasicFields tool can accept one or more segments to convert to fields.
If multiple segments have overlapping content, only the latest successfully
fetched content will be converted.
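The rule for overlapping segments can be illustrated with a small, self-contained sketch. It assumes a hypothetical SegmentEntry type with a fetch time and a success flag, and uses an in-memory map instead of the MapReduce jobs that the tool actually runs. The names LatestFetchSketch, SegmentEntry, and keepLatest are invented for this example.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; none of these names come from Nutch.
final class LatestFetchSketch {

    /** Stand-in for the fields produced for one url by one segment. */
    static final class SegmentEntry {
        final String url;
        final long fetchTime;
        final boolean fetchedAndParsed;
        final String title;

        SegmentEntry(String url, long fetchTime, boolean fetchedAndParsed, String title) {
            this.url = url;
            this.fetchTime = fetchTime;
            this.fetchedAndParsed = fetchedAndParsed;
            this.title = title;
        }
    }

    /** Keeps, for each url, only the most recent successfully fetched and parsed entry. */
    static Map<String, SegmentEntry> keepLatest(List<SegmentEntry> entries) {
        Map<String, SegmentEntry> latest = new LinkedHashMap<>();
        for (SegmentEntry e : entries) {
            if (!e.fetchedAndParsed) {
                continue; // failed fetches or parses are never converted
            }
            latest.merge(e.url, e, (old, cur) -> cur.fetchTime > old.fetchTime ? cur : old);
        }
        return latest;
    }

    public static void main(String[] args) {
        List<SegmentEntry> entries = List.of(
            new SegmentEntry("http://example.com/", 100L, true, "Old title"),
            new SegmentEntry("http://example.com/", 200L, true, "New title"),
            new SegmentEntry("http://example.com/", 300L, false, "Failed refetch"));
        // prints "http://example.com/ -> New title"
        keepLatest(entries).forEach((url, e) -> System.out.println(url + " -> " + e.title));
    }
}
```

Note that the failed refetch at time 300 does not displace the successful fetch at time 200, which matches "only the latest successfully fetched content".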
Nested Class Summary
static class BasicFields.Flipper
- Runs the first part of redirect logic.
static class BasicFields.Merger
- Merges output of all segments fields collecting only the most recent set
of fields for any given url.
static class BasicFields.Scorer
- The Scorer job sets the boost field from the NodeDb score.
Field Summary
static org.apache.commons.logging.Log LOG
Method Summary
void createFields(Path nodeDb, Path[] segments, Path output)
- Runs the BasicFields jobs for every segment and aggregates and filters
the output to create a final database of FieldWritable objects.
static void main(String[] args)
int run(String[] args)
- Runs the BasicFields tool.
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
LOG
public static final org.apache.commons.logging.Log LOG
BasicFields
public BasicFields()
createFields
public void createFields(Path nodeDb,
Path[] segments,
Path output)
throws IOException
- Runs the BasicFields jobs for every segment and aggregates and filters
the output to create a final database of FieldWritable objects.
- Parameters:
nodeDb - The node database.
segments - The array of segments to process.
output - The BasicFields output.
- Throws:
IOException - If an error occurs while processing the segments.
main
public static void main(String[] args)
throws Exception
- Throws:
Exception
run
public int run(String[] args)
throws Exception
- Runs the BasicFields tool.
- Specified by:
run in interface Tool
- Throws:
Exception
Copyright © 2006 The Apache Software Foundation