org.apache.nutch.indexer.field
Class CustomFields

java.lang.Object
  extended by org.apache.hadoop.conf.Configured
      extended by org.apache.nutch.indexer.field.CustomFields
All Implemented Interfaces:
Configurable, Tool

public class CustomFields
extends Configured
implements Tool

Creates custom FieldWritable objects from a text file containing field information: the field name, the value, and an optional boost and field type (as needed by FieldWritable objects). An input text file for CustomFields is tab-separated and looks similar to this:

 
 http://www.apache.org\tlang\ten\t5.0\tCONTENT
 http://lucene.apache.org\tlang\tde
 
The only required fields are url, name, and value. Custom fields are configured through the custom-fields.xml file on the classpath. The config file allows you to set defaults for whether a field is indexed, stored, and tokenized, the boost on a field, and whether a field can output multiple values under the same key.

The purpose of the CustomFields job is to allow better integration with technologies such as Hadoop Streaming. Streaming jobs can be written in any programming language and can output the text file needed by the CustomFields job; those fields can then be included in the index.

The concept of custom fields requires two separate pieces: the indexing piece and the query piece. The indexing piece is handled by the CustomFields job. The query piece is handled by the query-custom plugin.

Important: Currently, because of the way the query plugin architecture works, custom field names must be added to the fields parameter in the query-custom plugin's plugin.xml file in order to be queried.

The CustomFields tool accepts one or more directories containing text files in the appropriate custom field format. These files are then turned into FieldWritable objects to be included in the index.
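The tab-separated line format described above can be illustrated with a small, self-contained sketch. CustomFieldLine is a hypothetical helper, not part of Nutch, and it is not how CustomFields.Converter is implemented. It assumes the column order shown in the sample input (url, name, value, then optional boost, then optional type), treats an empty optional column as absent, and parses the boost as a float.

```java
/**
 * Hypothetical helper (not part of Nutch) that models one line of the
 * tab-separated input described for the CustomFields job.
 * Assumed column order: url, name, value, [boost], [type].
 */
final class CustomFieldLine {
    final String url;
    final String name;
    final String value;
    final Float boost;  // null when the optional boost column is absent
    final String type;  // null when the optional type column is absent

    private CustomFieldLine(String url, String name, String value,
                            Float boost, String type) {
        this.url = url;
        this.name = name;
        this.value = value;
        this.boost = boost;
        this.type = type;
    }

    /** Parses one tab-separated line; url, name and value are required. */
    static CustomFieldLine parse(String line) {
        // A limit of -1 keeps trailing empty columns so they can be counted.
        String[] parts = line.split("\t", -1);
        if (parts.length < 3 || parts.length > 5) {
            throw new IllegalArgumentException(
                "Expected 3 to 5 tab-separated columns, got " + parts.length);
        }
        for (int i = 0; i < 3; i++) {
            if (parts[i].isEmpty()) {
                throw new IllegalArgumentException(
                    "url, name and value must not be empty");
            }
        }
        // Assumption: an empty optional column means "not specified".
        Float boost = (parts.length > 3 && !parts[3].isEmpty())
            ? Float.valueOf(parts[3]) : null;
        String type = (parts.length > 4 && !parts[4].isEmpty())
            ? parts[4] : null;
        return new CustomFieldLine(parts[0], parts[1], parts[2], boost, type);
    }

    /** Writes the line back out, e.g. from a job that produces the input file. */
    String format() {
        StringBuilder sb = new StringBuilder();
        sb.append(url).append('\t').append(name).append('\t').append(value);
        if (boost != null) {
            sb.append('\t').append(boost);
            // The type column follows the boost column, so this sketch
            // only emits it when a boost is present.
            if (type != null) {
                sb.append('\t').append(type);
            }
        }
        return sb.toString();
    }
}
```

Calling CustomFieldLine.parse on the two sample lines above would yield one record with a boost and type and one with only the three required columns. A producer written in Java could build its output with format(); a Hadoop Streaming job in another language would only need to emit lines of the same shape.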


Nested Class Summary
static class CustomFields.Collector
          Aggregates FieldWritable objects by the same name for the same URL.
static class CustomFields.Converter
          Converts text values into FieldWritable objects.
 
Field Summary
static org.apache.commons.logging.Log LOG
           
 
Constructor Summary
CustomFields()
           
 
Method Summary
static void main(String[] args)
           
 int run(String[] args)
          Runs the CustomFields job.
 
Methods inherited from class org.apache.hadoop.conf.Configured
getConf, setConf
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 
Methods inherited from interface org.apache.hadoop.conf.Configurable
getConf, setConf
 

Field Detail

LOG

public static final org.apache.commons.logging.Log LOG
Constructor Detail

CustomFields

public CustomFields()
Method Detail

main

public static void main(String[] args)
                 throws Exception
Throws:
Exception

run

public int run(String[] args)
        throws Exception
Runs the CustomFields job.

Specified by:
run in interface Tool
Throws:
Exception


Copyright © 2006 The Apache Software Foundation