org.apache.nutch.indexer.field
Class CustomFields
java.lang.Object
org.apache.hadoop.conf.Configured
org.apache.nutch.indexer.field.CustomFields
- All Implemented Interfaces:
- Configurable, Tool
public class CustomFields
- extends Configured
- implements Tool
Creates custom FieldWritable objects from a text file containing field
information, including the field name, value, and optional boost and field
type (as needed by FieldWritable objects).
An input text file for CustomFields is tab-separated and looks similar to
this (\t denotes a tab character):
http://www.apache.org\tlang\ten\t5.0\tCONTENT
http://lucene.apache.org\tlang\tde
The only required fields are url, name, and value. Custom fields are
configured through the custom-fields.xml file on the classpath. The config
file lets you set per-field defaults: whether a field is indexed, stored,
and tokenized; its boost; and whether it can output multiple values under
the same key.
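As an illustration of the line format described above, the following is a
minimal sketch of a parser for one input line. It is not part of Nutch: the
class name CustomFieldLine and its parse method are hypothetical, and the
sketch assumes the boost is a floating-point number and the field type is
kept as a plain string.

```java
import java.util.Optional;

// Hypothetical helper, not part of Nutch: models one line of CustomFields input.
final class CustomFieldLine {
    final String url;
    final String name;
    final String value;
    final Optional<Float> boost;  // optional 4th column (assumed to be a float)
    final Optional<String> type;  // optional 5th column, kept as a raw string

    private CustomFieldLine(String url, String name, String value,
                            Optional<Float> boost, Optional<String> type) {
        this.url = url;
        this.name = name;
        this.value = value;
        this.boost = boost;
        this.type = type;
    }

    // Splits a tab-separated line; url, name and value are required.
    static CustomFieldLine parse(String line) {
        String[] parts = line.split("\t", -1);
        if (parts.length < 3) {
            throw new IllegalArgumentException(
                "expected at least url, name and value: " + line);
        }
        for (int i = 0; i < 3; i++) {
            if (parts[i].isEmpty()) {
                throw new IllegalArgumentException(
                    "required column " + i + " is empty: " + line);
            }
        }
        Optional<Float> boost = Optional.empty();
        if (parts.length > 3 && !parts[3].isEmpty()) {
            boost = Optional.of(Float.parseFloat(parts[3]));
        }
        Optional<String> type = Optional.empty();
        if (parts.length > 4 && !parts[4].isEmpty()) {
            type = Optional.of(parts[4]);
        }
        return new CustomFieldLine(parts[0], parts[1], parts[2], boost, type);
    }
}
```

With this sketch, the second sample line above parses to a record whose
boost and type are both empty, so any defaults would have to come from
elsewhere, such as the configuration file.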
The purpose of the CustomFields job is to allow better integration with
technologies such as Hadoop Streaming. Streaming jobs can be created in any
programming language, can output the text file needed by the CustomFields
job, and those fields can then be included in the index.
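The producing side can be sketched in the same way. The example below is a
hypothetical helper, not part of Nutch or Hadoop Streaming, showing how an
external job might format its output lines so that CustomFields can read
them. The class and method names are assumptions, as is the choice to reject
values that contain tabs or line breaks.

```java
// Hypothetical helper, not part of Nutch: formats records as CustomFields input.
final class CustomFieldLineWriter {

    // Formats a line with only the three required columns.
    static String format(String url, String name, String value) {
        return String.join("\t", check(url), check(name), check(value));
    }

    // Formats a line that also carries the optional boost and field type.
    static String format(String url, String name, String value,
                         float boost, String type) {
        return String.join("\t", check(url), check(name), check(value),
                           Float.toString(boost), check(type));
    }

    // A tab or line break inside a column would corrupt the format.
    private static String check(String column) {
        if (column == null || column.isEmpty()) {
            throw new IllegalArgumentException("column must not be empty");
        }
        if (column.indexOf('\t') >= 0 || column.indexOf('\n') >= 0
                || column.indexOf('\r') >= 0) {
            throw new IllegalArgumentException(
                "column contains a tab or line break: " + column);
        }
        return column;
    }

    public static void main(String[] args) {
        // Writes the two sample records to standard output, one per line.
        System.out.println(
            format("http://www.apache.org", "lang", "en", 5.0f, "CONTENT"));
        System.out.println(format("http://lucene.apache.org", "lang", "de"));
    }
}
```

Writing one record per line to standard output matches how a streaming-style
job typically emits text, but the exact wiring depends on the job itself.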
The concept of custom fields requires two separate pieces: the indexing
piece and the query piece. The indexing piece is handled by the CustomFields
job.
The query piece is handled by the query-custom plugin.
Important:
Currently, because of the way the query plugin architecture works, custom
field names must be added to the fields parameter in the query-custom
plugin's plugin.xml file in order to be queried.
The CustomFields tool accepts one or more directories containing text files
in the appropriate custom field format. These files are then turned into
FieldWritable objects to be included in the index.
Field Summary
static org.apache.commons.logging.Log LOG

Method Summary
static void main(String[] args)
int run(String[] args)
    Runs the CustomFields job.

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
LOG
public static final org.apache.commons.logging.Log LOG
CustomFields
public CustomFields()
main
public static void main(String[] args)
throws Exception
- Throws:
Exception
run
public int run(String[] args)
throws Exception
- Runs the CustomFields job.
- Specified by:
run
in interface Tool
- Throws:
Exception
Copyright © 2006 The Apache Software Foundation