org.apache.nutch.segment
Class ContentAsTextInputFormat

java.lang.Object
  extended by org.apache.hadoop.mapred.FileInputFormat<K,V>
      extended by org.apache.hadoop.mapred.SequenceFileInputFormat<Text,Text>
          extended by org.apache.nutch.segment.ContentAsTextInputFormat
All Implemented Interfaces:
InputFormat<Text,Text>

public class ContentAsTextInputFormat
extends SequenceFileInputFormat<Text,Text>

An input format that reads Nutch Content objects and converts them to text, replacing newlines with spaces. This format is useful for processing Nutch content with Hadoop Streaming in languages other than Java.
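
Hadoop Streaming, by default, passes each record to an external program as one line of the form key, tab, value, which is presumably why line breaks in the content are turned into spaces. The following is a minimal, self-contained sketch of that idea, not the Nutch implementation. The class name `StreamingRecordSketch`, the methods `flatten` and `toStreamingLine`, the use of the page URL as the key, and the UTF-8 decoding are all assumptions made for illustration.

```java
import java.nio.charset.StandardCharsets;

public class StreamingRecordSketch {

    // Decode the raw bytes and replace every line break with a single space.
    // Assumption: content is decoded as UTF-8, and "\r\n", "\r" and "\n"
    // are all treated as line breaks.
    public static String flatten(byte[] rawContent) {
        String text = new String(rawContent, StandardCharsets.UTF_8);
        return text.replaceAll("\\r\\n|\\r|\\n", " ");
    }

    // Build one line in the key<TAB>value shape that a streaming mapper
    // reads from standard input. Assumption: the key is the page URL.
    public static String toStreamingLine(String url, byte[] rawContent) {
        return url + "\t" + flatten(rawContent);
    }

    public static void main(String[] args) {
        byte[] page = "<html>\n<body>Hello</body>\n</html>"
                .getBytes(StandardCharsets.UTF_8);
        // The printed record occupies exactly one line.
        System.out.println(toStreamingLine("http://example.com/", page));
    }
}
```

With the content flattened this way, a streaming mapper written in another language can split each input line on the first tab to recover the key and the text.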


Field Summary
 
Fields inherited from class org.apache.hadoop.mapred.FileInputFormat
LOG
 
Constructor Summary
ContentAsTextInputFormat()
           
 
Method Summary
 RecordReader<Text,Text> getRecordReader(InputSplit split, JobConf job, Reporter reporter)
           
 
Methods inherited from class org.apache.hadoop.mapred.SequenceFileInputFormat
listStatus
 
Methods inherited from class org.apache.hadoop.mapred.FileInputFormat
addInputPath, addInputPaths, computeSplitSize, getBlockIndex, getInputPathFilter, getInputPaths, getSplitHosts, getSplits, isSplitable, setInputPathFilter, setInputPaths, setInputPaths, setMinSplitSize
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

ContentAsTextInputFormat

public ContentAsTextInputFormat()

Method Detail

getRecordReader

public RecordReader<Text,Text> getRecordReader(InputSplit split,
                                               JobConf job,
                                               Reporter reporter)
                                        throws IOException
Specified by:
getRecordReader in interface InputFormat<Text,Text>
Overrides:
getRecordReader in class SequenceFileInputFormat<Text,Text>
Throws:
IOException


Copyright © 2006 The Apache Software Foundation