org.apache.nutch.segment
Class ContentAsTextInputFormat
java.lang.Object
org.apache.hadoop.mapred.FileInputFormat<K,V>
org.apache.hadoop.mapred.SequenceFileInputFormat<Text,Text>
org.apache.nutch.segment.ContentAsTextInputFormat
- All Implemented Interfaces:
- InputFormat<Text,Text>
public class ContentAsTextInputFormat
- extends SequenceFileInputFormat<Text,Text>
An input format that reads Nutch Content objects and converts them to text,
replacing newline characters with spaces so that each content value occupies a
single line. This makes the format useful for processing Nutch content objects
in Hadoop Streaming jobs written in other languages.
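The newline replacement described above matters because a line-oriented consumer, such as a Hadoop Streaming script reading tab-separated key/value lines from standard input, would otherwise see one record split across several lines. The following is a minimal sketch of that idea using only the JDK. It is not the Nutch implementation: the class name `ContentFlattenSketch`, both method names, the UTF-8 decoding, and the handling of carriage returns are assumptions made for illustration.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in, not the Nutch implementation: illustrates why line
// breaks are flattened before a record reaches a line-oriented consumer.
class ContentFlattenSketch {

    // Decode raw content bytes and replace every line break with one space.
    // UTF-8 is an assumption; real content may use a different charset.
    static String flatten(byte[] content) {
        String text = new String(content, StandardCharsets.UTF_8);
        return text.replaceAll("\\r\\n|\\r|\\n", " ");
    }

    // Format one record as a single line: key, a tab, then the flattened value.
    static String toStreamingLine(String url, byte[] content) {
        return url + "\t" + flatten(content);
    }

    public static void main(String[] args) {
        byte[] page = "<html>\n<body>Hello</body>\n</html>"
                .getBytes(StandardCharsets.UTF_8);
        // The printed record contains no embedded line breaks.
        System.out.println(toStreamingLine("http://example.com/", page));
    }
}
```

With the real class, the equivalent effect would come from selecting ContentAsTextInputFormat as the job's input format, so no such helper would be needed in user code.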
Methods inherited from class org.apache.hadoop.mapred.FileInputFormat
addInputPath, addInputPaths, computeSplitSize, getBlockIndex, getInputPathFilter, getInputPaths, getSplitHosts, getSplits, isSplitable, setInputPathFilter, setInputPaths, setInputPaths, setMinSplitSize
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
ContentAsTextInputFormat
public ContentAsTextInputFormat()
getRecordReader
public RecordReader<Text,Text> getRecordReader(InputSplit split,
                                               JobConf job,
                                               Reporter reporter)
                                        throws IOException
- Specified by:
getRecordReader
in interface InputFormat<Text,Text>
- Overrides:
getRecordReader
in class SequenceFileInputFormat<Text,Text>
- Throws:
IOException
Copyright © 2006 The Apache Software Foundation