ContentAsTextInputFormat (Nutch 1.2 API)

org.apache.nutch.segment
Class ContentAsTextInputFormat

java.lang.Object
  org.apache.hadoop.mapred.FileInputFormat<K,V>
      org.apache.hadoop.mapred.SequenceFileInputFormat<Text,Text>
          org.apache.nutch.segment.ContentAsTextInputFormat

All Implemented Interfaces: InputFormat<Text,Text>

public class ContentAsTextInputFormat
extends SequenceFileInputFormat<Text,Text>

An input format that reads Nutch Content objects and converts them to text, replacing newlines with spaces so that each record fits on a single line. This format is useful for processing Nutch content in Hadoop Streaming jobs written in other languages.
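The newline handling described above can be illustrated with a minimal, dependency-free sketch. It is not the class's actual source: the class name `ContentAsTextSketch`, the helper `flatten`, the UTF-8 decoding, and the treatment of `\r\n` as a single newline are all assumptions made for illustration. It only shows why flattening matters for line-oriented consumers such as Hadoop Streaming, where one record is expected per line.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration; not part of the Nutch API.
final class ContentAsTextSketch {

    // Decode raw content bytes and replace each newline with a space.
    // Assumptions: UTF-8 input, and "\r\n" counted as one newline.
    static String flatten(byte[] rawContent) {
        String text = new String(rawContent, StandardCharsets.UTF_8);
        return text.replaceAll("\\r?\\n", " ");
    }

    public static void main(String[] args) {
        byte[] page = "<html>\n<body>hello</body>\n</html>"
                .getBytes(StandardCharsets.UTF_8);
        // Key and value separated by a tab, one record per line.
        System.out.println("http://example.com/\t" + flatten(page));
    }
}
```

With the content flattened, a multi-line page becomes a single key/value line that a streaming script can split on the first tab.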

Field Summary

Fields inherited from class org.apache.hadoop.mapred.FileInputFormat
`LOG`

Constructor Summary
`ContentAsTextInputFormat()`

Method Summary
`RecordReader<Text,Text>`	`getRecordReader(InputSplit split, JobConf job, Reporter reporter)`

Methods inherited from class org.apache.hadoop.mapred.SequenceFileInputFormat
`listStatus`

Methods inherited from class org.apache.hadoop.mapred.FileInputFormat
`addInputPath, addInputPaths, computeSplitSize, getBlockIndex, getInputPathFilter, getInputPaths, getSplitHosts, getSplits, isSplitable, setInputPathFilter, setInputPaths, setInputPaths, setMinSplitSize`

Methods inherited from class java.lang.Object
`clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait`

Constructor Detail

ContentAsTextInputFormat

public ContentAsTextInputFormat()

Method Detail

getRecordReader

public RecordReader<Text,Text> getRecordReader(InputSplit split,
                                               JobConf job,
                                               Reporter reporter)
                                        throws IOException

Specified by: getRecordReader in interface InputFormat<Text,Text>
Overrides: getRecordReader in class SequenceFileInputFormat<Text,Text>

Throws: IOException