org.apache.crunch.io.text
Class NLineFileSource<T>
java.lang.Object
org.apache.crunch.io.impl.FileSourceImpl<T>
org.apache.crunch.io.text.NLineFileSource<T>
- All Implemented Interfaces:
- ReadableSource<T>, Source<T>
public class NLineFileSource<T>
- extends FileSourceImpl<T>
- implements ReadableSource<T>
A Source instance that uses the NLineInputFormat, which gives each map task a fraction of the lines in a text file as input. Most useful when running simulations on Hadoop, where each line represents configuration information about each simulation run.
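The line-based splitting described above can be pictured with a small plain-Java sketch. It does not use Crunch or Hadoop; the class NLineSplitSketch and the helper splitByLines are hypothetical names introduced only for illustration, and the sketch assumes that each map task receives at most linesPerTask consecutive lines, with the final group possibly shorter.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class NLineSplitSketch {
    // Hypothetical helper (not part of Crunch or Hadoop): groups consecutive
    // lines into chunks of at most linesPerTask lines, one chunk per map task.
    static List<List<String>> splitByLines(List<String> lines, int linesPerTask) {
        if (linesPerTask <= 0) {
            throw new IllegalArgumentException("linesPerTask must be positive");
        }
        List<List<String>> splits = new ArrayList<>();
        for (int start = 0; start < lines.size(); start += linesPerTask) {
            int end = Math.min(start + linesPerTask, lines.size());
            // Copy the sub-list so each chunk is independent of the input list.
            splits.add(new ArrayList<>(lines.subList(start, end)));
        }
        return splits;
    }

    public static void main(String[] args) {
        // Five simulation configurations, two per task: three tasks in total.
        List<String> configs = Arrays.asList("run-1", "run-2", "run-3", "run-4", "run-5");
        System.out.println(splitByLines(configs, 2));
        // prints [[run-1, run-2], [run-3, run-4], [run-5]]
    }
}
```

With one configuration line per simulation run, a small linesPerTask spreads the runs across many map tasks, while a large value packs more runs into each task.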
Constructor Summary
NLineFileSource(List<org.apache.hadoop.fs.Path> paths,
PType<T> ptype,
int linesPerTask)
Create a new NLineFileSource instance.
NLineFileSource(org.apache.hadoop.fs.Path path,
PType<T> ptype,
int linesPerTask)
Create a new NLineFileSource instance.
NLineFileSource(String path,
PType<T> ptype,
int linesPerTask)
Create a new NLineFileSource instance.
Methods inherited from class org.apache.crunch.io.impl.FileSourceImpl
configureSource, equals, getBundle, getConverter, getLastModifiedAt, getPath, getPaths, getSize, getType, hashCode, inputConf, pathsAsString, read
NLineFileSource
public NLineFileSource(String path,
PType<T> ptype,
int linesPerTask)
- Create a new NLineFileSource instance.
- Parameters:
path - The path to the input data, as a String
ptype - The PType to use for processing the data
linesPerTask - The number of lines from the input each map task will process
NLineFileSource
public NLineFileSource(org.apache.hadoop.fs.Path path,
PType<T> ptype,
int linesPerTask)
- Create a new NLineFileSource instance.
- Parameters:
path - The Path to the input data
ptype - The PType to use for processing the data
linesPerTask - The number of lines from the input each map task will process
NLineFileSource
public NLineFileSource(List<org.apache.hadoop.fs.Path> paths,
PType<T> ptype,
int linesPerTask)
- Create a new NLineFileSource instance.
- Parameters:
paths - The Paths to the input data
ptype - The PType to use for processing the data
linesPerTask - The number of lines from the input each map task will process
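When choosing linesPerTask for several input paths, it can help to estimate how many map tasks will result. The sketch below is plain Java and does not call Crunch or Hadoop; the class NLineTaskEstimate and the method estimateMapTasks are hypothetical, and the arithmetic assumes that a group of lines never spans two files, so each file contributes the ceiling of its line count divided by linesPerTask.

```java
class NLineTaskEstimate {
    // Hypothetical helper (not part of Crunch or Hadoop): estimates the number
    // of map tasks, assuming each file is divided independently into groups
    // of at most linesPerTask lines.
    static long estimateMapTasks(long[] lineCountsPerFile, int linesPerTask) {
        if (linesPerTask <= 0) {
            throw new IllegalArgumentException("linesPerTask must be positive");
        }
        long tasks = 0;
        for (long lines : lineCountsPerFile) {
            // Ceiling division: a partially filled final group still needs a task.
            tasks += (lines + linesPerTask - 1) / linesPerTask;
        }
        return tasks;
    }

    public static void main(String[] args) {
        // Three input files with 10, 3, and 0 lines, four lines per task.
        long[] lineCounts = {10, 3, 0};
        System.out.println(estimateMapTasks(lineCounts, 4)); // prints 4
    }
}
```

Under that assumption, the 10-line file yields three tasks, the 3-line file yields one, and the empty file yields none.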
toString
public String toString()
- Overrides:
toString in class FileSourceImpl<T>
read
public Iterable<T> read(org.apache.hadoop.conf.Configuration conf)
throws IOException
- Description copied from interface: ReadableSource
- Returns an Iterable that contains the contents of this source.
- Specified by:
read in interface ReadableSource<T>
- Parameters:
conf - The current Configuration instance
- Returns:
- the contents of this Source as an Iterable instance
- Throws:
IOException
asReadable
public ReadableData<T> asReadable()
- Specified by:
asReadable in interface ReadableSource<T>
- Returns:
- a ReadableData instance containing the data referenced by this ReadableSource.
Copyright © 2014 The Apache Software Foundation. All Rights Reserved.