org.apache.crunch.io
Class From

java.lang.Object
  extended by org.apache.crunch.io.From

public class From
extends Object

Static factory methods for creating common Source types.

The From class is intended to provide a literate API for creating Crunch pipelines from common input file types.

    Pipeline pipeline = new MRPipeline(this.getClass());

    // Reference the lines of a text file by wrapping the TextInputFormat class.
    PCollection<String> lines = pipeline.read(From.textFile("/path/to/myfiles"));

    // Reference entries from a sequence file where the key is a LongWritable and the
    // value is a custom Writable class.
    PTable<LongWritable, MyWritable> table = pipeline.read(From.sequenceFile(
        "/path/to/seqfiles", LongWritable.class, MyWritable.class));

    // Reference the records from an Avro file, where MyAvroObject implements Avro's
    // SpecificRecord interface.
    PCollection<MyAvroObject> myObjects = pipeline.read(From.avroFile("/path/to/avrofiles",
        MyAvroObject.class));

    // Reference the key-value pairs from a custom extension of FileInputFormat.
    PTable<KeyWritable, ValueWritable> custom = pipeline.read(From.formattedFile(
        "/custom", MyFileInputFormat.class, KeyWritable.class, ValueWritable.class));


Constructor Summary
From()
           
 
Method Summary
static Source<org.apache.avro.generic.GenericData.Record> avroFile(List<org.apache.hadoop.fs.Path> paths)
          Creates a Source<GenericData.Record> by reading the schema of the Avro file at the given paths.
static
<T extends org.apache.avro.specific.SpecificRecord>
Source<T>
avroFile(List<org.apache.hadoop.fs.Path> paths, Class<T> avroClass)
          Creates a Source<T> instance from the Avro file(s) at the given Paths.
static Source<org.apache.avro.generic.GenericData.Record> avroFile(List<org.apache.hadoop.fs.Path> paths, org.apache.hadoop.conf.Configuration conf)
          Creates a Source<GenericData.Record> by reading the schema of the Avro file at the given paths using the FileSystem information contained in the given Configuration instance.
static
<T> Source<T>
avroFile(List<org.apache.hadoop.fs.Path> paths, PType<T> ptype)
          Creates a Source<T> instance from the Avro file(s) at the given Paths.
static Source<org.apache.avro.generic.GenericData.Record> avroFile(org.apache.hadoop.fs.Path path)
          Creates a Source<GenericData.Record> by reading the schema of the Avro file at the given path.
static
<T extends org.apache.avro.specific.SpecificRecord>
Source<T>
avroFile(org.apache.hadoop.fs.Path path, Class<T> avroClass)
          Creates a Source<T> instance from the Avro file(s) at the given Path.
static Source<org.apache.avro.generic.GenericData.Record> avroFile(org.apache.hadoop.fs.Path path, org.apache.hadoop.conf.Configuration conf)
          Creates a Source<GenericData.Record> by reading the schema of the Avro file at the given path using the FileSystem information contained in the given Configuration instance.
static
<T> Source<T>
avroFile(org.apache.hadoop.fs.Path path, PType<T> ptype)
          Creates a Source<T> instance from the Avro file(s) at the given Path.
static Source<org.apache.avro.generic.GenericData.Record> avroFile(String pathName)
          Creates a Source<GenericData.Record> by reading the schema of the Avro file at the given path.
static
<T extends org.apache.avro.specific.SpecificRecord>
Source<T>
avroFile(String pathName, Class<T> avroClass)
          Creates a Source<T> instance from the Avro file(s) at the given path name.
static
<T> Source<T>
avroFile(String pathName, PType<T> ptype)
          Creates a Source<T> instance from the Avro file(s) at the given path name.
static
<K,V> TableSource<K,V>
formattedFile(List<org.apache.hadoop.fs.Path> paths, Class<? extends org.apache.hadoop.mapreduce.lib.input.FileInputFormat<?,?>> formatClass, PType<K> keyType, PType<V> valueType)
          Creates a TableSource<K, V> for reading data from files that have custom FileInputFormat implementations not covered by the provided TableSource and Source factory methods.
static
<K extends org.apache.hadoop.io.Writable,V extends org.apache.hadoop.io.Writable>
TableSource<K,V>
formattedFile(List<org.apache.hadoop.fs.Path> paths, Class<? extends org.apache.hadoop.mapreduce.lib.input.FileInputFormat<K,V>> formatClass, Class<K> keyClass, Class<V> valueClass)
          Creates a TableSource<K, V> for reading data from files that have custom FileInputFormat<K, V> implementations not covered by the provided TableSource and Source factory methods.
static
<K,V> TableSource<K,V>
formattedFile(org.apache.hadoop.fs.Path path, Class<? extends org.apache.hadoop.mapreduce.lib.input.FileInputFormat<?,?>> formatClass, PType<K> keyType, PType<V> valueType)
          Creates a TableSource<K, V> for reading data from files that have custom FileInputFormat implementations not covered by the provided TableSource and Source factory methods.
static
<K extends org.apache.hadoop.io.Writable,V extends org.apache.hadoop.io.Writable>
TableSource<K,V>
formattedFile(org.apache.hadoop.fs.Path path, Class<? extends org.apache.hadoop.mapreduce.lib.input.FileInputFormat<K,V>> formatClass, Class<K> keyClass, Class<V> valueClass)
          Creates a TableSource<K, V> for reading data from files that have custom FileInputFormat<K, V> implementations not covered by the provided TableSource and Source factory methods.
static
<K,V> TableSource<K,V>
formattedFile(String pathName, Class<? extends org.apache.hadoop.mapreduce.lib.input.FileInputFormat<?,?>> formatClass, PType<K> keyType, PType<V> valueType)
          Creates a TableSource<K, V> for reading data from files that have custom FileInputFormat implementations not covered by the provided TableSource and Source factory methods.
static
<K extends org.apache.hadoop.io.Writable,V extends org.apache.hadoop.io.Writable>
TableSource<K,V>
formattedFile(String pathName, Class<? extends org.apache.hadoop.mapreduce.lib.input.FileInputFormat<K,V>> formatClass, Class<K> keyClass, Class<V> valueClass)
          Creates a TableSource<K, V> for reading data from files that have custom FileInputFormat<K, V> implementations not covered by the provided TableSource and Source factory methods.
static
<K extends org.apache.hadoop.io.Writable,V extends org.apache.hadoop.io.Writable>
TableSource<K,V>
sequenceFile(List<org.apache.hadoop.fs.Path> paths, Class<K> keyClass, Class<V> valueClass)
          Creates a TableSource<K, V> instance for the SequenceFile(s) at the given Paths.
static
<T extends org.apache.hadoop.io.Writable>
Source<T>
sequenceFile(List<org.apache.hadoop.fs.Path> paths, Class<T> valueClass)
          Creates a Source<T> instance from the value field of each key-value pair in the SequenceFile(s) at the given Paths.
static
<K,V> TableSource<K,V>
sequenceFile(List<org.apache.hadoop.fs.Path> paths, PType<K> keyType, PType<V> valueType)
          Creates a TableSource<K, V> instance for the SequenceFile(s) at the given Paths.
static
<T> Source<T>
sequenceFile(List<org.apache.hadoop.fs.Path> paths, PType<T> ptype)
          Creates a Source<T> instance from the value field of each key-value pair in the SequenceFile(s) at the given Paths.
static
<K extends org.apache.hadoop.io.Writable,V extends org.apache.hadoop.io.Writable>
TableSource<K,V>
sequenceFile(org.apache.hadoop.fs.Path path, Class<K> keyClass, Class<V> valueClass)
          Creates a TableSource<K, V> instance for the SequenceFile(s) at the given Path.
static
<T extends org.apache.hadoop.io.Writable>
Source<T>
sequenceFile(org.apache.hadoop.fs.Path path, Class<T> valueClass)
          Creates a Source<T> instance from the value field of each key-value pair in the SequenceFile(s) at the given Path.
static
<K,V> TableSource<K,V>
sequenceFile(org.apache.hadoop.fs.Path path, PType<K> keyType, PType<V> valueType)
          Creates a TableSource<K, V> instance for the SequenceFile(s) at the given Path.
static
<T> Source<T>
sequenceFile(org.apache.hadoop.fs.Path path, PType<T> ptype)
          Creates a Source<T> instance from the value field of each key-value pair in the SequenceFile(s) at the given Path.
static
<K extends org.apache.hadoop.io.Writable,V extends org.apache.hadoop.io.Writable>
TableSource<K,V>
sequenceFile(String pathName, Class<K> keyClass, Class<V> valueClass)
          Creates a TableSource<K, V> instance for the SequenceFile(s) at the given path name.
static
<T extends org.apache.hadoop.io.Writable>
Source<T>
sequenceFile(String pathName, Class<T> valueClass)
          Creates a Source<T> instance from the value field of each key-value pair in the SequenceFile(s) at the given path name.
static
<K,V> TableSource<K,V>
sequenceFile(String pathName, PType<K> keyType, PType<V> valueType)
          Creates a TableSource<K, V> instance for the SequenceFile(s) at the given path name.
static
<T> Source<T>
sequenceFile(String pathName, PType<T> ptype)
          Creates a Source<T> instance from the value field of each key-value pair in the SequenceFile(s) at the given path name.
static Source<String> textFile(List<org.apache.hadoop.fs.Path> paths)
          Creates a Source<String> instance for the text file(s) at the given Paths.
static
<T> Source<T>
textFile(List<org.apache.hadoop.fs.Path> paths, PType<T> ptype)
          Creates a Source<T> instance for the text file(s) at the given Paths using the provided PType<T> to convert the input text.
static Source<String> textFile(org.apache.hadoop.fs.Path path)
          Creates a Source<String> instance for the text file(s) at the given Path.
static
<T> Source<T>
textFile(org.apache.hadoop.fs.Path path, PType<T> ptype)
          Creates a Source<T> instance for the text file(s) at the given Path using the provided PType<T> to convert the input text.
static Source<String> textFile(String pathName)
          Creates a Source<String> instance for the text file(s) at the given path name.
static
<T> Source<T>
textFile(String pathName, PType<T> ptype)
          Creates a Source<T> instance for the text file(s) at the given path name using the provided PType<T> to convert the input text.
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

From

public From()
Method Detail

formattedFile

public static <K extends org.apache.hadoop.io.Writable,V extends org.apache.hadoop.io.Writable> TableSource<K,V> formattedFile(String pathName,
                                                                                                                               Class<? extends org.apache.hadoop.mapreduce.lib.input.FileInputFormat<K,V>> formatClass,
                                                                                                                               Class<K> keyClass,
                                                                                                                               Class<V> valueClass)
Creates a TableSource<K, V> for reading data from files that have custom FileInputFormat<K, V> implementations not covered by the provided TableSource and Source factory methods.

Parameters:
pathName - The name of the path to the data on the filesystem
formatClass - The FileInputFormat implementation
keyClass - The Writable to use for the key
valueClass - The Writable to use for the value
Returns:
A new TableSource<K, V> instance

formattedFile

public static <K extends org.apache.hadoop.io.Writable,V extends org.apache.hadoop.io.Writable> TableSource<K,V> formattedFile(org.apache.hadoop.fs.Path path,
                                                                                                                               Class<? extends org.apache.hadoop.mapreduce.lib.input.FileInputFormat<K,V>> formatClass,
                                                                                                                               Class<K> keyClass,
                                                                                                                               Class<V> valueClass)
Creates a TableSource<K, V> for reading data from files that have custom FileInputFormat<K, V> implementations not covered by the provided TableSource and Source factory methods.

Parameters:
path - The Path to the data
formatClass - The FileInputFormat implementation
keyClass - The Writable to use for the key
valueClass - The Writable to use for the value
Returns:
A new TableSource<K, V> instance

formattedFile

public static <K extends org.apache.hadoop.io.Writable,V extends org.apache.hadoop.io.Writable> TableSource<K,V> formattedFile(List<org.apache.hadoop.fs.Path> paths,
                                                                                                                               Class<? extends org.apache.hadoop.mapreduce.lib.input.FileInputFormat<K,V>> formatClass,
                                                                                                                               Class<K> keyClass,
                                                                                                                               Class<V> valueClass)
Creates a TableSource<K, V> for reading data from files that have custom FileInputFormat<K, V> implementations not covered by the provided TableSource and Source factory methods.

Parameters:
paths - A list of Paths to the data
formatClass - The FileInputFormat implementation
keyClass - The Writable to use for the key
valueClass - The Writable to use for the value
Returns:
A new TableSource<K, V> instance
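
As a rough illustration of the Writable-based overloads above, the sketch below builds a TableSource over Hadoop's KeyValueTextInputFormat, which reads Text keys and Text values. The class name FormattedFileExample and the path passed to it are hypothetical, KeyValueTextInputFormat is only a stand-in for a custom format, and the code assumes crunch-core and a Hadoop 2 client library are on the classpath.

```java
import org.apache.crunch.TableSource;
import org.apache.crunch.io.From;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.lib.input.KeyValueTextInputFormat;

// Hypothetical helper class, for illustration only.
public class FormattedFileExample {

  // Builds a table source whose keys and values stay as Hadoop Text objects.
  // KeyValueTextInputFormat is a FileInputFormat<Text, Text>, so K and V are
  // both inferred as Text from the two class arguments.
  public static TableSource<Text, Text> keyValueText(String pathName) {
    return From.formattedFile(
        pathName, KeyValueTextInputFormat.class, Text.class, Text.class);
  }
}
```

The returned source would typically be handed to Pipeline.read, which yields a PTable<Text, Text>.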

formattedFile

public static <K,V> TableSource<K,V> formattedFile(String pathName,
                                                   Class<? extends org.apache.hadoop.mapreduce.lib.input.FileInputFormat<?,?>> formatClass,
                                                   PType<K> keyType,
                                                   PType<V> valueType)
Creates a TableSource<K, V> for reading data from files that have custom FileInputFormat implementations not covered by the provided TableSource and Source factory methods.

Parameters:
pathName - The name of the path to the data on the filesystem
formatClass - The FileInputFormat implementation
keyType - The PType to use for the key
valueType - The PType to use for the value
Returns:
A new TableSource<K, V> instance

formattedFile

public static <K,V> TableSource<K,V> formattedFile(org.apache.hadoop.fs.Path path,
                                                   Class<? extends org.apache.hadoop.mapreduce.lib.input.FileInputFormat<?,?>> formatClass,
                                                   PType<K> keyType,
                                                   PType<V> valueType)
Creates a TableSource<K, V> for reading data from files that have custom FileInputFormat implementations not covered by the provided TableSource and Source factory methods.

Parameters:
path - The Path to the data
formatClass - The FileInputFormat implementation
keyType - The PType to use for the key
valueType - The PType to use for the value
Returns:
A new TableSource<K, V> instance

formattedFile

public static <K,V> TableSource<K,V> formattedFile(List<org.apache.hadoop.fs.Path> paths,
                                                   Class<? extends org.apache.hadoop.mapreduce.lib.input.FileInputFormat<?,?>> formatClass,
                                                   PType<K> keyType,
                                                   PType<V> valueType)
Creates a TableSource<K, V> for reading data from files that have custom FileInputFormat implementations not covered by the provided TableSource and Source factory methods.

Parameters:
paths - A list of Paths to the data
formatClass - The FileInputFormat implementation
keyType - The PType to use for the key
valueType - The PType to use for the value
Returns:
A new TableSource<K, V> instance
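
The PType-based overloads differ from the Writable-based ones in that the key and value PTypes decide the Java types seen by the pipeline. A minimal sketch, assuming the Writables type family from org.apache.crunch.types.writable and using Hadoop's KeyValueTextInputFormat as a stand-in for a custom format; the class name and path are hypothetical.

```java
import org.apache.crunch.TableSource;
import org.apache.crunch.io.From;
import org.apache.crunch.types.writable.Writables;
import org.apache.hadoop.mapreduce.lib.input.KeyValueTextInputFormat;

// Hypothetical helper class, for illustration only.
public class FormattedFilePTypeExample {

  // Writables.strings() maps Hadoop Text values to java.lang.String, so the
  // resulting table is keyed and valued by String rather than Text.
  public static TableSource<String, String> keyValueStrings(String pathName) {
    return From.formattedFile(
        pathName,
        KeyValueTextInputFormat.class,
        Writables.strings(),
        Writables.strings());
  }
}
```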

avroFile

public static <T extends org.apache.avro.specific.SpecificRecord> Source<T> avroFile(String pathName,
                                                                                     Class<T> avroClass)
Creates a Source<T> instance from the Avro file(s) at the given path name.

Parameters:
pathName - The name of the path to the data on the filesystem
avroClass - The subclass of SpecificRecord to use for the Avro file
Returns:
A new Source<T> instance

avroFile

public static <T extends org.apache.avro.specific.SpecificRecord> Source<T> avroFile(org.apache.hadoop.fs.Path path,
                                                                                     Class<T> avroClass)
Creates a Source<T> instance from the Avro file(s) at the given Path.

Parameters:
path - The Path to the data
avroClass - The subclass of SpecificRecord to use for the Avro file
Returns:
A new Source<T> instance

avroFile

public static <T extends org.apache.avro.specific.SpecificRecord> Source<T> avroFile(List<org.apache.hadoop.fs.Path> paths,
                                                                                     Class<T> avroClass)
Creates a Source<T> instance from the Avro file(s) at the given Paths.

Parameters:
paths - A list of Paths to the data
avroClass - The subclass of SpecificRecord to use for the Avro file
Returns:
A new Source<T> instance

avroFile

public static <T> Source<T> avroFile(String pathName,
                                     PType<T> ptype)
Creates a Source<T> instance from the Avro file(s) at the given path name.

Parameters:
pathName - The name of the path to the data on the filesystem
ptype - The AvroType for the Avro records
Returns:
A new Source<T> instance

avroFile

public static <T> Source<T> avroFile(org.apache.hadoop.fs.Path path,
                                     PType<T> ptype)
Creates a Source<T> instance from the Avro file(s) at the given Path.

Parameters:
path - The Path to the data
ptype - The AvroType for the Avro records
Returns:
A new Source<T> instance

avroFile

public static <T> Source<T> avroFile(List<org.apache.hadoop.fs.Path> paths,
                                     PType<T> ptype)
Creates a Source<T> instance from the Avro file(s) at the given Paths.

Parameters:
paths - A list of Paths to the data
ptype - The PType for the Avro records
Returns:
A new Source<T> instance
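
When no generated SpecificRecord class is available, one option is to pass an AvroType built from an explicit schema. The sketch below assumes Avros.generics from org.apache.crunch.types.avro; the User schema, the class name, and the path are made up for illustration.

```java
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.crunch.Source;
import org.apache.crunch.io.From;
import org.apache.crunch.types.avro.Avros;

// Hypothetical helper class, for illustration only.
public class AvroPTypeExample {

  // Hypothetical record schema with two fields.
  static final String USER_SCHEMA =
      "{\"type\":\"record\",\"name\":\"User\",\"fields\":["
          + "{\"name\":\"name\",\"type\":\"string\"},"
          + "{\"name\":\"age\",\"type\":\"int\"}]}";

  // Builds a source of generic records that are read with the schema above.
  public static Source<GenericData.Record> users(String pathName) {
    Schema schema = new Schema.Parser().parse(USER_SCHEMA);
    return From.avroFile(pathName, Avros.generics(schema));
  }
}
```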

avroFile

public static Source<org.apache.avro.generic.GenericData.Record> avroFile(String pathName)
Creates a Source<GenericData.Record> by reading the schema of the Avro file at the given path. If the path is a directory, the schema is taken from one of the files in that directory.

Parameters:
pathName - The name of the path to the data on the filesystem
Returns:
A new Source<GenericData.Record> instance

avroFile

public static Source<org.apache.avro.generic.GenericData.Record> avroFile(org.apache.hadoop.fs.Path path)
Creates a Source<GenericData.Record> by reading the schema of the Avro file at the given path. If the path is a directory, the schema is taken from one of the files in that directory.

Parameters:
path - The path to the data on the filesystem
Returns:
A new Source<GenericData.Record> instance

avroFile

public static Source<org.apache.avro.generic.GenericData.Record> avroFile(List<org.apache.hadoop.fs.Path> paths)
Creates a Source<GenericData.Record> by reading the schema of the Avro file at the given paths. If the first path is a directory, the schema is taken from one of the files in that directory.

Parameters:
paths - A list of paths to the data on the filesystem
Returns:
A new Source<GenericData.Record> instance

avroFile

public static Source<org.apache.avro.generic.GenericData.Record> avroFile(org.apache.hadoop.fs.Path path,
                                                                          org.apache.hadoop.conf.Configuration conf)
Creates a Source<GenericData.Record> by reading the schema of the Avro file at the given path using the FileSystem information contained in the given Configuration instance. If the path is a directory, the schema is taken from one of the files in that directory.

Parameters:
path - The path to the data on the filesystem
conf - The configuration information
Returns:
A new Source<GenericData.Record> instance

avroFile

public static Source<org.apache.avro.generic.GenericData.Record> avroFile(List<org.apache.hadoop.fs.Path> paths,
                                                                          org.apache.hadoop.conf.Configuration conf)
Creates a Source<GenericData.Record> by reading the schema of the Avro file at the given paths using the FileSystem information contained in the given Configuration instance. If the first path is a directory, the schema is taken from one of the files in that directory.

Parameters:
paths - A list of paths to the data on the filesystem
conf - The configuration information
Returns:
A new Source<GenericData.Record> instance
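
Because the schema-inferring overloads read the schema when the source is created, the Configuration determines which filesystem is consulted. A minimal sketch, assuming the Hadoop 2 property name fs.defaultFS; the class name, helper names, and the choice of the local filesystem are hypothetical.

```java
import org.apache.avro.generic.GenericData;
import org.apache.crunch.Source;
import org.apache.crunch.io.From;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;

// Hypothetical helper class, for illustration only.
public class GenericAvroExample {

  // Assumption: Hadoop 2 naming for the default filesystem property.
  // Unqualified paths are then resolved against the local filesystem.
  public static Configuration localConf() {
    Configuration conf = new Configuration();
    conf.set("fs.defaultFS", "file:///");
    return conf;
  }

  // The schema is read from the file (or from a file inside the directory)
  // during this call, so the path must already exist.
  public static Source<GenericData.Record> open(String pathName, Configuration conf) {
    return From.avroFile(new Path(pathName), conf);
  }
}
```

A caller would pass an existing file or directory, for example GenericAvroExample.open(dir, GenericAvroExample.localConf()).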

sequenceFile

public static <T extends org.apache.hadoop.io.Writable> Source<T> sequenceFile(String pathName,
                                                                               Class<T> valueClass)
Creates a Source<T> instance from the value field of each key-value pair in the SequenceFile(s) at the given path name.

Parameters:
pathName - The name of the path to the data on the filesystem
valueClass - The Writable type for the value of the SequenceFile entry
Returns:
A new Source<T> instance

sequenceFile

public static <T extends org.apache.hadoop.io.Writable> Source<T> sequenceFile(org.apache.hadoop.fs.Path path,
                                                                               Class<T> valueClass)
Creates a Source<T> instance from the value field of each key-value pair in the SequenceFile(s) at the given Path.

Parameters:
path - The Path to the data
valueClass - The Writable type for the value of the SequenceFile entry
Returns:
A new Source<T> instance

sequenceFile

public static <T extends org.apache.hadoop.io.Writable> Source<T> sequenceFile(List<org.apache.hadoop.fs.Path> paths,
                                                                               Class<T> valueClass)
Creates a Source<T> instance from the value field of each key-value pair in the SequenceFile(s) at the given Paths.

Parameters:
paths - A list of Paths to the data
valueClass - The Writable type for the value of the SequenceFile entry
Returns:
A new Source<T> instance

sequenceFile

public static <T> Source<T> sequenceFile(String pathName,
                                         PType<T> ptype)
Creates a Source<T> instance from the value field of each key-value pair in the SequenceFile(s) at the given path name.

Parameters:
pathName - The name of the path to the data on the filesystem
ptype - The PType for the value of the SequenceFile entry
Returns:
A new Source<T> instance

sequenceFile

public static <T> Source<T> sequenceFile(org.apache.hadoop.fs.Path path,
                                         PType<T> ptype)
Creates a Source<T> instance from the value field of each key-value pair in the SequenceFile(s) at the given Path.

Parameters:
path - The Path to the data
ptype - The PType for the value of the SequenceFile entry
Returns:
A new Source<T> instance

sequenceFile

public static <T> Source<T> sequenceFile(List<org.apache.hadoop.fs.Path> paths,
                                         PType<T> ptype)
Creates a Source<T> instance from the value field of each key-value pair in the SequenceFile(s) at the given Paths.

Parameters:
paths - A list of Paths to the data
ptype - The PType for the value of the SequenceFile entry
Returns:
A new Source<T> instance

sequenceFile

public static <K extends org.apache.hadoop.io.Writable,V extends org.apache.hadoop.io.Writable> TableSource<K,V> sequenceFile(String pathName,
                                                                                                                              Class<K> keyClass,
                                                                                                                              Class<V> valueClass)
Creates a TableSource<K, V> instance for the SequenceFile(s) at the given path name.

Parameters:
pathName - The name of the path to the data on the filesystem
keyClass - The Writable subclass for the key of the SequenceFile entry
valueClass - The Writable subclass for the value of the SequenceFile entry
Returns:
A new TableSource<K, V> instance

sequenceFile

public static <K extends org.apache.hadoop.io.Writable,V extends org.apache.hadoop.io.Writable> TableSource<K,V> sequenceFile(org.apache.hadoop.fs.Path path,
                                                                                                                              Class<K> keyClass,
                                                                                                                              Class<V> valueClass)
Creates a TableSource<K, V> instance for the SequenceFile(s) at the given Path.

Parameters:
path - The Path to the data
keyClass - The Writable subclass for the key of the SequenceFile entry
valueClass - The Writable subclass for the value of the SequenceFile entry
Returns:
A new TableSource<K, V> instance

sequenceFile

public static <K extends org.apache.hadoop.io.Writable,V extends org.apache.hadoop.io.Writable> TableSource<K,V> sequenceFile(List<org.apache.hadoop.fs.Path> paths,
                                                                                                                              Class<K> keyClass,
                                                                                                                              Class<V> valueClass)
Creates a TableSource<K, V> instance for the SequenceFile(s) at the given Paths.

Parameters:
paths - A list of Paths to the data
keyClass - The Writable subclass for the key of the SequenceFile entry
valueClass - The Writable subclass for the value of the SequenceFile entry
Returns:
A new TableSource<K, V> instance

sequenceFile

public static <K,V> TableSource<K,V> sequenceFile(String pathName,
                                                  PType<K> keyType,
                                                  PType<V> valueType)
Creates a TableSource<K, V> instance for the SequenceFile(s) at the given path name.

Parameters:
pathName - The name of the path to the data on the filesystem
keyType - The PType for the key of the SequenceFile entry
valueType - The PType for the value of the SequenceFile entry
Returns:
A new TableSource<K, V> instance

sequenceFile

public static <K,V> TableSource<K,V> sequenceFile(org.apache.hadoop.fs.Path path,
                                                  PType<K> keyType,
                                                  PType<V> valueType)
Creates a TableSource<K, V> instance for the SequenceFile(s) at the given Path.

Parameters:
path - The Path to the data
keyType - The PType for the key of the SequenceFile entry
valueType - The PType for the value of the SequenceFile entry
Returns:
A new TableSource<K, V> instance

sequenceFile

public static <K,V> TableSource<K,V> sequenceFile(List<org.apache.hadoop.fs.Path> paths,
                                                  PType<K> keyType,
                                                  PType<V> valueType)
Creates a TableSource<K, V> instance for the SequenceFile(s) at the given Paths.

Parameters:
paths - A list of Paths to the data
keyType - The PType for the key of the SequenceFile entry
valueType - The PType for the value of the SequenceFile entry
Returns:
A new TableSource<K, V> instance
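
The two families of table-producing sequenceFile overloads can be contrasted with a short sketch: the Class-based form keeps the Hadoop Writable types, while the PType-based form converts them. It assumes the Writables type family from org.apache.crunch.types.writable; the class name and the paths are hypothetical.

```java
import org.apache.crunch.TableSource;
import org.apache.crunch.io.From;
import org.apache.crunch.types.writable.Writables;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;

// Hypothetical helper class, for illustration only.
public class SequenceFileExample {

  // Keys and values are exposed as the Writable classes stored in the file.
  public static TableSource<LongWritable, Text> asWritables(String pathName) {
    return From.sequenceFile(pathName, LongWritable.class, Text.class);
  }

  // Keys and values are converted to Long and String by the given PTypes.
  public static TableSource<Long, String> asJavaTypes(String pathName) {
    return From.sequenceFile(pathName, Writables.longs(), Writables.strings());
  }
}
```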

textFile

public static Source<String> textFile(String pathName)
Creates a Source<String> instance for the text file(s) at the given path name.

Parameters:
pathName - The name of the path to the data on the filesystem
Returns:
A new Source<String> instance

textFile

public static Source<String> textFile(org.apache.hadoop.fs.Path path)
Creates a Source<String> instance for the text file(s) at the given Path.

Parameters:
path - The Path to the data
Returns:
A new Source<String> instance

textFile

public static Source<String> textFile(List<org.apache.hadoop.fs.Path> paths)
Creates a Source<String> instance for the text file(s) at the given Paths.

Parameters:
paths - A list of Paths to the data
Returns:
A new Source<String> instance

textFile

public static <T> Source<T> textFile(String pathName,
                                     PType<T> ptype)
Creates a Source<T> instance for the text file(s) at the given path name using the provided PType<T> to convert the input text.

Parameters:
pathName - The name of the path to the data on the filesystem
ptype - The PType<T> to use to process the input text
Returns:
A new Source<T> instance

textFile

public static <T> Source<T> textFile(org.apache.hadoop.fs.Path path,
                                     PType<T> ptype)
Creates a Source<T> instance for the text file(s) at the given Path using the provided PType<T> to convert the input text.

Parameters:
path - The Path to the data
ptype - The PType<T> to use to process the input text
Returns:
A new Source<T> instance

textFile

public static <T> Source<T> textFile(List<org.apache.hadoop.fs.Path> paths,
                                     PType<T> ptype)
Creates a Source<T> instance for the text file(s) at the given Paths using the provided PType<T> to convert the input text.

Parameters:
paths - A list of Paths to the data
ptype - The PType<T> to use to process the input text
Returns:
A new Source<T> instance
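
A brief sketch of two textFile overloads: one that combines several paths into a single Source<String>, and one that supplies a PType from the Avro type family. Avros.strings() comes from org.apache.crunch.types.avro; the class name and the paths are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

import org.apache.crunch.Source;
import org.apache.crunch.io.From;
import org.apache.crunch.types.avro.Avros;
import org.apache.hadoop.fs.Path;

// Hypothetical helper class, for illustration only.
public class TextFileExample {

  // Builds one source covering every given path; at least one path is expected.
  public static Source<String> lines(String... pathNames) {
    List<Path> paths = new ArrayList<Path>();
    for (String name : pathNames) {
      paths.add(new Path(name));
    }
    return From.textFile(paths);
  }

  // Same text input, but typed with the Avro family's string PType.
  public static Source<String> avroTypedLines(String pathName) {
    return From.textFile(pathName, Avros.strings());
  }
}
```

Choosing the PType here selects the type family used for the lines, which can matter when the rest of the pipeline is built on Avro types.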


Copyright © 2014 The Apache Software Foundation. All Rights Reserved.