org.apache.crunch.io
Class At

java.lang.Object
  extended by org.apache.crunch.io.At

public class At
extends Object

Static factory methods for creating common SourceTarget types, which may be treated as both a Source and a Target.

The At methods are analogous to the From and To factory methods, but are used for storing intermediate outputs that need to be passed from one run of a MapReduce pipeline to another run. The SourceTarget object acts as both a Source and a Target, which enables it to provide this functionality.

   Pipeline pipeline = new MRPipeline(this.getClass());
   // Create our intermediate storage location
   SourceTarget<String> intermediate = At.textFile("/temptext");
   ...
   // Write out the output of the first phase of a pipeline.
   pipeline.write(phase1, intermediate);

   // Explicitly call run to kick off the pipeline.
   pipeline.run();

   // And then kick off a second phase by consuming the output
   // from the first phase.
   PCollection<String> phase2Input = pipeline.read(intermediate);
   ...

The SourceTarget abstraction is useful when we care about reading the intermediate outputs of a pipeline as well as the final results.
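
The dual role described above can be illustrated without a cluster. The following is a minimal, library-free sketch, not Crunch code: ToySource, ToyTarget, ToySourceTarget, and InMemoryLocation are hypothetical stand-ins invented for this illustration, and the two phases are arbitrary. It only shows the shape of the idea, namely that one object is written to by the first phase and read from by the second.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;

// Toy model only: every type here is a hypothetical stand-in, not a Crunch class.
final class TwoPhaseSketch {

    // Something records can be read from.
    interface ToySource<T> {
        List<T> readAll();
    }

    // Something records can be written to.
    interface ToyTarget<T> {
        void writeAll(List<T> records);
    }

    // One object that fills both roles.
    interface ToySourceTarget<T> extends ToySource<T>, ToyTarget<T> {
    }

    // In-memory stand-in for a storage location such as a text file.
    static final class InMemoryLocation<T> implements ToySourceTarget<T> {
        private final List<T> stored = new ArrayList<>();

        @Override
        public void writeAll(List<T> records) {
            stored.clear();
            stored.addAll(records);
        }

        @Override
        public List<T> readAll() {
            return new ArrayList<>(stored);
        }
    }

    // Phase 1 (arbitrary example work): upper-case every record.
    static List<String> phase1(List<String> raw) {
        return raw.stream()
                .map(s -> s.toUpperCase(Locale.ROOT))
                .collect(Collectors.toList());
    }

    // Phase 2 only needs the reading half of the intermediate location.
    static List<Integer> phase2(ToySource<String> input) {
        return input.readAll().stream()
                .map(String::length)
                .collect(Collectors.toList());
    }

    static List<Integer> runBothPhases(List<String> raw, ToySourceTarget<String> intermediate) {
        intermediate.writeAll(phase1(raw)); // the location is used as a target
        return phase2(intermediate);        // the same location is used as a source
    }

    public static void main(String[] args) {
        ToySourceTarget<String> intermediate = new InMemoryLocation<>();
        System.out.println(runBothPhases(List.of("crunch", "at"), intermediate));
        // The intermediate records stay readable after both phases have run.
        System.out.println(intermediate.readAll());
    }
}
```

Because phase2 accepts only the reading interface, the sketch also shows why a combined type is convenient: the same reference can be handed to code that expects a source and to code that expects a target.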


Constructor Summary
At()
           
 
Method Summary
static SourceTarget<org.apache.avro.generic.GenericData.Record> avroFile(org.apache.hadoop.fs.Path path)
          Creates a SourceTarget<GenericData.Record> by reading the schema of the Avro file at the given path.
static
<T extends org.apache.avro.specific.SpecificRecord>
SourceTarget<T>
avroFile(org.apache.hadoop.fs.Path path, Class<T> avroClass)
          Creates a SourceTarget<T> instance from the Avro file(s) at the given Path.
static SourceTarget<org.apache.avro.generic.GenericData.Record> avroFile(org.apache.hadoop.fs.Path path, org.apache.hadoop.conf.Configuration conf)
          Creates a SourceTarget<GenericData.Record> by reading the schema of the Avro file at the given path using the FileSystem information contained in the given Configuration instance.
static
<T> SourceTarget<T>
avroFile(org.apache.hadoop.fs.Path path, PType<T> ptype)
          Creates a SourceTarget<T> instance from the Avro file(s) at the given Path.
static SourceTarget<org.apache.avro.generic.GenericData.Record> avroFile(String pathName)
          Creates a SourceTarget<GenericData.Record> by reading the schema of the Avro file at the given path.
static
<T extends org.apache.avro.specific.SpecificRecord>
SourceTarget<T>
avroFile(String pathName, Class<T> avroClass)
          Creates a SourceTarget<T> instance from the Avro file(s) at the given path name.
static
<T> SourceTarget<T>
avroFile(String pathName, PType<T> ptype)
          Creates a SourceTarget<T> instance from the Avro file(s) at the given path name.
static
<K extends org.apache.hadoop.io.Writable,V extends org.apache.hadoop.io.Writable>
TableSourceTarget<K,V>
sequenceFile(org.apache.hadoop.fs.Path path, Class<K> keyClass, Class<V> valueClass)
          Creates a TableSourceTarget<K, V> instance from the key-value pairs in the SequenceFile(s) at the given Path.
static
<T extends org.apache.hadoop.io.Writable>
SourceTarget<T>
sequenceFile(org.apache.hadoop.fs.Path path, Class<T> valueClass)
          Creates a SourceTarget<T> instance from the value field of each key-value pair in the SequenceFile(s) at the given Path.
static
<K,V> TableSourceTarget<K,V>
sequenceFile(org.apache.hadoop.fs.Path path, PType<K> keyType, PType<V> valueType)
          Creates a TableSourceTarget<K, V> instance from the key-value pairs in the SequenceFile(s) at the given Path.
static
<T> SourceTarget<T>
sequenceFile(org.apache.hadoop.fs.Path path, PType<T> ptype)
          Creates a SourceTarget<T> instance from the value field of each key-value pair in the SequenceFile(s) at the given Path.
static
<K extends org.apache.hadoop.io.Writable,V extends org.apache.hadoop.io.Writable>
TableSourceTarget<K,V>
sequenceFile(String pathName, Class<K> keyClass, Class<V> valueClass)
          Creates a TableSourceTarget<K, V> instance from the key-value pairs in the SequenceFile(s) at the given path name.
static
<T extends org.apache.hadoop.io.Writable>
SourceTarget<T>
sequenceFile(String pathName, Class<T> valueClass)
          Creates a SourceTarget<T> instance from the value field of each key-value pair in the SequenceFile(s) at the given path name.
static
<K,V> TableSourceTarget<K,V>
sequenceFile(String pathName, PType<K> keyType, PType<V> valueType)
          Creates a TableSourceTarget<K, V> instance from the key-value pairs in the SequenceFile(s) at the given path name.
static
<T> SourceTarget<T>
sequenceFile(String pathName, PType<T> ptype)
          Creates a SourceTarget<T> instance from the value field of each key-value pair in the SequenceFile(s) at the given path name.
static SourceTarget<String> textFile(org.apache.hadoop.fs.Path path)
          Creates a SourceTarget<String> instance for the text file(s) at the given Path.
static
<T> SourceTarget<T>
textFile(org.apache.hadoop.fs.Path path, PType<T> ptype)
          Creates a SourceTarget<T> instance for the text file(s) at the given Path using the provided PType<T> to convert the input text.
static SourceTarget<String> textFile(String pathName)
          Creates a SourceTarget<String> instance for the text file(s) at the given path name.
static
<T> SourceTarget<T>
textFile(String pathName, PType<T> ptype)
          Creates a SourceTarget<T> instance for the text file(s) at the given path name using the provided PType<T> to convert the input text.
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

At

public At()
Method Detail

avroFile

public static <T extends org.apache.avro.specific.SpecificRecord> SourceTarget<T> avroFile(String pathName,
                                                                                           Class<T> avroClass)
Creates a SourceTarget<T> instance from the Avro file(s) at the given path name.

Parameters:
pathName - The name of the path to the data on the filesystem
avroClass - The subclass of SpecificRecord to use for the Avro file
Returns:
A new SourceTarget<T> instance

avroFile

public static <T extends org.apache.avro.specific.SpecificRecord> SourceTarget<T> avroFile(org.apache.hadoop.fs.Path path,
                                                                                           Class<T> avroClass)
Creates a SourceTarget<T> instance from the Avro file(s) at the given Path.

Parameters:
path - The Path to the data
avroClass - The subclass of SpecificRecord to use for the Avro file
Returns:
A new SourceTarget<T> instance

avroFile

public static SourceTarget<org.apache.avro.generic.GenericData.Record> avroFile(String pathName)
Creates a SourceTarget<GenericData.Record> by reading the schema of the Avro file at the given path. If the path is a directory, the schema of a file in the directory will be used to determine the schema to use.

Parameters:
pathName - The name of the path to the data on the filesystem
Returns:
A new SourceTarget<GenericData.Record> instance

avroFile

public static SourceTarget<org.apache.avro.generic.GenericData.Record> avroFile(org.apache.hadoop.fs.Path path)
Creates a SourceTarget<GenericData.Record> by reading the schema of the Avro file at the given path. If the path is a directory, the schema of a file in the directory will be used to determine the schema to use.

Parameters:
path - The path to the data on the filesystem
Returns:
A new SourceTarget<GenericData.Record> instance

avroFile

public static SourceTarget<org.apache.avro.generic.GenericData.Record> avroFile(org.apache.hadoop.fs.Path path,
                                                                                org.apache.hadoop.conf.Configuration conf)
Creates a SourceTarget<GenericData.Record> by reading the schema of the Avro file at the given path using the FileSystem information contained in the given Configuration instance. If the path is a directory, the schema of a file in the directory will be used to determine the schema to use.

Parameters:
path - The path to the data on the filesystem
conf - The configuration information
Returns:
A new SourceTarget<GenericData.Record> instance

avroFile

public static <T> SourceTarget<T> avroFile(String pathName,
                                           PType<T> ptype)
Creates a SourceTarget<T> instance from the Avro file(s) at the given path name.

Parameters:
pathName - The name of the path to the data on the filesystem
ptype - The PType for the Avro records
Returns:
A new SourceTarget<T> instance

avroFile

public static <T> SourceTarget<T> avroFile(org.apache.hadoop.fs.Path path,
                                           PType<T> ptype)
Creates a SourceTarget<T> instance from the Avro file(s) at the given Path.

Parameters:
path - The Path to the data
ptype - The PType for the Avro records
Returns:
A new SourceTarget<T> instance

sequenceFile

public static <T extends org.apache.hadoop.io.Writable> SourceTarget<T> sequenceFile(String pathName,
                                                                                     Class<T> valueClass)
Creates a SourceTarget<T> instance from the value field of each key-value pair in the SequenceFile(s) at the given path name.

Parameters:
pathName - The name of the path to the data on the filesystem
valueClass - The Writable type for the value of the SequenceFile entry
Returns:
A new SourceTarget<T> instance

sequenceFile

public static <T extends org.apache.hadoop.io.Writable> SourceTarget<T> sequenceFile(org.apache.hadoop.fs.Path path,
                                                                                     Class<T> valueClass)
Creates a SourceTarget<T> instance from the value field of each key-value pair in the SequenceFile(s) at the given Path.

Parameters:
path - The Path to the data
valueClass - The Writable type for the value of the SequenceFile entry
Returns:
A new SourceTarget<T> instance

sequenceFile

public static <T> SourceTarget<T> sequenceFile(String pathName,
                                               PType<T> ptype)
Creates a SourceTarget<T> instance from the value field of each key-value pair in the SequenceFile(s) at the given path name.

Parameters:
pathName - The name of the path to the data on the filesystem
ptype - The PType for the value of the SequenceFile entry
Returns:
A new SourceTarget<T> instance

sequenceFile

public static <T> SourceTarget<T> sequenceFile(org.apache.hadoop.fs.Path path,
                                               PType<T> ptype)
Creates a SourceTarget<T> instance from the value field of each key-value pair in the SequenceFile(s) at the given Path.

Parameters:
path - The Path to the data
ptype - The PType for the value of the SequenceFile entry
Returns:
A new SourceTarget<T> instance

sequenceFile

public static <K extends org.apache.hadoop.io.Writable,V extends org.apache.hadoop.io.Writable> TableSourceTarget<K,V> sequenceFile(String pathName,
                                                                                                                                    Class<K> keyClass,
                                                                                                                                    Class<V> valueClass)
Creates a TableSourceTarget<K, V> instance from the key-value pairs in the SequenceFile(s) at the given path name.

Parameters:
pathName - The name of the path to the data on the filesystem
keyClass - The Writable type for the key of the SequenceFile entry
valueClass - The Writable type for the value of the SequenceFile entry
Returns:
A new TableSourceTarget<K, V> instance

sequenceFile

public static <K extends org.apache.hadoop.io.Writable,V extends org.apache.hadoop.io.Writable> TableSourceTarget<K,V> sequenceFile(org.apache.hadoop.fs.Path path,
                                                                                                                                    Class<K> keyClass,
                                                                                                                                    Class<V> valueClass)
Creates a TableSourceTarget<K, V> instance from the key-value pairs in the SequenceFile(s) at the given Path.

Parameters:
path - The Path to the data
keyClass - The Writable type for the key of the SequenceFile entry
valueClass - The Writable type for the value of the SequenceFile entry
Returns:
A new TableSourceTarget<K, V> instance

sequenceFile

public static <K,V> TableSourceTarget<K,V> sequenceFile(String pathName,
                                                        PType<K> keyType,
                                                        PType<V> valueType)
Creates a TableSourceTarget<K, V> instance from the key-value pairs in the SequenceFile(s) at the given path name.

Parameters:
pathName - The name of the path to the data on the filesystem
keyType - The PType for the key of the SequenceFile entry
valueType - The PType for the value of the SequenceFile entry
Returns:
A new TableSourceTarget<K, V> instance

sequenceFile

public static <K,V> TableSourceTarget<K,V> sequenceFile(org.apache.hadoop.fs.Path path,
                                                        PType<K> keyType,
                                                        PType<V> valueType)
Creates a TableSourceTarget<K, V> instance from the key-value pairs in the SequenceFile(s) at the given Path.

Parameters:
path - The Path to the data
keyType - The PType for the key of the SequenceFile entry
valueType - The PType for the value of the SequenceFile entry
Returns:
A new TableSourceTarget<K, V> instance
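
The sequenceFile overloads above fall into two groups: those returning SourceTarget<T> expose only the value field of each key-value pair, while those returning TableSourceTarget<K, V> expose the pairs themselves. The following library-free sketch models that difference on an in-memory list. It is a conceptual illustration only: SequenceViewsSketch, sampleEntries, valueView, and tableView are hypothetical names, and no SequenceFile is actually read.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Toy model only: these helpers are hypothetical and do not touch Crunch or Hadoop.
final class SequenceViewsSketch {

    // Stand-in for the records stored in a SequenceFile: ordered key-value pairs.
    static List<Map.Entry<String, Long>> sampleEntries() {
        return List.of(Map.entry("a", 1L), Map.entry("b", 2L));
    }

    // Value-only view: the key of each pair is dropped, as the
    // SourceTarget<T> overloads are described as doing.
    static <K, V> List<V> valueView(List<Map.Entry<K, V>> entries) {
        return entries.stream()
                .map(Map.Entry::getValue)
                .collect(Collectors.toList());
    }

    // Table view: both halves of each pair are kept, as the
    // TableSourceTarget<K, V> overloads are described as doing.
    static <K, V> List<Map.Entry<K, V>> tableView(List<Map.Entry<K, V>> entries) {
        return List.copyOf(entries);
    }

    public static void main(String[] args) {
        System.out.println(valueView(sampleEntries()));
        System.out.println(tableView(sampleEntries()));
    }
}
```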

textFile

public static SourceTarget<String> textFile(String pathName)
Creates a SourceTarget<String> instance for the text file(s) at the given path name.

Parameters:
pathName - The name of the path to the data on the filesystem
Returns:
A new SourceTarget<String> instance

textFile

public static SourceTarget<String> textFile(org.apache.hadoop.fs.Path path)
Creates a SourceTarget<String> instance for the text file(s) at the given Path.

Parameters:
path - The Path to the data
Returns:
A new SourceTarget<String> instance

textFile

public static <T> SourceTarget<T> textFile(String pathName,
                                           PType<T> ptype)
Creates a SourceTarget<T> instance for the text file(s) at the given path name using the provided PType<T> to convert the input text.

Parameters:
pathName - The name of the path to the data on the filesystem
ptype - The PType<T> to use to process the input text
Returns:
A new SourceTarget<T> instance

textFile

public static <T> SourceTarget<T> textFile(org.apache.hadoop.fs.Path path,
                                           PType<T> ptype)
Creates a SourceTarget<T> instance for the text file(s) at the given Path using the provided PType<T> to convert the input text.

Parameters:
path - The Path to the data
ptype - The PType<T> to use to process the input text
Returns:
A new SourceTarget<T> instance
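
The two-argument textFile overloads above are described as using the provided PType<T> to convert the input text. As a rough mental model of the read side only, the sketch below applies a per-line converter function to a list of strings. It is an assumption-laden illustration rather than Crunch code: TextConversionSketch and readLines are hypothetical, and a plain java.util.function.Function stands in for the conversion a PType<T> would supply.

```java
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;

// Toy model only: a Function<String, T> stands in for the conversion a PType<T> would supply.
final class TextConversionSketch {

    // Convert every line of text into a record of type T.
    static <T> List<T> readLines(List<String> lines, Function<String, T> converter) {
        return lines.stream()
                .map(converter)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> lines = List.of("1", "2", "40");

        // No conversion: comparable in spirit to the SourceTarget<String> overloads.
        List<String> asStrings = readLines(lines, Function.identity());

        // With conversion: comparable in spirit to passing a PType<T> for numeric records.
        List<Long> asLongs = TextConversionSketch.<Long>readLines(lines, Long::parseLong);

        System.out.println(asStrings);
        System.out.println(asLongs);
    }
}
```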


Copyright © 2014 The Apache Software Foundation. All Rights Reserved.