org.apache.gora.mapreduce
Class GoraInputFormat<K,T extends Persistent>

java.lang.Object
  extended by org.apache.hadoop.mapreduce.InputFormat<K,T>
      extended by org.apache.gora.mapreduce.GoraInputFormat<K,T>
All Implemented Interfaces:
org.apache.hadoop.conf.Configurable

public class GoraInputFormat<K,T extends Persistent>
extends org.apache.hadoop.mapreduce.InputFormat<K,T>
implements org.apache.hadoop.conf.Configurable

InputFormat to fetch input from Gora data stores. The query that fetches items from the data store should be prepared and set via setQuery(Job, Query) before the job is submitted.

The InputSplits are prepared from the PartitionQuery instances obtained by calling DataStore.getPartitions(Query).

Hadoop jobs can be configured either through the static setInput() methods, or via GoraMapper.

See Also:
GoraMapper
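
The configuration flow above can be sketched as a minimal driver. WebPage (a Persistent bean) and the concrete store backing it are hypothetical stand-ins; only the setInput(Job, Query, DataStore, boolean) signature documented on this page, plus the standard DataStoreFactory and Query APIs, are assumed:

```java
// Sketch of a driver wiring GoraInputFormat into a Hadoop job.
// WebPage is a hypothetical Persistent class; the datastore backend is
// whichever implementation gora.properties selects.
import org.apache.gora.mapreduce.GoraInputFormat;
import org.apache.gora.query.Query;
import org.apache.gora.store.DataStore;
import org.apache.gora.store.DataStoreFactory;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class PageScanDriver {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    DataStore<String, WebPage> store =
        DataStoreFactory.getDataStore(String.class, WebPage.class, conf);

    // Prepare the query that limits which rows the job will read;
    // fields left unset place no restriction on the scan.
    Query<String, WebPage> query = store.newQuery();
    query.setStartKey("com.example");

    Job job = Job.getInstance(conf, "page-scan");
    // Registers the query and datastore on the job, as required before
    // submission; reuseObjects=true lets the RecordReader recycle
    // key/value instances between calls.
    GoraInputFormat.setInput(job, query, store, true);

    job.waitForCompletion(true);
  }
}
```

A job submitted this way reads its splits from the PartitionQuery instances that the datastore reports for the registered query.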

Field Summary
static String QUERY_KEY
           
 
Constructor Summary
GoraInputFormat()
           
 
Method Summary
 org.apache.hadoop.mapreduce.RecordReader<K,T> createRecordReader(org.apache.hadoop.mapreduce.InputSplit split, org.apache.hadoop.mapreduce.TaskAttemptContext context)
           
 org.apache.hadoop.conf.Configuration getConf()
           
 Query<K,T> getQuery(org.apache.hadoop.conf.Configuration conf)
           
 List<org.apache.hadoop.mapreduce.InputSplit> getSplits(org.apache.hadoop.mapreduce.JobContext context)
           
 void setConf(org.apache.hadoop.conf.Configuration conf)
           
static <K1,V1 extends Persistent> void setInput(org.apache.hadoop.mapreduce.Job job, Class<? extends DataStore<K1,V1>> dataStoreClass, Class<K1> inKeyClass, Class<V1> inValueClass, boolean reuseObjects)
          Sets the input parameters for the job.
static <K1,V1 extends Persistent> void setInput(org.apache.hadoop.mapreduce.Job job, Query<K1,V1> query, boolean reuseObjects)
          Sets the input parameters for the job.
static <K1,V1 extends Persistent> void setInput(org.apache.hadoop.mapreduce.Job job, Query<K1,V1> query, DataStore<K1,V1> dataStore, boolean reuseObjects)
          Sets the input parameters for the job.
static <K,T extends Persistent> void setQuery(org.apache.hadoop.mapreduce.Job job, Query<K,T> query)
           
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

QUERY_KEY

public static final String QUERY_KEY
See Also:
Constant Field Values
Constructor Detail

GoraInputFormat

public GoraInputFormat()
Method Detail

createRecordReader

public org.apache.hadoop.mapreduce.RecordReader<K,T> createRecordReader(org.apache.hadoop.mapreduce.InputSplit split,
                                                                        org.apache.hadoop.mapreduce.TaskAttemptContext context)
                                                     throws IOException,
                                                            InterruptedException
Specified by:
createRecordReader in class org.apache.hadoop.mapreduce.InputFormat<K,T extends Persistent>
Throws:
IOException
InterruptedException

getSplits

public List<org.apache.hadoop.mapreduce.InputSplit> getSplits(org.apache.hadoop.mapreduce.JobContext context)
                                                       throws IOException,
                                                              InterruptedException
Specified by:
getSplits in class org.apache.hadoop.mapreduce.InputFormat<K,T extends Persistent>
Throws:
IOException
InterruptedException

getConf

public org.apache.hadoop.conf.Configuration getConf()
Specified by:
getConf in interface org.apache.hadoop.conf.Configurable

setConf

public void setConf(org.apache.hadoop.conf.Configuration conf)
Specified by:
setConf in interface org.apache.hadoop.conf.Configurable

setQuery

public static <K,T extends Persistent> void setQuery(org.apache.hadoop.mapreduce.Job job,
                                                     Query<K,T> query)
                                              throws IOException
Sets the input query for the job, storing it in the job's configuration under QUERY_KEY.
Throws:
IOException

getQuery

public Query<K,T> getQuery(org.apache.hadoop.conf.Configuration conf)
                    throws IOException
Returns the query previously stored in the given configuration by setQuery(Job, Query).
Throws:
IOException

setInput

public static <K1,V1 extends Persistent> void setInput(org.apache.hadoop.mapreduce.Job job,
                                                       Query<K1,V1> query,
                                                       boolean reuseObjects)
                     throws IOException
Sets the input parameters for the job.

Parameters:
job - the job to set the properties for
query - the query to get the inputs from
reuseObjects - whether to reuse objects in serialization
Throws:
IOException

setInput

public static <K1,V1 extends Persistent> void setInput(org.apache.hadoop.mapreduce.Job job,
                                                       Query<K1,V1> query,
                                                       DataStore<K1,V1> dataStore,
                                                       boolean reuseObjects)
                     throws IOException
Sets the input parameters for the job.

Parameters:
job - the job to set the properties for
query - the query to get the inputs from
dataStore - the datastore as the input
reuseObjects - whether to reuse objects in serialization
Throws:
IOException

setInput

public static <K1,V1 extends Persistent> void setInput(org.apache.hadoop.mapreduce.Job job,
                                                       Class<? extends DataStore<K1,V1>> dataStoreClass,
                                                       Class<K1> inKeyClass,
                                                       Class<V1> inValueClass,
                                                       boolean reuseObjects)
                     throws IOException
Sets the input parameters for the job.

Parameters:
job - the job to set the properties for
dataStoreClass - the datastore class
inKeyClass - Map input key class
inValueClass - Map input value class
reuseObjects - whether to reuse objects in serialization
Throws:
IOException
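
The class-based variant above can be sketched as follows. HBaseStore and WebPage are hypothetical stand-ins; the double cast is needed because a generic DataStore implementation's class literal (Class<HBaseStore>) does not match Class<? extends DataStore<String, WebPage>> without an unchecked conversion:

```java
// Sketch: configuring input from a datastore class rather than an instance.
// HBaseStore and WebPage are hypothetical; the cast bridges the raw class
// literal to the bounded-wildcard parameter type declared above.
@SuppressWarnings("unchecked")
static void configureInput(org.apache.hadoop.mapreduce.Job job)
    throws java.io.IOException {
  GoraInputFormat.setInput(
      job,
      (Class<? extends DataStore<String, WebPage>>) (Class<?>) HBaseStore.class,
      String.class,    // map input key class
      WebPage.class,   // map input value class
      true);           // reuse key/value objects during deserialization
}
```

This form is convenient when the driver does not need to restrict the scan with a pre-built Query.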


Copyright © 2010-2013 The Apache Software Foundation. All Rights Reserved.