public class MultiOutputFormat
extends org.apache.hadoop.mapreduce.OutputFormat<org.apache.hadoop.io.Writable,org.apache.hadoop.io.Writable>
Multiple output formats can be defined, each with its own OutputFormat class, its own key class, and its own value class. Each of these output formats can be configured independently, without interfering with the configuration of the others.
Usage pattern for job submission:
```java
Job job = new Job();

FileInputFormat.setInputPath(job, inDir);

job.setMapperClass(WordCountMap.class);
job.setReducerClass(WordCountReduce.class);
job.setInputFormatClass(TextInputFormat.class);
job.setOutputFormatClass(MultiOutputFormat.class);
// Need not define OutputKeyClass and OutputValueClass. They default to
// Writable.class
job.setMapOutputKeyClass(Text.class);
job.setMapOutputValueClass(IntWritable.class);

// Create a JobConfigurer that will configure the job with the multiple
// output format information.
JobConfigurer configurer = MultiOutputFormat.createConfigurer(job);

// Defines an additional text-based output 'text' for the job.
// Any configuration for the defined OutputFormat should be done with
// the Job obtained with the configurer.getJob() method.
configurer.addOutputFormat("text", TextOutputFormat.class,
    IntWritable.class, Text.class);
FileOutputFormat.setOutputPath(configurer.getJob("text"), textOutDir);

// Defines an additional sequence-file based output 'sequence' for the job.
configurer.addOutputFormat("sequence", SequenceFileOutputFormat.class,
    Text.class, IntWritable.class);
FileOutputFormat.setOutputPath(configurer.getJob("sequence"), seqOutDir);
...
// configure() is to be called on the JobConfigurer once all the
// output formats have been defined and configured.
configurer.configure();

job.waitForCompletion(true);
...
```
Usage in Reducer:
```java
public class WordCountReduce extends Reducer<Text, IntWritable, Writable, Writable> {

  private IntWritable count = new IntWritable();

  public void reduce(Text word, Iterable<IntWritable> values, Context context)
      throws IOException, InterruptedException {
    int sum = 0;
    for (IntWritable val : values) {
      sum += val.get();
    }
    count.set(sum);
    // Write the (count, word) pair to the 'text' output and the
    // (word, count) pair to the 'sequence' output.
    MultiOutputFormat.write("text", count, word, context);
    MultiOutputFormat.write("sequence", word, count, context);
  }
}
```
Map only jobs:
MultiOutputFormat.write("output", key, value, context); can be called similar to a reducer in map only jobs.
| Modifier and Type | Class and Description |
|---|---|
| static class | MultiOutputFormat.JobConfigurer: Class that supports configuration of the job for multiple output formats. |
| class | MultiOutputFormat.MultiOutputCommitter |
| Constructor and Description |
|---|
| MultiOutputFormat() |
| Modifier and Type | Method and Description |
|---|---|
| void | checkOutputSpecs(org.apache.hadoop.mapreduce.JobContext context) |
| static MultiOutputFormat.JobConfigurer | createConfigurer(org.apache.hadoop.mapreduce.Job job): Get a JobConfigurer instance that will support configuration of the job for multiple output formats. |
| static org.apache.hadoop.mapreduce.JobContext | getJobContext(String alias, org.apache.hadoop.mapreduce.JobContext context): Get the JobContext with the related OutputFormat configuration populated given the alias and the actual JobContext. |
| org.apache.hadoop.mapreduce.OutputCommitter | getOutputCommitter(org.apache.hadoop.mapreduce.TaskAttemptContext context) |
| org.apache.hadoop.mapreduce.RecordWriter<org.apache.hadoop.io.Writable,org.apache.hadoop.io.Writable> | getRecordWriter(org.apache.hadoop.mapreduce.TaskAttemptContext context) |
| static org.apache.hadoop.mapreduce.TaskAttemptContext | getTaskAttemptContext(String alias, org.apache.hadoop.mapreduce.TaskAttemptContext context): Get the TaskAttemptContext with the related OutputFormat configuration populated given the alias and the actual TaskAttemptContext. |
| static <K,V> void | write(String alias, K key, V value, org.apache.hadoop.mapreduce.TaskInputOutputContext context): Write the output key and value using the OutputFormat defined by the alias. |
public static MultiOutputFormat.JobConfigurer createConfigurer(org.apache.hadoop.mapreduce.Job job)

Parameters:
job - the mapreduce job to be submitted

public static org.apache.hadoop.mapreduce.JobContext getJobContext(String alias, org.apache.hadoop.mapreduce.JobContext context)
Parameters:
alias - the name given to the OutputFormat configuration
context - the JobContext

public static org.apache.hadoop.mapreduce.TaskAttemptContext getTaskAttemptContext(String alias, org.apache.hadoop.mapreduce.TaskAttemptContext context)

Parameters:
alias - the name given to the OutputFormat configuration
context - the Mapper or Reducer Context
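The contexts returned by getJobContext and getTaskAttemptContext carry the configuration that was set on the alias's Job through the JobConfigurer. The snippet below is an illustrative sketch, not from the class documentation: it assumes a task's context and the 'sequence' alias from the job-submission example, and merely reads back that alias's output directory (the standard mapreduce.output.fileoutputformat.outputdir property set by FileOutputFormat).

```java
// Illustrative sketch: resolve the configuration that MultiOutputFormat
// keeps for the "sequence" alias. The returned context reflects settings
// made through configurer.getJob("sequence") at job-submission time.
org.apache.hadoop.mapreduce.TaskAttemptContext seqContext =
    MultiOutputFormat.getTaskAttemptContext("sequence", context);

// Read back a per-alias setting, e.g. the output path configured with
// FileOutputFormat.setOutputPath(configurer.getJob("sequence"), seqOutDir).
String seqOutputDir =
    seqContext.getConfiguration().get("mapreduce.output.fileoutputformat.outputdir");
```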
public static <K,V> void write(String alias, K key, V value, org.apache.hadoop.mapreduce.TaskInputOutputContext context) throws IOException, InterruptedException

Parameters:
alias - the name given to the OutputFormat configuration
key - the output key to be written
value - the output value to be written
context - the Mapper or Reducer Context

Throws:
IOException
InterruptedException
public void checkOutputSpecs(org.apache.hadoop.mapreduce.JobContext context) throws IOException, InterruptedException

Specified by:
checkOutputSpecs in class org.apache.hadoop.mapreduce.OutputFormat<org.apache.hadoop.io.Writable,org.apache.hadoop.io.Writable>
Throws:
IOException
InterruptedException

public org.apache.hadoop.mapreduce.RecordWriter<org.apache.hadoop.io.Writable,org.apache.hadoop.io.Writable> getRecordWriter(org.apache.hadoop.mapreduce.TaskAttemptContext context) throws IOException, InterruptedException

Specified by:
getRecordWriter in class org.apache.hadoop.mapreduce.OutputFormat<org.apache.hadoop.io.Writable,org.apache.hadoop.io.Writable>
Throws:
IOException
InterruptedException

public org.apache.hadoop.mapreduce.OutputCommitter getOutputCommitter(org.apache.hadoop.mapreduce.TaskAttemptContext context) throws IOException, InterruptedException

Specified by:
getOutputCommitter in class org.apache.hadoop.mapreduce.OutputFormat<org.apache.hadoop.io.Writable,org.apache.hadoop.io.Writable>
Throws:
IOException
InterruptedException