org.apache.crunch.types.avro
Class AvroPathPerKeyOutputFormat<T>
java.lang.Object
  org.apache.hadoop.mapreduce.OutputFormat<K,V>
    org.apache.hadoop.mapreduce.lib.output.FileOutputFormat<org.apache.avro.mapred.AvroWrapper<org.apache.avro.mapred.Pair<org.apache.avro.util.Utf8,T>>,org.apache.hadoop.io.NullWritable>
      org.apache.crunch.types.avro.AvroPathPerKeyOutputFormat<T>
public class AvroPathPerKeyOutputFormat<T>
extends org.apache.hadoop.mapreduce.lib.output.FileOutputFormat<org.apache.avro.mapred.AvroWrapper<org.apache.avro.mapred.Pair<org.apache.avro.util.Utf8,T>>,org.apache.hadoop.io.NullWritable>
A FileOutputFormat that takes in a Utf8 key and an Avro record, and writes the Avro records to a sub-directory of the output path whose name is equal to the string form of the Utf8.
This OutputFormat keeps only one RecordWriter open at a time, so within each partition all records for the same key should be written out together (for example, by sorting or grouping the partition by key); otherwise files will be opened and closed frequently.
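The cost of the one-writer-at-a-time policy can be illustrated with a small, stdlib-only Java simulation. This is a hypothetical sketch, not Crunch code: the class SingleWriterSimulation and its countOpens method are invented for illustration, and the simulation only counts how many times a writer would have to be opened for a given order of keys within one partition.

```java
import java.util.List;

/**
 * Hypothetical simulation (not Crunch code): models an output format that keeps
 * only one per-key writer open at a time and counts how often it must open one.
 */
public class SingleWriterSimulation {

    /** Returns the number of writer opens needed to write the keys in the given order. */
    public static int countOpens(List<String> keys) {
        String currentKey = null; // key of the single writer currently open, if any
        int opens = 0;
        for (String key : keys) {
            if (!key.equals(currentKey)) {
                // A different key arrives: the current writer (if any) is closed
                // and a new one is opened for this key.
                currentKey = key;
                opens++;
            }
        }
        return opens;
    }

    public static void main(String[] args) {
        List<String> grouped = List.of("a", "a", "a", "b", "b", "b");
        List<String> interleaved = List.of("a", "b", "a", "b", "a", "b");
        System.out.println("grouped: " + countOpens(grouped));         // 2
        System.out.println("interleaved: " + countOpens(interleaved)); // 6
    }
}
```

With the keys grouped, the writer is opened once per distinct key (2 opens); with the same six records interleaved, every record forces a close and a new open (6 opens).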
Nested classes/interfaces inherited from class org.apache.hadoop.mapreduce.lib.output.FileOutputFormat:
org.apache.hadoop.mapreduce.lib.output.FileOutputFormat.Counter
Method Summary
org.apache.hadoop.mapreduce.RecordWriter<org.apache.avro.mapred.AvroWrapper<org.apache.avro.mapred.Pair<org.apache.avro.util.Utf8,T>>,org.apache.hadoop.io.NullWritable>
    getRecordWriter(org.apache.hadoop.mapreduce.TaskAttemptContext taskAttemptContext)
Methods inherited from class org.apache.hadoop.mapreduce.lib.output.FileOutputFormat:
checkOutputSpecs, getCompressOutput, getDefaultWorkFile, getOutputCommitter, getOutputCompressorClass, getOutputPath, getPathForWorkFile, getUniqueFile, getWorkOutputPath, setCompressOutput, setOutputCompressorClass, setOutputPath
AvroPathPerKeyOutputFormat
public AvroPathPerKeyOutputFormat()
getRecordWriter
public org.apache.hadoop.mapreduce.RecordWriter<org.apache.avro.mapred.AvroWrapper<org.apache.avro.mapred.Pair<org.apache.avro.util.Utf8,T>>,org.apache.hadoop.io.NullWritable> getRecordWriter(org.apache.hadoop.mapreduce.TaskAttemptContext taskAttemptContext)
throws IOException,
InterruptedException
Specified by:
getRecordWriter in class org.apache.hadoop.mapreduce.lib.output.FileOutputFormat<org.apache.avro.mapred.AvroWrapper<org.apache.avro.mapred.Pair<org.apache.avro.util.Utf8,T>>,org.apache.hadoop.io.NullWritable>
Throws:
IOException
InterruptedException
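getRecordWriter returns the writer that routes each record into the per-key sub-directory described in the class overview. The following is a hypothetical sketch of that routing against the local filesystem using only java.nio; it does not use Hadoop or Avro, treats records as plain text lines, and the class name PathPerKeyWriterSketch and the file name "part" are assumptions made for illustration. Like the real format, it keeps at most one file open at a time.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/**
 * Hypothetical local-filesystem analogue of a path-per-key record writer.
 * Not the Crunch implementation: records are plain text lines and the
 * file name "part" is invented for this sketch.
 */
public class PathPerKeyWriterSketch implements AutoCloseable {
    private final Path outputPath;
    private String currentKey;            // key whose file is currently open, or null
    private BufferedWriter currentWriter; // the single open writer, or null

    public PathPerKeyWriterSketch(Path outputPath) {
        this.outputPath = outputPath;
    }

    /** Writes one record as a line under outputPath/(string form of key)/part. */
    public void write(CharSequence key, String record) throws IOException {
        String dirName = key.toString(); // sub-directory named after the key's string form
        if (!dirName.equals(currentKey)) {
            close(); // keep at most one writer open at a time
            Path dir = Files.createDirectories(outputPath.resolve(dirName));
            // APPEND so that revisiting a key does not truncate its earlier records in this sketch.
            currentWriter = Files.newBufferedWriter(dir.resolve("part"), StandardCharsets.UTF_8,
                    StandardOpenOption.CREATE, StandardOpenOption.APPEND);
            currentKey = dirName;
        }
        currentWriter.write(record);
        currentWriter.newLine();
    }

    /** Flushes and closes the currently open writer, if any. */
    @Override
    public void close() throws IOException {
        if (currentWriter != null) {
            currentWriter.close();
            currentWriter = null;
            currentKey = null;
        }
    }
}
```

Writing the keys a, a, b, a into a temporary directory leaves a/part containing r1, r2, r4 and b/part containing r3; the final switch back to a costs an extra close and reopen, which is exactly the access pattern the class description advises against. This sketch appends when a key reappears; this documentation does not say how the real writer handles that case.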
Copyright © 2014 The Apache Software Foundation. All Rights Reserved.