AvroPathPerKeyOutputFormat (Apache Crunch 0.8.3 API)

org.apache.crunch.types.avro
Class AvroPathPerKeyOutputFormat<T>

java.lang.Object
  org.apache.hadoop.mapreduce.OutputFormat<K,V>
      org.apache.hadoop.mapreduce.lib.output.FileOutputFormat<org.apache.avro.mapred.AvroWrapper<org.apache.avro.mapred.Pair<org.apache.avro.util.Utf8,T>>,org.apache.hadoop.io.NullWritable>
          org.apache.crunch.types.avro.AvroPathPerKeyOutputFormat<T>

public class AvroPathPerKeyOutputFormat<T>
extends org.apache.hadoop.mapreduce.lib.output.FileOutputFormat<org.apache.avro.mapred.AvroWrapper<org.apache.avro.mapred.Pair<org.apache.avro.util.Utf8,T>>,org.apache.hadoop.io.NullWritable>

A FileOutputFormat that takes a Utf8 key and an Avro record and writes each record to a sub-directory of the output path whose name is the string form of the Utf8 key. This OutputFormat keeps only one RecordWriter open at a time, so within each partition all records for the same key should be written consecutively; otherwise files are opened and closed repeatedly.
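The cost of interleaving keys can be illustrated with a small, self-contained model. The sketch below is not Crunch's implementation and uses no Hadoop or Avro classes; the class `SingleWriterSketch` and the method `countOpens` are hypothetical names introduced only for illustration. It assumes that a writer is opened whenever the incoming key differs from the key of the writer that is currently open, and counts how many opens a given key sequence would cause.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical model of an output format that keeps one writer open at a time.
// It only counts writer openings; it does not write any files.
class SingleWriterSketch {

    // Returns how many times a writer would be opened for the given key order,
    // assuming the open writer is replaced whenever the key changes.
    static int countOpens(List<String> keys) {
        int opens = 0;
        String currentKey = null; // key of the writer that is currently open
        for (String key : keys) {
            if (!key.equals(currentKey)) {
                opens++;          // the previous writer is closed, a new one opened
                currentKey = key;
            }
        }
        return opens;
    }

    public static void main(String[] args) {
        List<String> interleaved = Arrays.asList("a", "b", "a", "b");
        List<String> grouped = Arrays.asList("a", "a", "b", "b");
        System.out.println("interleaved opens: " + countOpens(interleaved));
        System.out.println("grouped opens: " + countOpens(grouped));
    }
}
```

Under this assumption, grouping records by key within a partition (for example by sorting on the key before writing) keeps the number of opens equal to the number of distinct keys, while interleaved keys can cause one open per record.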

Nested Class Summary

Nested classes/interfaces inherited from class org.apache.hadoop.mapreduce.lib.output.FileOutputFormat
`org.apache.hadoop.mapreduce.lib.output.FileOutputFormat.Counter`

Constructor Summary
`AvroPathPerKeyOutputFormat()`

Method Summary
`org.apache.hadoop.mapreduce.RecordWriter<org.apache.avro.mapred.AvroWrapper<org.apache.avro.mapred.Pair<org.apache.avro.util.Utf8,T>>,org.apache.hadoop.io.NullWritable>`	`getRecordWriter(org.apache.hadoop.mapreduce.TaskAttemptContext taskAttemptContext)`

Methods inherited from class org.apache.hadoop.mapreduce.lib.output.FileOutputFormat
`checkOutputSpecs, getCompressOutput, getDefaultWorkFile, getOutputCommitter, getOutputCompressorClass, getOutputPath, getPathForWorkFile, getUniqueFile, getWorkOutputPath, setCompressOutput, setOutputCompressorClass, setOutputPath`

Methods inherited from class java.lang.Object
`equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait`

Constructor Detail

AvroPathPerKeyOutputFormat

public AvroPathPerKeyOutputFormat()

Method Detail

getRecordWriter

public org.apache.hadoop.mapreduce.RecordWriter<org.apache.avro.mapred.AvroWrapper<org.apache.avro.mapred.Pair<org.apache.avro.util.Utf8,T>>,org.apache.hadoop.io.NullWritable> getRecordWriter(org.apache.hadoop.mapreduce.TaskAttemptContext taskAttemptContext)
    throws IOException, InterruptedException

Specified by: getRecordWriter in class org.apache.hadoop.mapreduce.lib.output.FileOutputFormat<org.apache.avro.mapred.AvroWrapper<org.apache.avro.mapred.Pair<org.apache.avro.util.Utf8,T>>,org.apache.hadoop.io.NullWritable>

Throws: IOException, InterruptedException