org.apache.crunch.lib
Class Shard

java.lang.Object
  extended by org.apache.crunch.lib.Shard

public class Shard
extends Object

Utilities for controlling how the data in a PCollection is balanced across reducers and output files.


Constructor Summary
Shard()
Method Summary
static <T> PCollection<T>   shard(PCollection<T> pc, int numPartitions)
          Creates a PCollection<T> that has the same contents as its input argument but will be written to a fixed number of output files.
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Constructor Detail

Shard

public Shard()
Method Detail

shard

public static <T> PCollection<T> shard(PCollection<T> pc,
                                       int numPartitions)
Creates a PCollection<T> that has the same contents as its input argument but will be written to a fixed number of output files. This is useful for map-only jobs that process many input files but write only a small amount of output per task.

Parameters:
pc - The PCollection<T> to rebalance
numPartitions - The number of output partitions to create
Returns:
A rebalanced PCollection<T> with the same contents as the input
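To make the contract of shard concrete (same contents, fixed number of output partitions), the sketch below models its effect on an ordinary in-memory Java list. It is a hypothetical illustration, not Crunch's implementation: the class name ShardSketch and the round-robin assignment (element i goes to partition i % numPartitions) are assumptions chosen only to show how many small inputs can be collapsed into numPartitions evenly sized outputs without losing or duplicating any element.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical, in-memory model of what Shard.shard promises:
 * the same elements, redistributed into a fixed number of partitions.
 * This is NOT Crunch's implementation; round-robin assignment is an assumption.
 */
public final class ShardSketch {

    private ShardSketch() {}

    /**
     * Assigns element i to partition i % numPartitions, so partition
     * sizes differ by at most one and every element appears exactly once.
     */
    public static <T> List<List<T>> shard(List<T> input, int numPartitions) {
        if (numPartitions <= 0) {
            throw new IllegalArgumentException("numPartitions must be positive: " + numPartitions);
        }
        // One bucket per output partition ("output file").
        List<List<T>> partitions = new ArrayList<>(numPartitions);
        for (int p = 0; p < numPartitions; p++) {
            partitions.add(new ArrayList<>());
        }
        int counter = 0; // running index used as the partitioning key
        for (T element : input) {
            partitions.get(counter % numPartitions).add(element);
            counter++;
        }
        return partitions;
    }

    public static void main(String[] args) {
        // Many small records, e.g. gathered from lots of input files ...
        List<String> records = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            records.add("record-" + i);
        }
        // ... rebalanced into exactly 3 partitions.
        List<List<String>> files = shard(records, 3);
        for (int p = 0; p < files.size(); p++) {
            System.out.println("part-" + p + ": " + files.get(p));
        }
    }
}
```

With 10 records and 3 partitions, main prints partitions of sizes 4, 3 and 3. In an actual Crunch pipeline you would instead call Shard.shard(pc, numPartitions) on the PCollection just before writing it out (for example with Pipeline.writeTextFile), letting the framework decide which records land in which of the numPartitions output files.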


Copyright © 2014 The Apache Software Foundation. All Rights Reserved.