org.apache.crunch.lib
Class Shard
java.lang.Object
org.apache.crunch.lib.Shard
public class Shard extends Object
Utilities for controlling how the data in a PCollection is balanced across reducers and output files.
Constructor Summary
Shard()

Method Summary
static <T> PCollection<T> | shard(PCollection<T> pc, int numPartitions)
Creates a PCollection<T> that has the same contents as its input argument but will be written to a fixed number of output files.
Shard
public Shard()
shard
public static <T> PCollection<T> shard(PCollection<T> pc,
int numPartitions)
- Creates a PCollection<T> that has the same contents as its input argument but will be written to a fixed number of output files. This is useful for map-only jobs that process many input files but write only a small amount of output per task. Without sharding, such a job would produce one small output file per map task.
- Parameters:
pc - The PCollection<T> to rebalance
numPartitions - The number of output partitions to create
- Returns:
- A rebalanced PCollection<T> with the same contents as the input
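The following minimal in-memory sketch in plain Java makes the effect of `numPartitions` concrete. It is not Crunch code. It uses a hypothetical `ShardSketch.shard(List<T>, int)` helper that deals records round-robin into a fixed number of lists, with each list standing in for one output file. Round-robin is an assumption chosen for illustration. Crunch performs the real redistribution inside a pipeline, and its internal partitioning scheme is not modeled here. The sketch only illustrates the contract described above: the output has the same contents as the input and is split into a fixed number of outputs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical in-memory model of sharding, for illustration only.
 * This is NOT Crunch's implementation and uses no Crunch API.
 */
public class ShardSketch {

    /**
     * Splits the input into exactly numPartitions lists. The i-th element
     * goes to partition (i % numPartitions), so partition sizes differ by
     * at most one and every input element appears exactly once.
     */
    public static <T> List<List<T>> shard(List<T> input, int numPartitions) {
        if (numPartitions <= 0) {
            throw new IllegalArgumentException("numPartitions must be positive");
        }
        // One (initially empty) list per output "file".
        List<List<T>> partitions = new ArrayList<>(numPartitions);
        for (int p = 0; p < numPartitions; p++) {
            partitions.add(new ArrayList<>());
        }
        // Round-robin assignment (an assumption for this sketch).
        for (int i = 0; i < input.size(); i++) {
            partitions.get(i % numPartitions).add(input.get(i));
        }
        return partitions;
    }

    public static void main(String[] args) {
        List<String> records = Arrays.asList("a", "b", "c", "d", "e", "f", "g");
        List<List<String>> files = shard(records, 3);
        // Print each partition as if it were a separate output file.
        for (int p = 0; p < files.size(); p++) {
            System.out.println("part-" + p + ": " + files.get(p));
        }
    }
}
```

With 7 records and 3 partitions, the sketch prints `part-0: [a, d, g]`, `part-1: [b, e]` and `part-2: [c, f]`. The contents are unchanged, and the number of outputs is fixed by `numPartitions` rather than by the number of input files. In an actual Crunch pipeline, the corresponding call is `Shard.shard(pc, numPartitions)` on an existing PCollection<T>.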
Copyright © 2014 The Apache Software Foundation. All Rights Reserved.