org.apache.crunch.lib
Class Distinct

java.lang.Object
  extended by org.apache.crunch.lib.Distinct

public final class Distinct
extends Object

Functions for computing the distinct elements of a PCollection.


Method Summary
static <S> PCollection<S> distinct(PCollection<S> input)
          Construct a new PCollection that contains the unique elements of a given input PCollection.
static <S> PCollection<S> distinct(PCollection<S> input, int flushEvery)
          A distinct operation that lets the client control how frequently elements are flushed to disk, in order to tune performance or memory consumption.
static <K,V> PTable<K,V> distinct(PTable<K,V> input)
          A PTable<K, V> analogue of the distinct function.
static <K,V> PTable<K,V> distinct(PTable<K,V> input, int flushEvery)
          A PTable<K, V> analogue of the distinct function that lets the client control how frequently elements are flushed to disk.
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Method Detail

distinct

public static <S> PCollection<S> distinct(PCollection<S> input)
Construct a new PCollection that contains the unique elements of a given input PCollection.

Parameters:
input - The input PCollection
Returns:
A new PCollection that contains the unique elements of the input

distinct

public static <K,V> PTable<K,V> distinct(PTable<K,V> input)
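A PTable<K, V> analogue of the distinct function.

Parameters:
input - The input PTable
Returns:
A new PTable that contains the unique elements of the input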
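As an illustration of what distinct plausibly means for a table, here is a small plain-Java model rather than Crunch code. It assumes that two entries count as duplicates only when both the key and the value are equal. The class PairDistinctSketch and its methods are hypothetical, and java.util.Map.Entry merely stands in for a table entry.

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;

// Hypothetical plain-Java model; it does not use or reproduce the Crunch API.
final class PairDistinctSketch {

  // Keep one copy of every (key, value) pair, preserving first-seen order.
  // Assumption: entries are duplicates only if key AND value are both equal.
  static <K, V> List<Map.Entry<K, V>> distinct(List<Map.Entry<K, V>> table) {
    return new ArrayList<>(new LinkedHashSet<>(table));
  }

  // Convenience factory for an immutable entry with value-based equals/hashCode.
  static <K, V> Map.Entry<K, V> entry(K key, V value) {
    return new SimpleImmutableEntry<>(key, value);
  }

  public static void main(String[] args) {
    List<Map.Entry<String, Integer>> table = Arrays.asList(
        entry("a", 1), entry("a", 1), entry("a", 2), entry("b", 1));
    // ("a", 1) appears twice and is collapsed; ("a", 2) shares a key but survives.
    System.out.println(distinct(table)); // prints [a=1, a=2, b=1]
  }
}
```

Under this assumption a key may still occur more than once in the result, as long as its values differ.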


distinct

public static <S> PCollection<S> distinct(PCollection<S> input,
                                          int flushEvery)
A distinct operation that lets the client control how frequently elements are flushed to disk, in order to tune performance or memory consumption.

Parameters:
input - The input PCollection
flushEvery - Flush the elements to disk whenever we encounter this many unique values
Returns:
A new PCollection that contains the unique elements of the input
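To make the role of flushEvery concrete, the following is a minimal plain-Java model of a two-pass distinct. It is a sketch, not Crunch's implementation: the class FlushingDistinctSketch and its methods are hypothetical, and it assumes that a first pass buffers unique values in memory and flushes the buffer each time it holds flushEvery values, after which a second pass removes the duplicates that slipped through separate flushes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical plain-Java model; it does not use or reproduce the Crunch API.
final class FlushingDistinctSketch {

  // First pass: buffer unique values, then emit and clear the buffer
  // whenever it reaches flushEvery entries.
  static <S> List<S> localPass(Iterable<S> input, int flushEvery) {
    if (flushEvery < 1) {
      throw new IllegalArgumentException("flushEvery must be at least 1");
    }
    List<S> emitted = new ArrayList<>();
    Set<S> buffer = new LinkedHashSet<>();
    for (S value : input) {
      buffer.add(value); // duplicates already in the buffer are absorbed here
      if (buffer.size() >= flushEvery) {
        emitted.addAll(buffer);
        buffer.clear();
      }
    }
    emitted.addAll(buffer); // flush whatever is left at the end
    return emitted;
  }

  // Second pass: drop duplicates that were emitted by different flushes.
  static <S> List<S> distinct(Iterable<S> input, int flushEvery) {
    return new ArrayList<>(new LinkedHashSet<>(localPass(input, flushEvery)));
  }

  public static void main(String[] args) {
    List<String> words = Arrays.asList("a", "b", "a", "c", "b", "a");
    System.out.println(localPass(words, 3));   // prints [a, b, c, b, a]
    System.out.println(localPass(words, 100)); // prints [a, b, c]
    System.out.println(distinct(words, 3));    // prints [a, b, c]
  }
}
```

In this model a small flushEvery bounds the in-memory buffer but lets more duplicates reach the second pass, while a large value removes more duplicates early at the cost of a bigger buffer. The final result is the same either way.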

distinct

public static <K,V> PTable<K,V> distinct(PTable<K,V> input,
                                         int flushEvery)
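A PTable<K, V> analogue of the distinct function that lets the client control how frequently elements are flushed to disk.

Parameters:
input - The input PTable
flushEvery - Flush the elements to disk whenever we encounter this many unique values
Returns:
A new PTable that contains the unique elements of the input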



Copyright © 2014 The Apache Software Foundation. All Rights Reserved.