org.apache.crunch.lib
Class Aggregate.TopKCombineFn<K,V>
java.lang.Object
org.apache.crunch.DoFn<Pair<S,Iterable<T>>,Pair<S,T>>
org.apache.crunch.CombineFn<Integer,Pair<K,V>>
org.apache.crunch.lib.Aggregate.TopKCombineFn<K,V>
- All Implemented Interfaces:
- Serializable
- Enclosing class:
- Aggregate
public static class Aggregate.TopKCombineFn<K,V>
- extends CombineFn<Integer,Pair<K,V>>
- See Also:
- Serialized Form
Aggregate.TopKCombineFn
public Aggregate.TopKCombineFn(int limit,
boolean maximize,
PType<Pair<K,V>> pairType)
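The constructor parameters suggest a bounded selection: keep at most `limit` elements, preferring the largest when `maximize` is true and the smallest otherwise. The following is a minimal plain-Java sketch of that idea using a bounded heap. It is not Crunch's actual implementation and has no Crunch dependency; the class `TopKSketch` and method `topK` are hypothetical names used only for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical illustration class; not part of Apache Crunch.
final class TopKSketch {

    /**
     * Returns up to `limit` elements of `values`: the largest ones when
     * `maximize` is true, the smallest ones otherwise, best element first.
     */
    public static <T extends Comparable<T>> List<T> topK(
            Iterable<T> values, int limit, boolean maximize) {
        // Order the heap so that its head is the worst element kept so far.
        Comparator<T> worstFirst = maximize
                ? Comparator.<T>naturalOrder()
                : Comparator.<T>reverseOrder();
        PriorityQueue<T> heap = new PriorityQueue<>(Math.max(1, limit), worstFirst);
        for (T value : values) {
            heap.add(value);
            if (heap.size() > limit) {
                heap.poll(); // evict the worst element once the bound is exceeded
            }
        }
        List<T> result = new ArrayList<>(heap);
        result.sort(worstFirst.reversed()); // best element first
        return result;
    }

    public static void main(String[] args) {
        List<Integer> scores = Arrays.asList(5, 1, 9, 3, 7);
        System.out.println(topK(scores, 2, true));  // prints [9, 7]
        System.out.println(topK(scores, 2, false)); // prints [1, 3]
    }
}
```

Because the heap never holds more than `limit + 1` elements, memory stays bounded regardless of input size, which is the property that makes this kind of selection suitable for a combiner.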
initialize
public void initialize()
- Description copied from class:
DoFn
- Initialize this DoFn. This initialization will happen before the actual DoFn.process(Object, Emitter) is triggered. Subclasses may override this method to do appropriate initialization.
Called during the setup of the job instance this DoFn is associated with.
- Overrides:
initialize in class DoFn<Pair<Integer,Iterable<Pair<K,V>>>,Pair<Integer,Pair<K,V>>>
process
public void process(Pair<Integer,Iterable<Pair<K,V>>> input,
Emitter<Pair<Integer,Pair<K,V>>> emitter)
- Description copied from class:
DoFn
- Processes the records from a PCollection.
Note: Crunch can reuse a single input record object whose content changes on each DoFn.process(Object, Emitter) method call. This behavior is imposed by Hadoop's Reducer implementation: the framework reuses the key and value objects that are passed into the reduce, so the application should clone any objects it wants to keep a copy of.
- Specified by:
process in class DoFn<Pair<Integer,Iterable<Pair<K,V>>>,Pair<Integer,Pair<K,V>>>
- Parameters:
input - The input record.
emitter - The emitter to send the output to.
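The object-reuse note in the description of process can be illustrated without Hadoop or Crunch. The sketch below assumes a hypothetical mutable `Holder` record and an iterable that hands back the same instance on every step, which imitates how a reducer may reuse value objects. All names here are made up for the illustration; none of them are Crunch or Hadoop APIs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical illustration class; not part of Apache Crunch or Hadoop.
final class ReuseSketch {

    /** A mutable record standing in for a reused value object. */
    static final class Holder {
        int value;

        Holder copy() {
            Holder h = new Holder();
            h.value = value;
            return h;
        }
    }

    /** Iterates over `data` but returns the same Holder instance every time. */
    static Iterable<Holder> reusing(final List<Integer> data) {
        return () -> new Iterator<Holder>() {
            private final Holder shared = new Holder();
            private int next = 0;

            @Override
            public boolean hasNext() {
                return next < data.size();
            }

            @Override
            public Holder next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                shared.value = data.get(next++); // overwrite in place
                return shared;
            }
        };
    }

    /** Buggy: stores references, so every entry ends up showing the last value. */
    static List<Integer> keepReferences(Iterable<Holder> records) {
        List<Holder> kept = new ArrayList<>();
        for (Holder h : records) {
            kept.add(h);
        }
        return values(kept);
    }

    /** Correct: clones each record before keeping it. */
    static List<Integer> keepCopies(Iterable<Holder> records) {
        List<Holder> kept = new ArrayList<>();
        for (Holder h : records) {
            kept.add(h.copy());
        }
        return values(kept);
    }

    private static List<Integer> values(List<Holder> holders) {
        List<Integer> out = new ArrayList<>();
        for (Holder h : holders) {
            out.add(h.value);
        }
        return out;
    }

    public static void main(String[] args) {
        List<Integer> data = Arrays.asList(1, 2, 3);
        System.out.println(keepReferences(reusing(data))); // prints [3, 3, 3]
        System.out.println(keepCopies(reusing(data)));     // prints [1, 2, 3]
    }
}
```

Any function that buffers input records across iterations, as a top-K selection has to, would need the copying variant to avoid the aliasing shown by `keepReferences`.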
Copyright © 2014 The Apache Software Foundation. All Rights Reserved.