org.apache.crunch.contrib.bloomfilter
Class BloomFilterFn<S>
java.lang.Object
org.apache.crunch.DoFn<S,Pair<String,org.apache.hadoop.util.bloom.BloomFilter>>
org.apache.crunch.contrib.bloomfilter.BloomFilterFn<S>
- All Implemented Interfaces:
- Serializable
public abstract class BloomFilterFn<S>
- extends DoFn<S,Pair<String,org.apache.hadoop.util.bloom.BloomFilter>>
This class is responsible for generating the keys that are added to a BloomFilter. Subclasses implement generateKeys(S) to supply the keys for each input record.
- See Also:
- Serialized Form
CRUNCH_FILTER_SIZE
public static final String CRUNCH_FILTER_SIZE
- See Also:
- Constant Field Values
CRUNCH_FILTER_NAME
public static final String CRUNCH_FILTER_NAME
- See Also:
- Constant Field Values
BloomFilterFn
public BloomFilterFn()
initialize
public void initialize()
- Description copied from class:
DoFn
- Initialize this DoFn. This initialization will happen before the actual DoFn.process(Object, Emitter) is triggered. Subclasses may override this method to do appropriate initialization.
Called during the setup of the job instance this DoFn is associated with.
- Overrides:
initialize in class DoFn<S,Pair<String,org.apache.hadoop.util.bloom.BloomFilter>>
process
public void process(S input,
Emitter<Pair<String,org.apache.hadoop.util.bloom.BloomFilter>> emitter)
- Description copied from class:
DoFn
- Processes the records from a PCollection.
Note: Crunch can reuse a single input record object whose content changes on each DoFn.process(Object, Emitter) method call. This behavior is imposed by Hadoop's Reducer implementation: the framework reuses the key and value objects that are passed into the reduce, so the application should clone any objects it wants to keep a copy of.
- Specified by:
process in class DoFn<S,Pair<String,org.apache.hadoop.util.bloom.BloomFilter>>
- Parameters:
input
- The input record.
emitter
- The emitter to send the output to
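The object-reuse note above can be illustrated with a small plain-Java sketch. It does not use Crunch or Hadoop: MutableRecord, keepReferences and keepCopies are hypothetical stand-ins that only mimic a framework overwriting one record object between calls. The sketch contrasts storing the reused reference with storing a copy.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java illustration only. MutableRecord is a hypothetical stand-in,
// not a Crunch or Hadoop type.
class RecordReuseDemo {

  /** A mutable holder that a framework might reuse across calls. */
  static final class MutableRecord {
    String value;

    /** Returns an independent snapshot of the current content. */
    MutableRecord copy() {
      MutableRecord c = new MutableRecord();
      c.value = value;
      return c;
    }
  }

  /** Stores the reused reference: every kept entry ends up showing the last value. */
  static List<String> keepReferences(String[] inputs) {
    MutableRecord reused = new MutableRecord();
    List<MutableRecord> kept = new ArrayList<>();
    for (String s : inputs) {
      reused.value = s;  // the "framework" overwrites the same object
      kept.add(reused);  // bug: stores an alias, not a snapshot
    }
    return values(kept);
  }

  /** Stores a copy on each call: every kept entry retains its own value. */
  static List<String> keepCopies(String[] inputs) {
    MutableRecord reused = new MutableRecord();
    List<MutableRecord> kept = new ArrayList<>();
    for (String s : inputs) {
      reused.value = s;
      kept.add(reused.copy());  // snapshot taken before the next overwrite
    }
    return values(kept);
  }

  private static List<String> values(List<MutableRecord> records) {
    List<String> out = new ArrayList<>();
    for (MutableRecord r : records) {
      out.add(r.value);
    }
    return out;
  }

  public static void main(String[] args) {
    String[] inputs = {"a", "b", "c"};
    System.out.println(keepReferences(inputs)); // prints [c, c, c]
    System.out.println(keepCopies(inputs));     // prints [a, b, c]
  }
}
```

In a real DoFn the same concern applies to any input object held past the end of a process call; how to copy it depends on the record type in use.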
generateKeys
public abstract Collection<org.apache.hadoop.util.bloom.Key> generateKeys(S input)
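As a rough sketch of what a generateKeys implementation might compute, the plain-Java example below turns one input line into a collection of byte arrays, one per whitespace-separated token. The class name TokenKeys, the method keyBytes and the tokenization rule are hypothetical choices made for illustration. A real subclass would presumably wrap each byte array in an org.apache.hadoop.util.bloom.Key (assuming its Key(byte[]) constructor) and return Collection<Key>; the sketch stays on the standard library so it can run without Crunch or Hadoop on the classpath.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

// Plain-Java sketch. TokenKeys and its whitespace tokenization are
// hypothetical and are not part of Crunch.
class TokenKeys {

  /** Returns one UTF-8 byte array per non-empty whitespace-separated token. */
  static Collection<byte[]> keyBytes(String input) {
    List<byte[]> keys = new ArrayList<>();
    for (String token : input.trim().split("\\s+")) {
      if (!token.isEmpty()) {
        // In a BloomFilterFn subclass, this is where a Key would be built
        // from the bytes (assumption: new Key(bytes)).
        keys.add(token.getBytes(StandardCharsets.UTF_8));
      }
    }
    return keys;
  }

  public static void main(String[] args) {
    for (byte[] k : keyBytes("apache crunch bloom")) {
      System.out.println(new String(k, StandardCharsets.UTF_8) + " -> " + k.length + " bytes");
    }
  }
}
```

Returning an empty collection for blank input means such records simply contribute no keys.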
cleanup
public void cleanup(Emitter<Pair<String,org.apache.hadoop.util.bloom.BloomFilter>> emitter)
- Description copied from class:
DoFn
- Called during the cleanup of the MapReduce job this DoFn is associated with. Subclasses may override this method to do appropriate cleanup.
- Overrides:
cleanup in class DoFn<S,Pair<String,org.apache.hadoop.util.bloom.BloomFilter>>
- Parameters:
emitter
- The emitter that was used for output
Copyright © 2014 The Apache Software Foundation. All Rights Reserved.