Apache Accumulo Documentation: Combiners

Accumulo supports on-the-fly, lazy aggregation of data using Combiners. Aggregation happens at scan time and during compactions, not when data is written: an insert never has to look up the existing value, which greatly speeds up ingest.
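The idea can be illustrated with a toy, in-memory Java model. This is not Accumulo code; the class LazySumTable and its methods are hypothetical. insert only appends a raw entry, scan sums the values for each key while reading, and compact rewrites the stored entries as one combined entry per key. The model simplifies heavily: real compactions combine only the data in the files being compacted, and real keys also carry a timestamp and visibility.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical toy model of lazy aggregation; not Accumulo's API.
class LazySumTable {
    // Raw entries exactly as written: every insert becomes its own record.
    private final List<Map.Entry<String, Long>> entries = new ArrayList<>();

    // Ingest path: no lookup of the existing value, just an append.
    void insert(String key, long value) {
        entries.add(Map.entry(key, value));
    }

    // Scan time: combine all stored values for each key while reading.
    Map<String, Long> scan() {
        Map<String, Long> result = new TreeMap<>();
        for (Map.Entry<String, Long> e : entries) {
            result.merge(e.getKey(), e.getValue(), Long::sum);
        }
        return result;
    }

    // Compaction time: persist the combined values, one entry per key,
    // so later scans have fewer values to combine.
    void compact() {
        Map<String, Long> combined = scan();
        entries.clear();
        for (Map.Entry<String, Long> e : combined.entrySet()) {
            entries.add(Map.entry(e.getKey(), e.getValue()));
        }
    }

    int storedEntryCount() {
        return entries.size();
    }

    public static void main(String[] args) {
        LazySumTable t = new LazySumTable();
        t.insert("foo day:20080101", 1);
        t.insert("foo day:20080101", 1);
        t.insert("foo day:20080103", 1);
        System.out.println(t.storedEntryCount()); // 3
        System.out.println(t.scan());             // {foo day:20080101=2, foo day:20080103=1}
        t.compact();
        System.out.println(t.storedEntryCount()); // 2
    }
}
```

Both reads return the same sums; compaction only reduces how many stored values a later scan has to combine.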

Combiners are easy to use. You configure a Combiner for a table with the setiter command. A Combiner can be applied to an entire column family, in which case every row and column qualifier in that family keeps its own aggregate. This gives the user great flexibility, as the example below shows: a single Combiner on the day family maintains a separate count for each date.


Shell - Apache Accumulo Interactive Shell
- version: 1.4.0
- instance id: 863fc0d1-3623-4b6c-8c23-7d4fdb1c8a49
- 
- type 'help' for a list of available commands
-
user@instance> createtable perDayCounts
user@instance perDayCounts> setiter -t perDayCounts -p 10 -scan -minc -majc -n daycount -class org.apache.accumulo.core.iterators.user.SummingCombiner
TypedValueCombiner can interpret Values as a variety of number encodings (VLong, Long, or String) before combining
----------> set SummingCombiner parameter columns, <col fam>[:<col qual>]{,<col fam>[:<col qual>]} escape non aplhanum chars using %<hex>.: day
----------> set SummingCombiner parameter type, <VARNUM|LONG|STRING>: STRING
user@instance perDayCounts> insert foo day 20080101 1
user@instance perDayCounts> insert foo day 20080101 1
user@instance perDayCounts> insert foo day 20080103 1
user@instance perDayCounts> insert bar day 20080101 1
user@instance perDayCounts> insert bar day 20080101 1
user@instance perDayCounts> scan
bar day:20080101 []    2
foo day:20080101 []    2
foo day:20080103 []    1
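A minimal sketch of what this session computes, assuming a simplified model rather than Accumulo's real iterator code; the names PerDaySimulation, Cell, and scan are hypothetical. Only cells whose column family is configured (here day) are summed, each row and qualifier separately, and because the type is STRING the values are parsed from and written back as decimal text. Cells in other families keep only their latest write, mirroring the table's default VersioningIterator with maxVersions 1.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical toy model of the perDayCounts session (not the Accumulo API).
class PerDaySimulation {
    // One inserted cell: row, column family, column qualifier, value.
    static final class Cell {
        final String row, family, qualifier, value;

        Cell(String row, String family, String qualifier, String value) {
            this.row = row;
            this.family = family;
            this.qualifier = qualifier;
            this.value = value;
        }
    }

    // Returns "row family:qualifier" -> value, sorted by that string
    // (which matches row/family/qualifier order for this data).
    static Map<String, String> scan(List<Cell> inserts, Set<String> combinedFamilies) {
        Map<String, String> out = new TreeMap<>();
        for (Cell c : inserts) {
            String key = c.row + " " + c.family + ":" + c.qualifier;
            if (combinedFamilies.contains(c.family)) {
                // type STRING: parse the decimal text, add, store the sum as text again
                long previous = out.containsKey(key) ? Long.parseLong(out.get(key)) : 0L;
                out.put(key, Long.toString(previous + Long.parseLong(c.value)));
            } else {
                // not a combined family: the latest insert wins (maxVersions 1)
                out.put(key, c.value);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Cell> inserts = new ArrayList<>();
        inserts.add(new Cell("foo", "day", "20080101", "1"));
        inserts.add(new Cell("foo", "day", "20080101", "1"));
        inserts.add(new Cell("foo", "day", "20080103", "1"));
        inserts.add(new Cell("bar", "day", "20080101", "1"));
        inserts.add(new Cell("bar", "day", "20080101", "1"));
        // a single Combiner configured on the whole "day" column family
        scan(inserts, Set.of("day")).forEach((k, v) -> System.out.println(k + " " + v));
    }
}
```

Run as is, main prints the same three rows as the scan above: bar day:20080101 2, foo day:20080101 2, and foo day:20080103 1.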

Implementing a new Combiner is straightforward: write a Java class that extends org.apache.accumulo.core.iterators.Combiner and implements its reduce(Key, Iterator<Value>) method, which merges all the versions of one key into a single Value. Good examples can be found in the org.apache.accumulo.core.iterators.user package; also see the StatsCombiner example.
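The shape of a custom Combiner can be sketched without the Accumulo jars. In the self-contained Java below, SimpleCombiner is a hypothetical stand-in for org.apache.accumulo.core.iterators.Combiner that uses String keys and values instead of Key and Value; its apply method groups consecutive entries with the same key and calls reduce once per group. TagUnionCombiner is an illustrative combiner (not part of Accumulo) that merges comma-separated tags into a sorted, de-duplicated list, and the sample rows are made up.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

// Hypothetical stand-in for org.apache.accumulo.core.iterators.Combiner.
// The real class works with Key and Value: Value reduce(Key key, Iterator<Value> iter).
abstract class SimpleCombiner {
    abstract String reduce(String key, Iterator<String> values);

    // Input must be sorted by key. Consecutive entries with the same key are
    // grouped and reduced to one entry, as the real iterator does for versions.
    List<Map.Entry<String, String>> apply(List<Map.Entry<String, String>> sorted) {
        List<Map.Entry<String, String>> out = new ArrayList<>();
        int i = 0;
        while (i < sorted.size()) {
            String key = sorted.get(i).getKey();
            List<String> group = new ArrayList<>();
            while (i < sorted.size() && sorted.get(i).getKey().equals(key)) {
                group.add(sorted.get(i).getValue());
                i++;
            }
            out.add(Map.entry(key, reduce(key, group.iterator())));
        }
        return out;
    }
}

// Illustrative combiner: union of comma-separated tags, sorted and de-duplicated.
class TagUnionCombiner extends SimpleCombiner {
    @Override
    String reduce(String key, Iterator<String> values) {
        TreeSet<String> tags = new TreeSet<>();
        while (values.hasNext()) {
            for (String tag : values.next().split(",")) {
                if (!tag.isEmpty()) {
                    tags.add(tag);
                }
            }
        }
        return String.join(",", tags);
    }

    public static void main(String[] args) {
        List<Map.Entry<String, String>> sorted = List.of(
                Map.entry("doc1 tags:", "red,blue"),
                Map.entry("doc1 tags:", "blue,green"),
                Map.entry("doc2 tags:", "red"));
        System.out.println(new TagUnionCombiner().apply(sorted));
        // [doc1 tags:=blue,green,red, doc2 tags:=red]
    }
}
```

Because reduce can run at minor compaction, again at major compaction, and again at scan time, each time over a different subset of the values, it should give the same result however the values are grouped. Set union has that property; averaging raw values does not, so an averaging combiner would need to keep a running sum and count instead.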

To deploy a new Combiner, package it in a jar and place the jar in accumulo/lib/ext. For an example, see README.combiner.

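To see which iterators are configured on a table, use the config command and filter on iterator, as in the following example. The daycount combiner appears once for each scope it was set for (majc, minc, and scan), next to the table's default VersioningIterator (vers) at priority 20.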

user@instance perDayCounts> config -t perDayCounts -f iterator
---------+---------------------------------------------+-----------------------------------------------------------
SCOPE    | NAME                                        | VALUE
---------+---------------------------------------------+-----------------------------------------------------------
table    | table.iterator.majc.daycount .............. | 10,org.apache.accumulo.core.iterators.user.SummingCombiner
table    | table.iterator.majc.daycount.opt.columns .. | day
table    | table.iterator.majc.daycount.opt.type ..... | STRING
table    | table.iterator.majc.vers .................. | 20,org.apache.accumulo.core.iterators.VersioningIterator
table    | table.iterator.majc.vers.opt.maxVersions .. | 1
table    | table.iterator.minc.daycount .............. | 10,org.apache.accumulo.core.iterators.user.SummingCombiner
table    | table.iterator.minc.daycount.opt.columns .. | day
table    | table.iterator.minc.daycount.opt.type ..... | STRING
table    | table.iterator.minc.vers .................. | 20,org.apache.accumulo.core.iterators.VersioningIterator
table    | table.iterator.minc.vers.opt.maxVersions .. | 1
table    | table.iterator.scan.daycount .............. | 10,org.apache.accumulo.core.iterators.user.SummingCombiner
table    | table.iterator.scan.daycount.opt.columns .. | day
table    | table.iterator.scan.daycount.opt.type ..... | STRING
table    | table.iterator.scan.vers .................. | 20,org.apache.accumulo.core.iterators.VersioningIterator
table    | table.iterator.scan.vers.opt.maxVersions .. | 1
---------+---------------------------------------------+-----------------------------------------------------------
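The listing can be read as one iterator stack per scope. The Java sketch below (the class IteratorStacks is hypothetical) rebuilds those stacks from table.iterator.<scope>.<name> properties, skips the .opt. entries, and orders each stack by the priority before the comma, assuming priorities are unique within a scope. Lower priorities are applied first, closer to the stored data, so the daycount combiner at 10 sees every version of foo day:20080101 before the VersioningIterator at 20 keeps only one; with the priorities reversed, the scan would show 1 rather than 2.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: rebuild per-scope iterator stacks from table properties
// such as those printed by `config -t perDayCounts -f iterator`.
class IteratorStacks {
    // Returns scope -> iterator names, lowest priority (applied first) first.
    static Map<String, List<String>> stacks(Map<String, String> props) {
        Map<String, TreeMap<Integer, String>> byScope = new TreeMap<>();
        for (Map.Entry<String, String> e : props.entrySet()) {
            String[] parts = e.getKey().split("\\.");
            // Keep only "table.iterator.<scope>.<name>"; longer keys are options.
            if (parts.length != 4 || !parts[0].equals("table") || !parts[1].equals("iterator")) {
                continue;
            }
            // The value is "<priority>,<class name>".
            int priority = Integer.parseInt(e.getValue().split(",", 2)[0].trim());
            byScope.computeIfAbsent(parts[2], s -> new TreeMap<>()).put(priority, parts[3]);
        }
        Map<String, List<String>> out = new TreeMap<>();
        byScope.forEach((sc, byPriority) -> out.put(sc, new ArrayList<>(byPriority.values())));
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> props = new LinkedHashMap<>();
        for (String scope : new String[] {"majc", "minc", "scan"}) {
            props.put("table.iterator." + scope + ".daycount",
                    "10,org.apache.accumulo.core.iterators.user.SummingCombiner");
            props.put("table.iterator." + scope + ".daycount.opt.columns", "day");
            props.put("table.iterator." + scope + ".vers",
                    "20,org.apache.accumulo.core.iterators.VersioningIterator");
            props.put("table.iterator." + scope + ".vers.opt.maxVersions", "1");
        }
        System.out.println(stacks(props));
        // {majc=[daycount, vers], minc=[daycount, vers], scan=[daycount, vers]}
    }
}
```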