Accumulo supports on-the-fly, lazy aggregation of data using Combiners. Aggregation is done at compaction and scan time. No lookup is done at insert time, which greatly speeds up ingest.
Combiners are easy to use. You use the setiter command to configure a Combiner for a table. A Combiner can be applied to a whole column family rather than to a single column, which gives the user great flexibility: each column qualifier in the family is aggregated separately. The example below demonstrates this flexibility.
    Shell - Apache Accumulo Interactive Shell
    - version: 1.5.0-SNAPSHOT
    - instance id: 863fc0d1-3623-4b6c-8c23-7d4fdb1c8a49
    -
    - type 'help' for a list of available commands
    -
    user@instance> createtable perDayCounts
    user@instance perDayCounts> setiter -t perDayCounts -p 10 -scan -minc -majc -n daycount -class org.apache.accumulo.core.iterators.user.SummingCombiner
    TypedValueCombiner can interpret Values as a variety of number encodings (VLong, Long, or String) before combining
    ----------> set SummingCombiner parameter columns, <col fam>[:<col qual>]{,<col fam>[:<col qual>]} escape non aplhanum chars using %<hex>.: day
    ----------> set SummingCombiner parameter type, <VARNUM|LONG|STRING>: STRING
    user@instance perDayCounts> insert foo day 20080101 1
    user@instance perDayCounts> insert foo day 20080101 1
    user@instance perDayCounts> insert foo day 20080103 1
    user@instance perDayCounts> insert bar day 20080101 1
    user@instance perDayCounts> insert bar day 20080101 1
    user@instance perDayCounts> scan
    bar day:20080101 []    2
    foo day:20080101 []    2
    foo day:20080103 []    1
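To make the lazy behavior concrete, here is a minimal stand-alone Java model of what the session above does. It is only a sketch: the class LazySumModel and its methods are hypothetical names invented for illustration and do not use the Accumulo API. Inserts only append values, and the sum is computed when the data is read, which is the idea behind combining at scan and compaction time.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Stand-alone model of lazy aggregation. None of these names are Accumulo API.
class LazySumModel {
    // Every inserted value is kept, unaggregated, under "row family:qualifier".
    private final TreeMap<String, List<String>> entries = new TreeMap<>();

    // Insert only appends: the current total is never read or updated here.
    void insert(String row, String family, String qualifier, String value) {
        String key = row + " " + family + ":" + qualifier;
        entries.computeIfAbsent(key, k -> new ArrayList<>()).add(value);
    }

    // Scan combines on read: values are decimal strings (like the STRING type)
    // and are summed separately for each key.
    TreeMap<String, String> scan() {
        TreeMap<String, String> result = new TreeMap<>();
        for (Map.Entry<String, List<String>> e : entries.entrySet()) {
            long sum = 0;
            for (String v : e.getValue()) {
                sum += Long.parseLong(v);
            }
            result.put(e.getKey(), Long.toString(sum));
        }
        return result;
    }

    public static void main(String[] args) {
        LazySumModel table = new LazySumModel();
        table.insert("foo", "day", "20080101", "1");
        table.insert("foo", "day", "20080101", "1");
        table.insert("foo", "day", "20080103", "1");
        table.insert("bar", "day", "20080101", "1");
        table.insert("bar", "day", "20080101", "1");
        // Prints one line per key, in sorted key order, with the summed value.
        table.scan().forEach((k, v) -> System.out.println(k + " " + v));
    }
}
```

Run with the same five inserts as the shell session, the model yields 2 for bar day:20080101, 2 for foo day:20080101, and 1 for foo day:20080103, matching the scan output above. A real table differs in that compactions can also rewrite the stored entries into their combined form.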
Implementing a new Combiner is a snap. Simply write a Java class that extends org.apache.accumulo.core.iterators.Combiner. A good place to look for examples is the org.apache.accumulo.core.iterators.user package. Also look at the example StatsCombiner.
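The core of a Combiner is its reduce step, which receives all values stored for one key and returns a single combined value. The sketch below shows that logic for a hypothetical "keep the maximum" combiner. It is written against plain Java iterators of strings so that it runs without Accumulo on the classpath; the class name MaxReduceSketch is made up for illustration. In a real subclass this body would go in Combiner's reduce(Key, Iterator<Value>) method and would decode and encode Value objects instead of strings.

```java
import java.util.Arrays;
import java.util.Iterator;

// Hypothetical example; this class is not part of Accumulo.
class MaxReduceSketch {
    // Returns the largest of the string-encoded longs stored for one key.
    // Assumes the iterator yields at least one value.
    static String reduce(Iterator<String> values) {
        long max = Long.MIN_VALUE;
        while (values.hasNext()) {
            max = Math.max(max, Long.parseLong(values.next()));
        }
        return Long.toString(max);
    }

    public static void main(String[] args) {
        // Three values stored under the same key collapse into one.
        System.out.println(reduce(Arrays.asList("3", "17", "5").iterator())); // prints 17
    }
}
```

Because the same reduce step may run at scan time and again at each compaction, over partial results as well as raw values, the operation should give the same answer regardless of how the values are grouped. Sum, min, and max all have this property.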
To deploy a new Combiner, jar it up and put the jar in accumulo/lib/ext. To see an example, look at README.combiner.
If you would like to see what iterators a table has, you can use the config command, as in the following example.
    user@instance perDayCounts> config -t perDayCounts -f iterator
    ---------+---------------------------------------------+-----------------------------------------------------------
    SCOPE    | NAME                                        | VALUE
    ---------+---------------------------------------------+-----------------------------------------------------------
    table    | table.iterator.majc.daycount .............. | 10,org.apache.accumulo.core.iterators.user.SummingCombiner
    table    | table.iterator.majc.daycount.opt.columns .. | day
    table    | table.iterator.majc.daycount.opt.type ..... | STRING
    table    | table.iterator.majc.vers .................. | 20,org.apache.accumulo.core.iterators.VersioningIterator
    table    | table.iterator.majc.vers.opt.maxVersions .. | 1
    table    | table.iterator.minc.daycount .............. | 10,org.apache.accumulo.core.iterators.user.SummingCombiner
    table    | table.iterator.minc.daycount.opt.columns .. | day
    table    | table.iterator.minc.daycount.opt.type ..... | STRING
    table    | table.iterator.minc.vers .................. | 20,org.apache.accumulo.core.iterators.VersioningIterator
    table    | table.iterator.minc.vers.opt.maxVersions .. | 1
    table    | table.iterator.scan.daycount .............. | 10,org.apache.accumulo.core.iterators.user.SummingCombiner
    table    | table.iterator.scan.daycount.opt.columns .. | day
    table    | table.iterator.scan.daycount.opt.type ..... | STRING
    table    | table.iterator.scan.vers .................. | 20,org.apache.accumulo.core.iterators.VersioningIterator
    table    | table.iterator.scan.vers.opt.maxVersions .. | 1
    ---------+---------------------------------------------+-----------------------------------------------------------