This example uses MapReduce and Accumulo to compute word counts for a set of
documents. This is accomplished using a map-only MapReduce job and an
Accumulo table with combiners.

To run this example you will need a directory in HDFS containing text files.
The Accumulo README is used here to show how to run the example.

    $ hadoop fs -copyFromLocal $ACCUMULO_HOME/README /user/username/wc/Accumulo.README
    $ hadoop fs -ls /user/username/wc
    Found 1 items
    -rw-r--r-- 2 username supergroup 9359 2009-07-15 17:54 /user/username/wc/Accumulo.README

The first step is to create a table with a combiner configured on the column
family count.

    $ ./bin/accumulo shell -u username -p password

    Shell - Apache Accumulo Interactive Shell
    - version: 1.5.0-SNAPSHOT
    - instance name: instance
    - instance id: 00000000-0000-0000-0000-000000000000
    -
    - type 'help' for a list of available commands
    -
    username@instance> createtable wordCount
    username@instance wordCount> setiter -class org.apache.accumulo.core.iterators.user.SummingCombiner -p 10 -t wordCount -majc -minc -scan
    SummingCombiner interprets Values as Longs and adds them together. A variety of encodings (variable length, fixed length, or string) are available
    ----------> set SummingCombiner parameter all, set to true to apply Combiner to every column, otherwise leave blank. if true, columns option will be ignored.: false
    ----------> set SummingCombiner parameter columns, [:]{,[:]} escape non-alphanum chars using %.: count
    ----------> set SummingCombiner parameter lossy, if true, failed decodes are ignored. Otherwise combiner will error on failed decodes (default false): : false
    ----------> set SummingCombiner parameter type, : STRING
    username@instance wordCount> quit

After creating the table, run the word count MapReduce job.

    $ bin/tool.sh lib/examples-simple*[^c].jar org.apache.accumulo.examples.simple.mapreduce.WordCount instance zookeepers /user/username/wc wordCount -u username -p password

    11/02/07 18:20:11 INFO input.FileInputFormat: Total input paths to process : 1
    11/02/07 18:20:12 INFO mapred.JobClient: Running job: job_201102071740_0003
    11/02/07 18:20:13 INFO mapred.JobClient:  map 0% reduce 0%
    11/02/07 18:20:20 INFO mapred.JobClient:  map 100% reduce 0%
    11/02/07 18:20:22 INFO mapred.JobClient: Job complete: job_201102071740_0003
    11/02/07 18:20:22 INFO mapred.JobClient: Counters: 6
    11/02/07 18:20:22 INFO mapred.JobClient:   Job Counters
    11/02/07 18:20:22 INFO mapred.JobClient:     Launched map tasks=1
    11/02/07 18:20:22 INFO mapred.JobClient:     Data-local map tasks=1
    11/02/07 18:20:22 INFO mapred.JobClient:   FileSystemCounters
    11/02/07 18:20:22 INFO mapred.JobClient:     HDFS_BYTES_READ=10487
    11/02/07 18:20:22 INFO mapred.JobClient:   Map-Reduce Framework
    11/02/07 18:20:22 INFO mapred.JobClient:     Map input records=255
    11/02/07 18:20:22 INFO mapred.JobClient:     Spilled Records=0
    11/02/07 18:20:22 INFO mapred.JobClient:     Map output records=1452

After the MapReduce job completes, query the Accumulo table to see the word
counts.

    $ ./bin/accumulo shell -u username -p password
    username@instance> table wordCount
    username@instance wordCount> scan -b the
    the count:20080906 [] 75
    their count:20080906 [] 2
    them count:20080906 [] 1
    then count:20080906 [] 1
    there count:20080906 [] 1
    these count:20080906 [] 3
    this count:20080906 [] 6
    through count:20080906 [] 1
    time count:20080906 [] 3
    time. count:20080906 [] 1
    to count:20080906 [] 27
    total count:20080906 [] 1
    tserver, count:20080906 [] 1
    tserver.compaction.major.concurrent.max count:20080906 [] 1
    ...

Another example to look at is
org.apache.accumulo.examples.simple.mapreduce.UniqueColumns. This example
computes the unique set of columns in a table and shows how a MapReduce job
can directly read a table's files from HDFS.
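
To make the division of labor in the word count example concrete, the
following is a minimal, self-contained sketch in plain Java. It does not use
the Accumulo or Hadoop APIs. The class name WordCountSketch and its methods
are hypothetical stand-ins, and whitespace tokenization is an assumption
suggested by rows such as "time." and "tserver," in the scan output. The map
step writes the value "1" for every word, and the table side adds up the
string-encoded values that share a key, which is the role the SummingCombiner
with type STRING plays in the real job. No reduce phase is needed because the
summing happens in the table.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration only: simulates a map-only word count whose
// values are summed on the table side, the way a summing combiner would.
public class WordCountSketch {

    // Stand-in for a summing combiner with string encoding: decode both
    // values as longs, add them, and re-encode the sum as a decimal string.
    public static String combine(String a, String b) {
        return Long.toString(Long.parseLong(a) + Long.parseLong(b));
    }

    // Stand-in for the map phase plus the table: each word becomes a row
    // whose value is "1", and values written to the same row are combined.
    // A TreeMap keeps rows sorted, loosely mirroring a sorted table scan.
    public static Map<String, String> countWords(List<String> lines) {
        Map<String, String> table = new TreeMap<>();
        for (String line : lines) {
            // Assumption: words are separated by runs of whitespace.
            for (String word : line.trim().split("\\s+")) {
                if (word.isEmpty()) {
                    continue; // blank input line
                }
                table.merge(word, "1", WordCountSketch::combine);
            }
        }
        return table;
    }

    public static void main(String[] args) {
        List<String> lines = List.of("the quick fox", "the lazy dog", "time. the");
        // Print each row in a layout similar to the shell scan output.
        countWords(lines).forEach(
            (word, count) -> System.out.println(word + " count:20080906 [] " + count));
    }
}
```

Because the combiner in the real example is also applied at scan time (the
-scan flag above), a reader sees summed counts even before compactions have
merged the individual "1" entries on disk.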