Apache Accumulo Documentation : Distributed Tracing

It can be difficult to determine why some operations are taking longer than expected. For example, you may be looking up items with very low latency, but sometimes the lookups take much longer. Determining the cause of the delay is difficult because the system is distributed, and the typical lookup is fast.

To provide insight into what accumulo is doing during your scan, you can turn on tracing before you do your operation:

   DistributedTrace.enable(instance, zooReader, hostname, "myApplication");
   Trace scanTrace = Trace.on("client:scan");
   BatchScanner scanner = conn.createBatchScanner(...);
   // Configure your scanner
   for (Entry entry : scanner) {
   }
   Trace.off();

Accumulo has been instrumented to record the time that various operations take when tracing is turned on. The fact that tracing is enabled follows all the requests made on behalf of the user throughout the distributed infrastructure of accumulo, and across all threads of execution.

These time spans will be inserted into the trace accumulo table. You can browse recent traces from the accumulo monitor page. You can also read the trace table directly.

Tracing is supported in the shell. For example:

root@test> createtable test
root@test test> insert a b c d
root@test test> trace on              
root@test test> scan
a b:c []    d
root@test test> trace off
Waiting for trace information
Waiting for trace information
Waiting for trace information
Trace started at 2011/03/16 09:20:31.387
Time  Start  Service@Location       Name
 3355+0      shell@host2 shell:root
    1+1        shell@host2 client:listUsers
    1+1434     tserver@host2 getUserAuthorizations
    1+1434     shell@host2 client:getUserAuthorizations
   10+1550     shell@host2 scan
    9+1551       shell@host2 scan:location
    7+1552         shell@host2 client:startScan
    6+1553         tserver@host2 startScan
    5+1553           tserver@host2 tablet read ahead 11
    1+1559         shell@host2 client:closeScan
    1+1561     shell@host2 client:listUsers

Here we can see that the shell is getting the list of users (which is used for tab-completion) after every command. While unexpected, it is a fast operation. In fact, all the requests are very fast, and most of the time is spent waiting for the user to make a request while tracing is turned on.

Spans are added to the trace table asynchronously. The user may have to wait several seconds for all requests to complete before the trace information is complete.

You can extract the trace data out of the trace table. Each span is a stored as a column in a row named for the trace id. The following code will print out a trace:

String table = AccumuloConfiguration.getSystemConfiguration().get(Property.TRACE_TABLE);
Scanner scanner = shellState.connector.createScanner(table, auths);
scanner.setRange(new Range(new Text(Long.toHexString(scanTrace.traceId()))));
TraceDump.printTrace(scanner, new Printer() {
    void print(String line) {
        System.out.println(line);
    }
});