Recent work on improving the performance of "specific record" (AVRO-2090 and AVRO-2247) has highlighted the need for a benchmark that can be used to test the validity of alleged performance "improvements."
As a starting point, the Avro project has a class called Perf (in the test source of the ipc subproject). Perf is a command-line tool containing close to 70 individual performance tests. These tests include tests for reading and writing primitive values, arrays, and maps, plus tests for reading and writing records through all of the APIs (generic, specific, reflect).
When using Perf for some recent performance work, we encountered two problems. First, because it depends on build artifacts from across the Avro project, it can be tricky to invoke. Second, and more seriously, independent runs of the tests in Perf can vary in performance by as much as 40%. While typical variance is less than that, it is high enough to make it impossible to tell whether a change in performance is simply noise or can properly be attributed to a proposed optimization.
This document addresses both problems: the usability problem in Section 2 and the variability issue in Section 3. Regarding the variability issue, as you will see, we haven't really been able to manage it in a fundamental manner. As suggested by Zoltan Farkas, we should look into porting Perf over to the Java Microbenchmark Harness (JMH).
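To give a flavor of what such a port might look like, here is a minimal, hypothetical JMH benchmark for a generic-record write; the class name, schema, and settings are illustrative sketches, not part of Avro:

import java.io.ByteArrayOutputStream;
import java.io.IOException;

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.io.BinaryEncoder;
import org.apache.avro.io.EncoderFactory;
import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.Setup;
import org.openjdk.jmh.annotations.State;

// Hypothetical sketch of one Perf-style test (generic-record write) ported to
// JMH. JMH would take over the warmup, forking, and statistics that Perf
// currently does by hand.
@State(Scope.Thread)
public class GenericWriteBench {
  private Schema schema;
  private GenericData.Record record;
  private GenericDatumWriter<GenericData.Record> writer;
  private ByteArrayOutputStream out;
  private BinaryEncoder encoder; // reused across invocations

  @Setup
  public void setup() {
    schema = new Schema.Parser().parse(
        "{\"type\":\"record\",\"name\":\"R\",\"fields\":"
            + "[{\"name\":\"f1\",\"type\":\"int\"},"
            + "{\"name\":\"f2\",\"type\":\"double\"}]}");
    record = new GenericData.Record(schema);
    record.put("f1", 42);
    record.put("f2", 3.14);
    writer = new GenericDatumWriter<>(schema);
    out = new ByteArrayOutputStream();
  }

  @Benchmark
  public int writeRecord() throws IOException {
    out.reset();
    encoder = EncoderFactory.get().binaryEncoder(out, encoder);
    writer.write(record, encoder);
    encoder.flush();
    return out.size(); // return a value so JMH can't dead-code-eliminate the work
  }
}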
Invoking Perf
Here is the easiest way we found to directly invoke Perf.
As mentioned in the Introduction, Perf is dependent upon build artifacts from some of the other Avro subprojects. When you invoke Perf, it should be invoked with your most recent build of those artifacts (assuming you're performance-testing your current work). We have found that the easiest way to ensure the proper artifacts are used is to use Maven to invoke Perf.
The recipe for using Maven in this way is simple. First, from the lang/java directory, you need to build and install Avro:
mvn clean install
(You can add -DskipTests to the above command line if you don't need to run the test suite.) When this is done, change your working directory to lang/java/ipc. From there, you can invoke Perf with the following command line:
mvn exec:java -Dexec.classpathScope=test -Dexec.mainClass=org.apache.avro.io.Perf -Dexec.args="..."
The exec.args string contains the arguments you want to pass through to the Perf.main function.
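For example, to run only the specific-record tests (Perf's -Sf flag, which also appears in Appendix A), you would invoke:
mvn exec:java -Dexec.classpathScope=test -Dexec.mainClass=org.apache.avro.io.Perf -Dexec.args="-Sf"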
To speed up your edit-compile-test loop, you can do a selective build of Avro in addition to skipping tests:
mvn clean && mvn -pl "avro,compiler,maven-plugin,ipc" install -DskipTests
If you're using Perf, chances are that you want to compare the performance of a proposed optimization against the performance of a baseline (that baseline most likely being the current master branch of Avro). Generating this comparative data can be tedious if you're running Perf by hand. To relieve this tedium, you can use the run-perf.sh script instead (found in the share/test directory under the Avro top-level directory).
To use this script, you put different implementations of Avro onto different branches of your Avro Git repository. One of these branches is designated the "baseline" branch and the others are the "treatment" branches. The script will run the baseline and all the treatments, and will generate a CSV file containing a comparison of the treatments against the baseline.
Running run-perf.sh --help will output a detailed manual page for this script. Appendix A of this document contains sample invocations of this script for different use cases.
NOTE: as mentioned in run-perf.sh --help, this script is designed to be run from the lang/java/ipc directory, which is the Maven project containing the Perf program.
To reduce this variability, we experimented with the system properties org.apache.avro.io.perf.count, org.apache.avro.io.perf.cycles, and org.apache.avro.io.perf.use-direct, as well as with the number of times we run Perf.java within a single "run" of a test. We also used Docker's --cpuset-cpus flag to force the tests onto a single core, and we ran all our tests on a dedicated EC2 instance (a c5d.2xlarge).
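Such properties can be passed to Perf.java through run-perf.sh's -D mechanism, described in Appendix A. For example (the property value here is purely illustrative):
../../../share/test/run-perf.sh --out-dir ~/tuning \
  -Dorg.apache.avro.io.perf.count=100000 \
  AVRO-2269:baseline AVRO-2269:treatment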
If you want to set up your own EC2 instance for testing, here's how we did it. We launched a dedicated EC2 c5d.2xlarge instance from the AWS console, using the "Amazon Linux 64-bit HVM GP2" AMI. We logged into this instance and ran the following commands to install Docker and Git (we did all our Avro building and testing inside the Docker image):
sudo yum update
sudo yum install -y git-all
git config --global user.name "Your Name"
git config --global user.email email-address-used@github.com
git config --global core.editor emacs
sudo yum install -y docker
sudo usermod -aG docker ec2-user  ## Need to log back in for this to take effect
sudo service docker start
At this point you can check out Avro and launch your Docker container:
git clone https://github.com/apache/avro.git
cd avro
screen ./build.sh docker --args "--cpuset-cpus 2,6"
Note the use of screen here: executions of run-perf.sh can take a few hours, depending on the configuration. By running it inside of screen, you are protected from an SSH disconnection causing run-perf.sh to terminate prematurely.
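If your SSH connection does drop, you can log back in and reattach to the still-running session with:
screen -r
(If more than one detached session exists, screen will list them so you can reattach to a specific one by ID.)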
The --args flag in the last command deserves some explanation. In general, --args allows you to pass additional arguments to the docker run command executed inside build.sh. In this case, the --cpuset-cpus flag tells Docker to schedule the container exclusively on the listed (virtual) CPUs. We identified vCPUs 2 and 6 using the lscpu Linux command:
[ec2-user@ip-0-0-0-0 avro]$ lscpu --extended
CPU NODE SOCKET CORE L1d:L1i:L2:L3 ONLINE
0   0    0      0    0:0:0:0       yes
1   0    0      1    1:1:1:0       yes
2   0    0      2    2:2:2:0       yes
3   0    0      3    3:3:3:0       yes
4   0    0      0    0:0:0:0       yes
5   0    0      1    1:1:1:0       yes
6   0    0      2    2:2:2:0       yes
7   0    0      3    3:3:3:0       yes
Notice that vCPUs 2 and 6 are both on core 2: it's sufficient to pin the container to a single physical core rather than to a single vCPU. One final tip: to confirm that your container is running on the expected CPUs, run top and then press the 1 key; this will show you the load on each individual CPU.
A detailed explanation of run-perf.sh is printed when you give it the --help flag. To help you more quickly understand how to use run-perf.sh, we present here a few examples of how we used it in our recent testing efforts.
To summarize, you invoke it as follows:
../../../share/test/run-perf.sh [--out-dir D] \
  [--perf-args STRING] [-Dkey=value]* [--] \
  [-Dkey=value]* branch_baseline[:name_baseline_run] \
  [-Dkey=value]* branch_1[:name_treatment_run_1] \
  ...
  [-Dkey=value]* branch_n[:name_treatment_run_n]
The path given here is relative to the lang/java/ipc directory, which needs to be the current working directory when calling this script. The script executes multiple runs of testing. The first run is called the baseline run; the subsequent runs are the treatment runs. Each run consists of four identical executions of Perf.java. The running times for each Perf.java test are averaged to obtain the final running time for the test. For each treatment run, the final running times for each test are compared, as a percentage, to the running time for the test in the baseline run. These percentages are output in the file summary.csv.
The following invocation is what we used to measure the variance of Perf.java:
../../../share/test/run-perf.sh --out-dir ~/calibration \
  -Dorg.apache.avro.specific.use_custom_coders=true \
  AVRO-2269:baseline AVRO-2269:run1 AVRO-2269:run2 AVRO-2269:run3
In this invocation, the baseline run and all three treatment runs come from the same Git branch: AVRO-2269. We need to give a name to each run: in this case the runs have been named "baseline" (the baseline run) and "run1", "run2", and "run3" (the treatment runs). Note that the name of the Git branch to be used for a run must always be provided, but the name for the run itself (e.g., "baseline") is optional. If a name for a run is not provided, then the name of the Git branch is used as the name of the run. However, each run must have a unique name, so in this example we had to explicitly name the runs, since all of them are on the same branch.
run-perf.sh uses Maven to invoke Perf.java. The -D flag is used to pass system properties to Maven, which in turn passes them through to Perf.java. In the example above, we use this flag to turn on the custom-coders feature recently checked into Avro. Note that initial -D flags are passed to all runs, while -D flags that come just before the Git branch name of a run apply only to that run. In the case of the baseline run, which comes first, if you want to pass -D flags to just that run, use the -- flag to indicate that all global parameters for run-perf.sh have been provided, followed by the -D flags you want to pass to only the baseline run.
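For example (the property name here is hypothetical, just to illustrate the syntax), the following passes a -D flag to the baseline run only:
../../../share/test/run-perf.sh --out-dir ~/example -- \
  -Dsome.example.property=true AVRO-2269:baseline AVRO-2269:treatment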
Finally, note that run-perf.sh generates a lot of intermediate files in addition to the final summary.csv file. Thus, it is recommended that the output of each execution of run-perf.sh be sent to a dedicated directory, provided via the --out-dir flag. If that directory does not exist, it will be created. (Observe that run-perf.sh outputs a file called command.txt containing the full command line used to invoke it. This can be helpful if you run a lot of experiments and forget the detailed setup of some of them along the way.)
The next invocation is what we used to ensure that the new "custom coders" optimization for specific records does indeed improve performance:
../../../share/test/run-perf.sh --out-dir ~/retest-codegen \
  --perf-args "-Sf" \
  AVRO-2269:baseline \
  -Dorg.apache.avro.specific.use_custom_coders=true AVRO-2269:custom-coders
In this case, unlike the previous one, the -D flag that turns on the use of custom coders is applied specifically to the treatment run, and not globally. Also, since this flag only affects the specific-record case, we use the --perf-args flag to pass additional arguments to Perf.java; in this case, the -Sf flag tells Perf.java to run just the specific-record tests rather than the entire test suite.
This last example shows how we checked the performance impact of two new feature-branches we've been developing:
../../../share/test/run-perf.sh --out-dir ~/new-branches \
  -Dorg.apache.avro.specific.use_custom_coders=true \
  AVRO-2269:baseline combined-opts full-refactor
Once again, we turn on custom coders for all runs, and again the Git branch AVRO-2269 is used for the baseline run. This time, however, the treatment runs come from two other Git branches: combined-opts and full-refactor. We didn't provide run names for these runs because the Git branch names were fine as run names (we explicitly named the first run "baseline" not because we had to, but because we like the convention of using that name).
Although we didn't state it before, in preparing for a run, run-perf.sh will check out the Git branch to be used for the run and use mvn install to build and install it. It does this for each branch, so the invocation just given will check out and build three different branches during its overall execution. (As an optimization, if one run uses the same branch as the previous run, the branch is not checked out or rebuilt between runs.)