Recent work on improving the performance of "specific record" (AVRO-2090 and AVRO-2247) has highlighted the need for a benchmark that can be used to test the validity of alleged performance "improvements."
As a starting point, the Avro project has a class called Perf (in the test source of the ipc subproject). Perf is a command-line tool containing close to 70 individual performance tests. These tests include tests for reading and writing primitive values, arrays, and maps, plus tests for reading and writing records through all of the APIs (generic, specific, reflect).
When using Perf for some recent performance work, we encountered two problems. First, because it depends on build artifacts from across the Avro project, it can be tricky to invoke. Second, and more seriously, independent runs of the tests in Perf can vary in performance by as much as 40%. While typical variance is less than that, it is high enough to make it impossible to tell whether a change in performance is simply noise or can properly be attributed to a proposed optimization.
This document addresses both problems: the usability problem in Section 2 and the variability issue in Section 3. Regarding the variability issue, as you will see, we haven't really been able to manage it in a fundamental manner. As suggested by Zoltan Farkas, we should look into porting Perf over to the Java Microbenchmark Harness (JMH).
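To give a flavor of what such a port might look like, here is a minimal, hypothetical JMH benchmark for a generic-record write; the class name, schema, and settings are illustrative sketches, not part of Avro:

import java.io.ByteArrayOutputStream;
import java.io.IOException;

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.io.BinaryEncoder;
import org.apache.avro.io.EncoderFactory;
import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.Setup;
import org.openjdk.jmh.annotations.State;

// Hypothetical sketch of one Perf-style test (generic-record write) ported to
// JMH. JMH would take over the warmup, forking, and statistics that Perf
// currently does by hand.
@State(Scope.Thread)
public class GenericWriteBench {
  private Schema schema;
  private GenericData.Record record;
  private GenericDatumWriter<GenericData.Record> writer;
  private ByteArrayOutputStream out;
  private BinaryEncoder encoder; // reused across invocations

  @Setup
  public void setup() {
    schema = new Schema.Parser().parse(
        "{\"type\":\"record\",\"name\":\"R\",\"fields\":"
            + "[{\"name\":\"f1\",\"type\":\"int\"},"
            + "{\"name\":\"f2\",\"type\":\"double\"}]}");
    record = new GenericData.Record(schema);
    record.put("f1", 42);
    record.put("f2", 3.14);
    writer = new GenericDatumWriter<>(schema);
    out = new ByteArrayOutputStream();
  }

  @Benchmark
  public int writeRecord() throws IOException {
    out.reset();
    encoder = EncoderFactory.get().binaryEncoder(out, encoder);
    writer.write(record, encoder);
    encoder.flush();
    return out.size(); // return a value so JMH can't dead-code-eliminate the work
  }
}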
Invoking Perf
Here is the easiest way we found to directly invoke Perf.
As mentioned in the Introduction, Perf is dependent upon build artifacts from some of the other Avro subprojects. When you invoke Perf, it should be invoked with your most recent build of those artifacts (assuming you're performance-testing your current work). We have found that the easiest way to ensure the proper artifacts are used is to use Maven to invoke Perf.
The recipe for using Maven in this way is simple. First, from the lang/java directory, you need to build and install Avro:
mvn clean install
(You can add -DskipTests to the above command line if you don't need to run the test suite.) When this is done, change your working directory to lang/java/ipc. From there, you can invoke Perf with the following command line:
mvn exec:java -Dexec.classpathScope=test -Dexec.mainClass=org.apache.avro.io.Perf -Dexec.args="..."
The exec.args string contains the arguments you want to pass through to the Perf.main function.
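For example, to run only the specific-record tests (Perf's -Sf flag, which also appears in Appendix A), you would invoke:
mvn exec:java -Dexec.classpathScope=test -Dexec.mainClass=org.apache.avro.io.Perf -Dexec.args="-Sf"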
To speed up your edit-compile-test loop, you can do a selective build of Avro in addition to skipping tests:
mvn clean && mvn -pl "avro,compiler,maven-plugin,ipc" install -DskipTests
If you're using Perf, chances are that you want to compare the performance of a proposed optimization against the performance of a baseline (that baseline most likely being the current master branch of Avro). Generating this comparative data can be tedious if you're running Perf by hand. To relieve this tedium, you can use the run-perf.sh script instead (found in the share/test directory under the Avro top-level directory).
To use this script, you put different implementations of Avro onto different branches of your Avro Git repository. One of these branches is designated the "baseline" branch and the others are the "treatment" branches. The script will run the baseline and all the treatments, and will generate a CSV file containing a comparison of the treatments against the baseline.
Running run-perf.sh --help will output a detailed manual page for this script. Appendix A of this document contains sample invocations of this script for different use cases.
NOTE: as mentioned in run-perf.sh --help, this script is designed to be run from the lang/java/ipc directory, which is the Maven project containing the Perf program.
To reduce this variability, we experimented with the system properties org.apache.avro.io.perf.count, org.apache.avro.io.perf.cycles, and org.apache.avro.io.perf.use-direct, as well as with the number of times we run Perf.java within a single "run" of a test. We also used Docker's --cpuset-cpus flag to force the tests onto a single core, and we ran all our tests on a dedicated EC2 instance (a c5d.2xlarge).
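Such properties can be passed to Perf.java through run-perf.sh's -D mechanism, described in Appendix A. For example (the property value here is purely illustrative):
../../../share/test/run-perf.sh --out-dir ~/tuning \
  -Dorg.apache.avro.io.perf.count=100000 \
  AVRO-2269:baseline AVRO-2269:treatment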
If you want to set up your own EC2 instance for testing, here's how we did it. We launched a dedicated EC2 c5d.2xlarge instance from the AWS console, using the "Amazon Linux 64-bit HVM GP2" AMI. We logged into this instance and ran the following commands to install Docker and Git (we did all our Avro building and testing inside the Docker image):
sudo yum update
sudo yum install -y git-all
git config --global user.name "Your Name"
git config --global user.email email-address-used@github.com
git config --global core.editor emacs
sudo yum install -y docker
sudo usermod -aG docker ec2-user  ## Need to log back in for this to take effect
sudo service docker start
At this point you can check out Avro and launch your Docker container:
git clone https://github.com/apache/avro.git
cd avro
screen ./build.sh docker --args "--cpuset-cpus 2,6"
Note the use of screen here: executions of run-perf.sh can take a few hours, depending on the configuration. By running it inside of screen, you are protected from an SSH disconnection causing run-perf.sh to terminate prematurely.
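If your SSH connection does drop, you can log back in and reattach to the still-running session with:
screen -r
(If more than one detached session exists, screen will list them so you can reattach to a specific one by ID.)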
The --args flag in the last command deserves some explanation. In general, --args allows you to pass additional arguments to the docker run command executed inside build.sh. In this case, the --cpuset-cpus flag tells Docker to schedule the container exclusively on the listed (virtual) CPUs. We identified vCPUs 2 and 6 using the lscpu Linux command:
[ec2-user@ip-0-0-0-0 avro]$ lscpu --extended
CPU NODE SOCKET CORE L1d:L1i:L2:L3 ONLINE
0   0    0      0    0:0:0:0       yes
1   0    0      1    1:1:1:0       yes
2   0    0      2    2:2:2:0       yes
3   0    0      3    3:3:3:0       yes
4   0    0      0    0:0:0:0       yes
5   0    0      1    1:1:1:0       yes
6   0    0      2    2:2:2:0       yes
7   0    0      3    3:3:3:0       yes
Notice that vCPUs 2 and 6 are both on core 2: it's sufficient to pin the container to a single physical core rather than to a single vCPU. One final tip: to confirm that your container is running on the expected CPUs, run top and then press the 1 key; this will show you the load on each individual CPU.
A detailed explanation of run-perf.sh is printed when you give it the --help flag. To help you more quickly understand how to use run-perf.sh, we present here a few examples of how we used it in our recent testing efforts.
To summarize, you invoke it as follows:
../../../share/test/run-perf.sh [--out-dir D] \
  [--perf-args STRING] [-Dkey=value]* [--] \
  [-Dkey=value]* branch_baseline[:name_baseline_run] \
  [-Dkey=value]* branch_1[:name_treatment_run_1] \
  ...
  [-Dkey=value]* branch_n[:name_treatment_run_n]
The path given here is relative to the lang/java/ipc directory, which needs to be the current working directory when calling this script. The script executes multiple runs of testing. The first run is called the baseline run; the subsequent runs are the treatment runs. Each run consists of four identical executions of Perf.java. The running times for each Perf.java test are averaged to obtain the final running time for the test. For each treatment run, the final running times for each test are compared, as a percentage, to the running time for the test in the baseline run. These percentages are output in the file summary.csv.
The following invocation is what we used to measure the variance of Perf.java:
../../../share/test/run-perf.sh --out-dir ~/calibration \
  -Dorg.apache.avro.specific.use_custom_coders=true \
  AVRO-2269:baseline AVRO-2269:run1 AVRO-2269:run2 AVRO-2269:run3
In this invocation, the baseline run and all three treatment runs come from the same Git branch: AVRO-2269. We need to give a name to each run: in this case the runs have been named "baseline" (the baseline run) and "run1", "run2", and "run3" (the treatment runs). Note that the name of the Git branch to be used for a run must always be provided, but the name for the run itself (e.g., "baseline") is optional. If a name for a run is not provided, then the name of the Git branch is used as the name of the run. However, each run must have a unique name, so in this example we had to explicitly name the runs, since all of them are on the same branch.
run-perf.sh uses Maven to invoke Perf.java. The -D flag is used to pass system properties to Maven, which in turn passes them through to Perf.java. In the example above, we use this flag to turn on the custom-coders feature recently checked into Avro. Note that initial -D flags are passed to all runs, while -D flags that come just before the Git branch name of a run apply only to that run. In the case of the baseline run, which comes first, if you want to pass -D flags to just that run, use the -- flag to indicate that all global parameters for run-perf.sh have been provided, followed by the -D flags you want to pass to only the baseline run.
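For example (the property name here is hypothetical, just to illustrate the syntax), the following passes a -D flag to the baseline run only:
../../../share/test/run-perf.sh --out-dir ~/example -- \
  -Dsome.example.property=true AVRO-2269:baseline AVRO-2269:treatment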
Finally, note that run-perf.sh generates a lot of intermediate files in addition to the final summary.csv file. Thus, it is recommended that the output of each execution of run-perf.sh be sent to a dedicated directory, provided via the --out-dir flag. If that directory does not exist, it will be created. (Observe that run-perf.sh outputs a file called command.txt containing the full command line used to invoke it. This can be helpful if you run a lot of experiments and forget the detailed setup of some of them along the way.)
The next invocation is what we used to ensure that the new "custom coders" optimization for specific records does indeed improve performance:
../../../share/test/run-perf.sh --out-dir ~/retest-codegen \
  --perf-args "-Sf" \
  AVRO-2269:baseline \
  -Dorg.apache.avro.specific.use_custom_coders=true AVRO-2269:custom-coders
In this case, unlike the previous one, the -D flag that turns on the use of custom coders is applied specifically to the treatment run, and not globally. Also, since this flag only affects the specific-record case, we use the --perf-args flag to pass additional arguments to Perf.java; in this case, the -Sf flag tells Perf.java to run just the specific-record tests rather than the entire test suite.
This last example shows how we checked the performance impact of two new feature-branches we've been developing:
../../../share/test/run-perf.sh --out-dir ~/new-branches \
  -Dorg.apache.avro.specific.use_custom_coders=true \
  AVRO-2269:baseline combined-opts full-refactor
Once again, we turn on custom coders for all runs, and again the Git branch AVRO-2269 is used for the baseline run. This time, however, the treatment runs come from two other Git branches: combined-opts and full-refactor. We didn't provide run names for these runs because the Git branch names were fine as run names (we explicitly named the first run "baseline" not because we had to, but because we like the convention of using that name).
Although we didn't state it before, in preparing for a run, run-perf.sh will check out the Git branch to be used for the run and use mvn install to build and install it. It does this for each branch, so the invocation just given will check out and build three different branches during its overall execution. (As an optimization, if one run uses the same branch as the previous run, the branch is not checked out or rebuilt between runs.)