Log4j 2 Asynchronous Loggers for Low-Latency Logging

Asynchronous logging can improve your application's performance by executing the I/O operations in a separate thread. Log4j 2 makes a number of improvements in this area.

Asynchronous Loggers are a new addition to Log4j 2. Their aim is to return from the call to Logger.log to the application as soon as possible. You can choose between making all Loggers asynchronous or using a mixture of synchronous and asynchronous Loggers. Making all Loggers asynchronous will give the best performance, while mixing gives you more flexibility.
LMAX Disruptor technology. Asynchronous Loggers internally use the Disruptor, a lock-free inter-thread communication library, instead of queues, resulting in higher throughput and lower latency.
Asynchronous Appenders already existed in Log4j 1.x, but have been enhanced to flush to disk at the end of a batch (when the queue is empty). This produces the same result as configuring "immediateFlush=true", that is, all received log events are always available on disk, but is more efficient because it does not need to touch the disk on each and every log event. (Async Appenders use ArrayBlockingQueue internally and do not need the disruptor jar on the classpath.)
(For synchronous and asynchronous use) Fast File Appenders are an alternative to Buffered File Appenders. Under the hood, these new appenders use a ByteBuffer + RandomAccessFile instead of a BufferedOutputStream. In our testing this was about 20-200% faster. These appenders can also be used with synchronous loggers and will give the same performance benefits. Fast File Appenders do not need the disruptor jar on the classpath.

Trade-offs

Although asynchronous logging can give significant performance benefits, there are situations where you may want to choose synchronous logging. This section describes some of the trade-offs of asynchronous logging.

Benefits

Higher throughput. With an asynchronous logger your application can log messages at 6 - 68 times the rate of a synchronous logger.
Lower logging latency. Latency is the time it takes for a call to Logger.log to return. Asynchronous Loggers have consistently lower latency than synchronous loggers or even queue-based asynchronous appenders. Applications interested in low latency often care not only about average latency, but also about worst-case latency. Our performance comparison shows that Asynchronous Loggers also do better when comparing the maximum latency of 99% or even 99.99% of observations with other logging methods.
Prevent or dampen latency spikes during bursts of events. If the queue size is configured large enough to handle spikes, asynchronous logging will help prevent your application from falling behind (as much) during sudden bursts of activity.

Drawbacks

Error handling. If a problem happens during the logging process and an exception is thrown, it is harder for an asynchronous logger or appender to signal this problem to the application. This can partly be alleviated by configuring an ExceptionHandler, but this may still not cover all cases. For this reason, if logging is part of your business logic, for example if you are using Log4j as an audit logging framework, we recommend logging those audit messages synchronously. (Note that you can still combine the two and use asynchronous logging for debug/trace logging in addition to synchronous logging for the audit trail.)

Making All Loggers Asynchronous

Requires disruptor-3.0.0.jar or higher on the classpath.

This is simplest to configure and gives the best performance. To make all loggers asynchronous, add the disruptor jar to the classpath and set the system property Log4jContextSelector to org.apache.logging.log4j.core.async.AsyncLoggerContextSelector.
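
As a minimal sketch, the same property can also be set from code instead of with -D on the command line. This assumes the call runs before the application requests its first Logger, since the selector is read when Log4j initialises. The class and method names are hypothetical; only the property name and the selector class name come from the text above.

```java
final class AsyncLoggingBootstrap {
    // Property name and selector class as given in the text above.
    static final String SELECTOR_PROPERTY = "Log4jContextSelector";
    static final String ASYNC_SELECTOR =
            "org.apache.logging.log4j.core.async.AsyncLoggerContextSelector";

    /** Call this first thing in main(), before any Logger is requested. */
    static void enableAsyncLoggers() {
        System.setProperty(SELECTOR_PROPERTY, ASYNC_SELECTOR);
    }

    public static void main(String[] args) {
        enableAsyncLoggers();
        // From here on the application would obtain its loggers as usual.
        System.out.println(SELECTOR_PROPERTY + "=" + System.getProperty(SELECTOR_PROPERTY));
    }
}
```

Setting the property on the command line is the safer choice when other libraries may initialise logging before your own main method runs.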

By default, location is not passed to the I/O thread by asynchronous loggers. If one of your layouts or custom filters needs location information, you need to set "includeLocation=true" in the configuration of all relevant loggers, including the root logger.

A configuration that does not require location might look like:

<?xml version="1.0" encoding="UTF-8"?>

<!-- Don't forget to set system property 
-DLog4jContextSelector=org.apache.logging.log4j.core.async.AsyncLoggerContextSelector
     to make all loggers asynchronous. -->

<configuration status="WARN">
  <appenders>
    <!-- Async Loggers will auto-flush in batches, so switch off immediateFlush. -->
    <FastFile name="FastFile" fileName="async.log" immediateFlush="false" append="false">
      <PatternLayout>
        <pattern>%d %p %c{1.} [%t] %m %ex%n</pattern>
      </PatternLayout>
    </FastFile>
  </appenders>
  <loggers>
    <root level="info" includeLocation="false">
      <appender-ref ref="FastFile"/>
    </root>
  </loggers>
</configuration>

When AsyncLoggerContextSelector is used to make all loggers asynchronous, make sure to use normal <root> and <logger> elements in the configuration. The AsyncLoggerContextSelector will ensure that all loggers are asynchronous, using a mechanism that is different from what happens when you configure <asyncRoot> or <asyncLogger>. The latter elements are intended for mixing async with sync loggers. If you use both mechanisms together you will end up with two background threads, where your application passes the log message to thread A, which passes the message to thread B, which then finally logs the message to disk. This works, but there will be an unnecessary step in the middle.

There are a few system properties you can use to control aspects of the asynchronous logging subsystem. Some of these can be used to tune logging performance.

System Properties to configure all asynchronous loggers
System Property	Default Value	Description
AsyncLogger.ExceptionHandler	`null`	Fully qualified name of a class that implements the `com.lmax.disruptor.ExceptionHandler` interface. The class needs to have a public zero-argument constructor. If specified, this class will be notified when an exception occurs while logging the messages.
AsyncLogger.RingBufferSize	256 * 1024	Size (number of slots) in the RingBuffer used by the asynchronous logging subsystem. Make this value large enough to deal with bursts of activity. The minimum size is 128. The RingBuffer will be pre-allocated at first use and will never grow or shrink during the life of the system.
AsyncLogger.WaitStrategy	`Sleep`	Valid values: Block, Sleep, Yield. `Block` is a strategy that uses a lock and condition variable for the I/O thread waiting for log events. Block can be used when throughput and low-latency are not as important as CPU resource. Recommended for resource constrained/virtualised environments. `Sleep` is a strategy that initially spins, then uses a Thread.yield(), and eventually parks for the minimum number of nanos the OS and JVM will allow while the I/O thread is waiting for log events. Sleep is a good compromise between performance and CPU resource. This strategy has very low impact on the application thread, in exchange for some additional latency for actually getting the message logged. `Yield` is a strategy that uses a Thread.yield() for waiting for log events after initially spinning. Yield is a good compromise between performance and CPU resource, but may use more CPU than Sleep in order to get the message logged to disk sooner.
log4j.Clock	`SystemClock`	Implementation of the `org.apache.logging.log4j.core.helpers.Clock` interface that is used for timestamping the log events when all loggers are asynchronous. By default, `System.currentTimeMillis` is called on every log event. `CachedClock` is an optimization where time stamps are generated from a clock that updates its internal time in a background thread once every millisecond, or every 1024 log events, whichever comes first. This reduces logging latency a little, at the cost of some precision in the logged time stamps. Unless you are logging many events, you may see "jumps" of 10-16 milliseconds between log time stamps. You can also specify a fully qualified class name of a custom class that implements the `Clock` interface.
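
To illustrate the idea behind a cached clock, here is a minimal sketch that uses only the JDK. It is not the source of Log4j's `CachedClock`: the class name, the daemon thread and the use of `parkNanos` are assumptions made for illustration. It follows the description above by refreshing the cached time roughly once per millisecond in the background, and additionally on every 1024th call.

```java
import java.util.concurrent.locks.LockSupport;

final class CachedClockSketch {
    private static final int REFRESH_MASK = 0x3FF; // low 10 bits: wraps every 1024 calls

    private volatile long millis = System.currentTimeMillis();
    // Deliberately not synchronized: a lost increment only shifts the refresh slightly.
    private int count;

    CachedClockSketch() {
        Thread updater = new Thread(() -> {
            while (true) {
                millis = System.currentTimeMillis();
                LockSupport.parkNanos(1_000_000L); // about one millisecond
            }
        }, "cached-clock-sketch");
        updater.setDaemon(true); // do not keep the JVM alive
        updater.start();
    }

    /** Returns the cached time; refreshes it directly on every 1024th call. */
    long currentTimeMillis() {
        if ((++count & REFRESH_MASK) == 0) {
            millis = System.currentTimeMillis();
        }
        return millis;
    }
}
```

The trade-off is visible in the code: callers usually pay only for a volatile read, but the value they get can lag the real clock by however long the background thread was parked.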

Mixing Synchronous and Asynchronous Loggers

Requires disruptor-3.0.0.jar or higher on the classpath. There is no need to set system property "Log4jContextSelector" to any value.

Synchronous and asynchronous loggers can be combined in configuration. This gives you more flexibility at the cost of a slight loss in performance (compared to making all loggers asynchronous). Use the <asyncRoot> or <asyncLogger> configuration elements to specify the loggers that need to be asynchronous. The same configuration file can also contain <root> and <logger> elements for the synchronous loggers.

A configuration that mixes synchronous and asynchronous loggers might look like:

<?xml version="1.0" encoding="UTF-8"?>

<!-- No need to set system property "Log4jContextSelector" to any value 
     when using <asyncLogger> or <asyncRoot>. -->

<configuration status="WARN">
  <appenders>
    <!-- Async Loggers will auto-flush in batches, so switch off immediateFlush. -->
    <FastFile name="FastFile" fileName="asyncWithLocation.log" 
              immediateFlush="false" append="false">
      <PatternLayout>
        <pattern>%d %p %class{1.} [%t] %location %m %ex%n</pattern>
      </PatternLayout>
    </FastFile>
  </appenders>
  <loggers>
    <!-- pattern layout actually uses location, so we need to include it -->
    <asyncLogger name="com.foo.Bar" level="trace" includeLocation="true">
      <appender-ref ref="FastFile"/>
    </asyncLogger>
    <root level="info" includeLocation="true">
      <appender-ref ref="FastFile"/>
    </root>
  </loggers>
</configuration>

There are a few system properties you can use to control aspects of the asynchronous logging subsystem. Some of these can be used to tune logging performance.

System Properties to configure mixed asynchronous and normal loggers
System Property	Default Value	Description
AsyncLoggerConfig.ExceptionHandler	`null`	Fully qualified name of a class that implements the `com.lmax.disruptor.ExceptionHandler` interface. The class needs to have a public zero-argument constructor. If specified, this class will be notified when an exception occurs while logging the messages.
AsyncLoggerConfig.RingBufferSize	256 * 1024	Size (number of slots) in the RingBuffer used by the asynchronous logging subsystem. Make this value large enough to deal with bursts of activity. The minimum size is 128. The RingBuffer will be pre-allocated at first use and will never grow or shrink during the life of the system.
AsyncLoggerConfig.WaitStrategy	`Sleep`	Valid values: Block, Sleep, Yield. `Block` is a strategy that uses a lock and condition variable for the I/O thread waiting for log events. Block can be used when throughput and low-latency are not as important as CPU resource. Recommended for resource constrained/virtualised environments. `Sleep` is a strategy that initially spins, then uses a Thread.yield(), and eventually parks for the minimum number of nanos the OS and JVM will allow while the I/O thread is waiting for log events. Sleep is a good compromise between performance and CPU resource. This strategy has very low impact on the application thread, in exchange for some additional latency for actually getting the message logged. `Yield` is a strategy that uses a Thread.yield() for waiting for log events after initially spinning. Yield is a good compromise between performance and CPU resource, but may use more CPU than Sleep in order to get the message logged to disk sooner.
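
The `Sleep` strategy described in the table (spin first, then yield, then park briefly) can be sketched with plain JDK calls. This is an illustration only, not the Disruptor's implementation: the class name, the `BooleanSupplier` used to poll for events, and the retry counts (200 in total, the first 100 as busy spins) are assumptions.

```java
import java.util.concurrent.locks.LockSupport;
import java.util.function.BooleanSupplier;

final class WaitStrategySketch {
    private static final int RETRIES = 200;        // assumed total before parking
    private static final int SPIN_THRESHOLD = 100; // assumed: above this, busy-spin

    /**
     * Blocks the calling (consumer) thread until eventAvailable returns true.
     * Returns how many times the thread had to wait, for demonstration.
     */
    static int waitFor(BooleanSupplier eventAvailable) {
        int retries = RETRIES;
        int waits = 0;
        while (!eventAvailable.getAsBoolean()) {
            waits++;
            if (retries > SPIN_THRESHOLD) {
                retries--;                 // phase 1: busy spin
            } else if (retries > 0) {
                retries--;
                Thread.yield();            // phase 2: give up the time slice
            } else {
                LockSupport.parkNanos(1L); // phase 3: park as briefly as the OS allows
            }
        }
        return waits;
    }
}
```

A `Yield`-style strategy would stop after phase 2 and keep yielding, which reacts faster but keeps a core busy. A `Block`-style strategy would replace the loop with a lock and condition variable.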

Location, location, location...

If one of the layouts is configured with a location-related attribute like HTML locationInfo, or one of the patterns %C or %class, %F or %file, %l or %location, %L or %line, %M or %method, Log4j will take a snapshot of the stack, and walk the stack trace to find the location information.

This is an expensive operation: 1.3 - 5 times slower for synchronous loggers. Synchronous loggers wait as long as possible before they take this stack snapshot. If no location is required, the snapshot will never be taken.

However, asynchronous loggers need to make this decision before passing the log message to another thread; the location information will be lost after that point. The performance impact of taking a stack trace snapshot is even higher for asynchronous loggers: logging with location is 4 - 20 times slower than without location. For this reason, asynchronous loggers and asynchronous appenders do not include location information by default.

You can override the default behaviour in your logger or asynchronous appender configuration by specifying includeLocation="true".
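
The stack snapshot described in this section can be sketched with plain JDK calls. This is a simplified illustration, not Log4j's code: `LocationSketch`, `findCaller` and `OrderService` are hypothetical names. The walk skips the frames that belong to the logger class and returns the first frame after them, which is the application code that made the logging call.

```java
final class LocationSketch {
    /** Hypothetical logger entry point: captures the caller's location. */
    static StackTraceElement log(String message) {
        return findCaller(LocationSketch.class.getName());
    }

    /** Walks the current thread's stack and returns the first frame after the logger's own. */
    static StackTraceElement findCaller(String loggerClassName) {
        StackTraceElement[] stack = new Throwable().getStackTrace(); // the expensive snapshot
        boolean insideLogger = false;
        for (StackTraceElement frame : stack) {
            boolean isLoggerFrame = frame.getClassName().equals(loggerClassName);
            if (insideLogger && !isLoggerFrame) {
                return frame; // first frame outside the logger: the application caller
            }
            insideLogger = insideLogger || isLoggerFrame;
        }
        return null; // no caller found
    }
}

/** Hypothetical application class that calls the logger. */
final class OrderService {
    static StackTraceElement placeOrder() {
        return LocationSketch.log("order placed");
    }
}
```

Here `OrderService.placeOrder()` returns a frame whose method name is `placeOrder`. If `findCaller` ran on a background thread instead, that thread's stack would not contain `placeOrder` at all, which is why the snapshot has to be taken before the event is handed over.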

Asynchronous Logging Performance

The performance results below were all derived from running the PerfTest, MTPerfTest and PerfTestDriver classes which can be found in the Log4j 2 unit test source directory. All tests were done using the default settings (SystemClock and SleepingWaitStrategy). The methodology used was the same for all tests:

First, warm up the JVM by logging 200,000 log messages of 500 characters.
Repeat the warm-up 10 times, then wait 10 seconds for the I/O thread to catch up and buffers to drain.
Latency test: at less than saturation, measure how long a call to Logger.log takes. Pause for 10 microseconds * threadCount between measurements. Repeat this 5 million times, and measure average latency, latency of 99% of observations and 99.99% of observations.
Throughput test: measure how long it takes to execute 256 * 1024 / threadCount calls to Logger.log and express the result in messages per second.
Repeat the test 5 times and average the results.
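
The latency step above can be sketched as follows. This is not the PerfTest code: the class and method names are hypothetical, and the nearest-rank percentile is an assumption, since the actual test classes may aggregate their observations differently.

```java
import java.util.Arrays;
import java.util.concurrent.locks.LockSupport;

final class LatencySketch {
    /** Times count runs of call, pausing pauseNanos between measurements. */
    static long[] measure(Runnable call, int count, long pauseNanos) {
        long[] latencies = new long[count];
        for (int i = 0; i < count; i++) {
            long start = System.nanoTime();
            call.run();
            latencies[i] = System.nanoTime() - start;
            LockSupport.parkNanos(pauseNanos); // stay below saturation
        }
        return latencies;
    }

    static double average(long[] values) {
        return Arrays.stream(values).average().orElse(0.0);
    }

    /** Nearest-rank percentile: smallest value with at least fraction of observations at or below it. */
    static long percentile(long[] values, double fraction) {
        long[] sorted = values.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(fraction * sorted.length);
        return sorted[Math.max(rank, 1) - 1];
    }
}
```

In a real run, `call` would wrap a single logging call and `pauseNanos` would be 10 microseconds multiplied by the thread count, as in the methodology. `percentile(latencies, 0.99)` and `percentile(latencies, 0.9999)` then give the two tail figures.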

Logging Throughput

The graph below compares the throughput of synchronous loggers, asynchronous appenders and asynchronous loggers. This is the total throughput of all threads together. In the test with 64 threads, asynchronous loggers are 12 times faster than asynchronous appenders, and 68 times faster than synchronous loggers.

Asynchronous loggers' throughput increases with the number of threads, whereas both synchronous loggers and asynchronous appenders have more or less constant throughput regardless of the number of threads that are doing the logging.

Asynchronous Throughput Comparison with Other Logging Packages

We also compared throughput of asynchronous loggers to the synchronous loggers and asynchronous appenders available in other logging packages, specifically log4j-1.2.17 and logback-1.0.10, with similar results. For asynchronous appenders, total logging throughput of all threads together remains roughly constant when adding more threads. Asynchronous loggers make more effective use of the multiple cores available on the machine in multi-threaded scenarios.

On Solaris 10 (64bit) with JDK1.7.0_06, 4-core Xeon X5570 dual CPU @2.93GHz with hyperthreading switched on (16 virtual cores):

Throughput per thread in messages/second
Logger	1 thread	2 threads	4 threads	8 threads	16 threads	32 threads	64 threads
Log4j 2: Loggers all asynchronous	2,652,412	909,119	776,993	516,365	239,246	253,791	288,997
Log4j 2: Loggers mixed sync/async	2,454,358	839,394	854,578	597,913	261,003	216,863	218,937
Log4j 2: Async Appender	1,713,429	603,019	331,506	149,408	86,107	45,529	23,980
Log4j1: Async Appender	2,239,664	494,470	221,402	109,314	60,580	31,706	14,072
Logback: Async Appender	2,206,907	624,082	307,500	160,096	85,701	43,422	21,303
Log4j 2: Synchronous	273,536	136,523	67,609	34,404	15,373	7,903	4,253
Log4j1: Synchronous	326,894	105,591	57,036	30,511	13,900	7,094	3,509
Logback: Synchronous	178,063	65,000	34,372	16,903	8,334	3,985	1,967
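
The "12 times" and "68 times" figures quoted for 64 threads appear to correspond to the 64-thread column of the Solaris table above. The short check below reproduces them from the Log4j 2 rows; the class name is hypothetical and the values are copied from the table.

```java
final class ThroughputRatios {
    static double ratio(long numerator, long denominator) {
        return (double) numerator / denominator;
    }

    public static void main(String[] args) {
        // 64-thread column, Solaris table, messages/second per thread.
        long asyncLoggers = 288_997;
        long asyncAppender = 23_980;
        long synchronous = 4_253;

        System.out.println("vs async appender: " + Math.round(ratio(asyncLoggers, asyncAppender)));
        System.out.println("vs synchronous:    " + Math.round(ratio(asyncLoggers, synchronous)));
    }
}
```

Because every row uses the same thread count, the ratios are the same whether you compare per-thread or total throughput.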

On Windows 7 (64bit) with JDK1.7.0_11, 2-core Intel i5-3317u CPU @1.70GHz with hyperthreading switched on (4 virtual cores):

Throughput per thread in messages/second
Logger	1 thread	2 threads	4 threads	8 threads	16 threads	32 threads
Log4j 2: Loggers all asynchronous	1,715,344	928,951	1,045,265	1,509,109	1,708,989	773,565
Log4j 2: Loggers mixed sync/async	571,099	1,204,774	1,632,204	1,368,041	462,093	908,529
Log4j 2: Async Appender	1,236,548	1,006,287	511,571	302,230	160,094	60,152
Log4j1: Async Appender	1,373,195	911,657	636,899	406,405	202,777	162,964
Logback: Async Appender	1,979,515	783,722	582,935	289,905	172,463	133,435
Log4j 2: Synchronous	281,250	225,731	129,015	66,590	34,401	17,347
Log4j1: Synchronous	147,824	72,383	32,865	18,025	8,937	4,440
Logback: Synchronous	149,811	66,301	32,341	16,962	8,431	3,610

Throughput of Logging With Location (includeLocation="true")

On Solaris 10 (64bit) with JDK1.7.0_06, 4-core Xeon X5570 dual CPU @2.93GHz with hyperthreading switched off (8 virtual cores):

Throughput in log messages/second per thread
Logger (Log4j 2)	1 thread	2 threads	4 threads	8 threads
Loggers all asynchronous	75,862	88,775	80,240	68,077
Loggers mixed sync/async	61,993	66,164	55,735	52,843
Async Appender	47,033	52,426	50,882	36,905
Synchronous	31,054	33,175	29,791	23,628

As expected, logging location information has a large performance impact. Asynchronous loggers are 4 - 20 times slower, while synchronous loggers are 1.3 - 5 times slower. However, if you do need location information, asynchronous logging will still be faster than synchronous logging.

Latency

Latency tests are done by logging at less than saturation, measuring how long a call to Logger.log takes to return. After each call to Logger.log, the test waits for 10 microseconds * threadCount before continuing. Each thread logs 5 million messages.

All the latency measurements below are results of tests run on Solaris 10 (64bit) with JDK1.7.0_06, 4-core Xeon X5570 dual CPU @2.93GHz with hyperthreading switched on (16 virtual cores).

Note that this is log-scale, not linear. The above graph compares the latency distributions of an asynchronous logger and a Log4j 1.2.17 Async Appender. This shows the latency of one thread during a test where 64 threads are logging in parallel. The test was run once for the async logger and once for the async appender.

Latency of a call to Logger.log() in nanoseconds
Logger	Average latency, 1 thread	Average latency, 64 threads	99% of observations less than, 1 thread	99% of observations less than, 64 threads	99.99% of observations less than, 1 thread	99.99% of observations less than, 64 threads
Log4j 2: Loggers all async	677	4,135	1,638	4,096	8,192	16,128
Log4j 2: Loggers mixed sync/async	648	4,873	1,228	4,096	8,192	16,384
Log4j 2: Async Appender	2,423	2,117,722	4,096	67,108,864	16,384	268,435,456
Log4j1: Async Appender	1,562	1,781,404	4,096	109,051,904	16,384	268,435,456
Logback: Async Appender	2,123	2,079,020	3,276	67,108,864	14,745	268,435,456

The latency comparison graph below is also log-scale, and shows the average latency of asynchronous loggers and ArrayBlockingQueue-based asynchronous appenders in scenarios with more and more threads running in parallel. With up to 8 threads, asynchronous appenders have comparable average latency, two or three times that of asynchronous loggers. With more threads, the average latency of asynchronous appenders is orders of magnitude larger than that of asynchronous loggers.

Applications interested in low latency often care not only about average latency, but also about worst-case latency. The graph below shows that asynchronous loggers also do better when comparing the maximum latency of 99.99% of observations with other logging methods. When increasing the number of threads, the vast majority of latency measurements for asynchronous loggers stay in the 10-20 microsecond range, whereas asynchronous appenders start experiencing many latency spikes in the 100 millisecond range, a difference of four orders of magnitude.

FileAppender vs. FastFileAppender

The appender comparison below was done with synchronous loggers.

On Windows 7 (64bit) with JDK1.7.0_11, 2-core Intel i5-3317u CPU @1.70GHz with hyperthreading switched on (4 virtual cores):

Throughput per thread in messages/second
Appender	1 thread	2 threads	4 threads	8 threads
FastFileAppender	250,438	169,939	109,074	58,845
FileAppender	186,695	118,587	57,012	28,846
RollingFastFileAppender	278,369	213,176	125,300	63,103
RollingFileAppender	182,518	114,690	55,147	28,153

On Solaris 10 (64bit) with JDK1.7.0_06, 4-core dual Xeon X5570 CPU @2.93GHz with hyperthreading switched off (8 virtual cores):

Throughput per thread in messages/second
Appender	1 thread	2 threads	4 threads	8 threads
FastFileAppender	240,760	128,713	66,555	30,544
FileAppender	172,517	106,587	55,885	25,675
RollingFastFileAppender	228,491	135,355	69,277	32,484
RollingFileAppender	186,422	97,737	55,766	25,097

Under The Hood

Asynchronous Loggers are implemented using the LMAX Disruptor inter-thread messaging library. From the LMAX web site:

... using queues to pass data between stages of the system was introducing latency, so we focused on optimising this area. The Disruptor is the result of our research and testing. We found that cache misses at the CPU-level, and locks requiring kernel arbitration are both extremely costly, so we created a framework which has "mechanical sympathy" for the hardware it's running on, and that's lock-free.

LMAX Disruptor internal performance comparisons with java.util.concurrent.ArrayBlockingQueue are published by the LMAX Disruptor project.
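
To give a feel for the kind of structure involved, here is a heavily simplified sketch of a pre-allocated, lock-free ring buffer for exactly one producer thread and one consumer thread. It is not the Disruptor and does not reflect its API: the class name is hypothetical, and the real library additionally handles multiple producers, batching, cache-line padding and pluggable wait strategies.

```java
import java.util.concurrent.atomic.AtomicLong;

final class SpscRingBufferSketch<T> {
    private final Object[] slots;                     // allocated once, never resized
    private final int mask;                           // size - 1, for cheap index wrapping
    private final AtomicLong head = new AtomicLong(); // next sequence to read
    private final AtomicLong tail = new AtomicLong(); // next sequence to write

    SpscRingBufferSketch(int size) {
        if (size <= 0 || Integer.bitCount(size) != 1) {
            throw new IllegalArgumentException("size must be a power of two");
        }
        slots = new Object[size];
        mask = size - 1;
    }

    /** Producer thread only. Returns false when the buffer is full. */
    boolean offer(T value) {
        long t = tail.get();
        if (t - head.get() == slots.length) {
            return false;
        }
        slots[(int) (t & mask)] = value;
        tail.set(t + 1); // volatile write publishes the slot to the consumer
        return true;
    }

    /** Consumer thread only. Returns null when the buffer is empty. */
    T poll() {
        long h = head.get();
        if (h == tail.get()) {
            return null;
        }
        int index = (int) (h & mask);
        @SuppressWarnings("unchecked")
        T value = (T) slots[index];
        slots[index] = null;
        head.set(h + 1); // volatile write frees the slot for the producer
        return value;
    }
}
```

Neither method takes a lock: each side only writes its own counter and reads the other's, so a slow consumer never blocks the producer until the buffer is actually full.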