This article describes the log4j API, its unique features and design rationale. Log4j is an open source project based on the work of many authors. It allows the developer to control which log statements are output with arbitrary granularity. It is fully configurable at runtime using external configuration files. Best of all, log4j has a gentle learning curve. Beware: judging from user feedback, it is also quite addictive.
Almost every large application includes its own logging or tracing API. In conformance with this rule, the E.U. SEMPER project decided to write its own tracing API. This was in early 1996. After countless enhancements, several incarnations and much work that API has evolved to become log4j, a popular logging package for Java. The package is distributed under the Apache Public License, a fully-fledged open source license certified by the open source initiative. The latest log4j version, including full-source code, class files and documentation can be found at http://jakarta.apache.org/log4j/.
Igor Poteryaev, an independent author, has ported log4j to the Python language. Bastiaan Bakker has initiated a C++ port. Their projects are unsurprisingly called log4p and log4cpp.
Inserting log statements into code is a low-tech method for debugging it. It may also be the only way because debuggers are not always available or applicable. This is usually the case for multi-threaded applications and distributed applications at large.
Experience indicated that logging was an important component of the development cycle. It offered several advantages. It could provide precise context about a run of the application. Once inserted into the code, the generation of logging output required no human intervention. Moreover, log output could be saved in persistent medium to be studied at a later time. In addition to its use in the development cycle, a sufficiently rich logging package could also be viewed as an auditing tool.
As Brian W. Kernigan and Rob Pike put it in their truely excellent book "The Practice of Programming"
As personal choice, we tend not to use debuggers beyond getting a stack trace or the value of a variable or two. One reason is that it is easy to get lost in details of complicated data structures and control flow; we find stepping through a program less productive than thinking harder and adding output statements and self-checking code at critical places. Clicking over statements takes longer than scanning the output of judiciously-placed displays. It takes less time to decide where to put print statements than to single-step to the critical section of code, even assuming we know where that is. More important, debugging statements stay with the program; debugging sessions are transient.
Logging does have its drawbacks. It can slow down an application. If too verbose, it can cause scrolling blindness. To alleviate these concerns, log4j is designed to be fast and flexible. Since logging is rarely the main focus of an application, log4j API strives to be simple to understand and to use.
Log4j has three main components: categories, appenders and layouts. These three types of componets work together to enable developers to log messages according to message type and priority, and to control at runtime how these messages are formatted and where they are reported.
The first and foremost advantage of any logging API over plain
System.out.println
resides in its ability to disable
certain log statements while allowing others to print unhindered. This
capability assumes that the logging space, that is, the space of all
possible logging statements, is categorized according to some
developer-chosen criteria.
This observation led us to choose category as the central
concept of the package. The category notion is embodied in the
org.apache.log4j.Category
class. Categories are named entities. Category names are
case-sensitive. They follow the hierarchical naming rule:
|
For example, the category named "com.foo"
is a parent
of the category named "com.foo.Bar"
. Similarly,
"java"
is a parent of "java.util"
and an
ancestor of "java.util.Vector"
. This naming scheme
should be familiar to Java developers.
The root category resides at the top of the category hierarchy. It is exceptional in two ways:
Invoking the class static Category.getRoot method retrieves it. All other categories are instantiated and retrieved with the class static Category.getInstance method. This method takes the name of the desired category as a parameter. Some of the basic methods in the Category class are listed below.
package org.apache.log4j; public Category class { // Creation & retrieval methods: public static Category getRoot(); public static Category getInstance(String name); // printing methods: public void debug(Object message); public void info(Object message); public void warn(Object message); public void error(Object message); // generic printing method: public void log(Priority p, Object message); }
Categories may be assigned priorities. The set of possible
priorities, that is
DEBUG,
INFO,
WARN,
ERROR and
FATAL
are defined in the org.apache.log4j.Priority
class. The rationale behind this seemingly restricted set
is to promote the use of the more flexible category hierarchy rather
than a static (even if large) set of priorities. One may however
define one's own priorities by sub-classing the Priority
class.
If a given category is not assigned a priority, then it inherits one from its closest ancestor with an assigned priority. More formally:
|
To ensure that all categories can eventually inherit a priority, the root category always has an assigned priority.
Below are four tables with various assigned priority values and the resulting inherited priorities according to the above rule.
|
| ||||||||||||||||||||||||||||||
|
|
Logging requests are made by invoking one of the printing methods
of a category instance. These printing methods are
debug
,
info
,
warn
,
error
,
fatal
,
and
log
.
By definition, the printing method determines the
priority of a logging request. For example, if c
is a
category instance, then the statement c.info("..")
is a
logging request of priority INFO.
A logging request is said to be enabled if its priority is higher than or equal to the priority of its category. Otherwise, the request is said to be disabled. A category without an assigned priority will inherit one from the hierarchy. This rule is summarized below.
|
Here is an example of this rule.
// get a category instance named "com.foo" Category cat = Category.getInstance("com.foo"); // Now set its priority. cat.setPriority(Priority.INFO); Category barcat = Category.getInstance("com.foo.Bar"); // This request is enabled, because WARN >= INFO. cat.warn("Low fuel level."); // This request is disabled, because DEBUG < INFO. cat.debug("Starting search for nearest gas station."); // The category instance barcat, named "com.foo.Bar", // will inherit its priority from the category named // "com.foo" Thus, the following request is enabled // because INFO >= INFO. barcat.info("Located nearest gas station."); // This request is disabled, because DEBUG < INFO. barcat.debug("Exiting gas station search");
Calling the getInstance
method with the same name will
always return a reference to the exact same category object. Thus, it
is possible to configure a category and then to retrieve the same
instance somewhere else in the code without passing around
references. Categories can be created and configured in any order. In
particular, a parent category will find and link to its children even
if it is instantiated after them.
Configuration of the log4j environment is typically done at application initialization. The preferred way is by reading a configuration file. This approach will be discussed shortly.
Log4j makes it easy to name categories by software component. This can be accomplished by statically instantiating a category in each class, with the category name equal to the fully qualified name of the class. This is a useful and straightforward method of defining categories. As the log output bears the name of the generating category, this naming strategy makes it easy to identify the origin of a log message. However, this is only one possible, albeit common, strategy for naming categories. Log4j does not restrict the possible set of categories. The developer is free to name the categories as desired.
Nevertheless, naming categories after the class where they are defined seems to be the best strategy known so far.
The ability to selectively enable or disable logging requests based on their category is only part of the picture. Log4j allows logging requests to print to multiple destinations. In log4j speak, an output destination is called an appender. Currently, appenders exist for the console, files, GUI components, remote socket servers, NT Event Loggers, and remote UNIX Syslog daemons. It is also possible to log asyncronously.
A category may refer to multiple appenders. The addAppender
method adds an appender to a given category.
Each enabled logging
request for a given category will be forwarded to all the appenders in
that category as well as the appenders higher in the hierarchy. In
other words, appenders are inherited additively from the category
hierarchy. For example, if a console appender is added to the root
category, then all enabled logging requests will at least print on the
console. If in addition a file appender is added to a category, say
C, then enabled logging requests for C and
C's children will print on a file and on the
console. It is possible to override this default behavior so that
appender accumulation is no longer additive by setting
the additivity flag to false
.
The rules govening appender addivity are sumarized below.
|
The table below shows an example:
Category Name | Added Appenders | Additivity Flag | Output Targets | Comment |
---|---|---|---|---|
root | A1 | not applicable | A1 | The root category is anonymous but can be accessed with the Category.getRoot() method. |
x | A-x1, A-x2 | true | A1, A-x1, A-x2 | Appenders in root are added to appenders in "x". |
x.y | none | true | A1, A-x1, A-x2 | Appenders of "x" and root. |
x.y.z | A-xyz1 | true | A1, A-x1, A-x2, A-xyz1 | Appenders in "x.y.z", "x" and root. |
security | A-sec | false | A-sec | No appender accumulation as the additivity flag is set to
false . flag.
|
security.access | none | true | A-sec | Only
appenders of "security" as the additivity flag in "security" is
set to false .
|
More often than not, users wish to customize not only the output
destination but also the output format. This is accomplished by
associating a layout with an appender. The layout is
responsible for formatting the logging request according to the user's
wishes, whereas an appender takes care of sending the formatted output
to its destination. The PatternLayout
, part of the
standard log4j distribution, lets the user specify the output format
according to conversion patterns similar to the C language
printf
function.
For example, the PatternLayout with the conversion pattern "%r [%t] %-5p %c - %m%n" will output something akin to:
176 [main] INFO org.foo.Bar - Located nearest gas station.
The first field is the number of milliseconds elapsed since the start of the program. The second field is the thread making the log request. The third field is the priority of the log statement. The fourth field is the name of the category associated with the log request. The text after the '-' is the message of the statement.
Just as importantly, log4j will render the content of the log
message according to user specified criteria. For example, if you
frequently need to log Oranges
, an object type used in
your current project, then you can register an
OrangeRenderer
that will be invoked whenever an orange
needs to be logged.
Object rendering follows the class hierarchy. For example, assuming
oranges are fruits, if you register an FruitRenderer
, all
fruits including oranges will be rendered by the
FruitRenderer
, unless of course you registered an orange
specific OrangeRenderer
.
Object renderers have to implement the ObjectRenderer interface.
Inserting log requests into the application code requires a fair amount of planning and effort. Observation shows that approximately 4 percent of code is dedicated to logging. Consequently, even moderately sized applications will have thousands of logging statements embedded within their code. Given their number, it becomes imperative to manage these log statements without the need to modify them manually.
The log4j environment is fully configurable programatically. However, it is far more flexible to configure log4j using configuration files. Currently, configuration files can be written in XML or in Java properties (key=value) format.
Let us give a taste of how this is done with the help of an
imaginary application MyApp
that uses log4j.
import com.foo.Bar; // Import log4j classes. import org.apache.log4j.Category; import org.apache.log4j.BasicConfigurator; public class MyApp { // Define a static category variable so that it references the // Category instance named "MyApp". static Category cat = Category.getInstance(MyApp.class.getName()); public static void main(String[] args) { // Set up a simple configuration that logs on the console. BasicConfigurator.configure(); cat.info("Entering application."); Bar bar = new Bar(); bar.doIt(); cat.info("Exiting application."); } }
MyApp
begins by importing log4j related classes. It
then defines a static category variable with the name
MyApp
which happens to be the fully qualified name of the
class.
MyApp
uses the Bar
class defined in the
package com.foo
.
package com.foo; import org.apache.log4j.Category; public class Bar { static Category cat = Category.getInstance(Bar.class.getName()); public void doIt() { cat.debug("Did it again!"); } }
The invocation of the BasicConfigurator.configure method creates a rather simple log4j setup. This method is hardwired to add to the root category a FileAppender printing on the console. The output will be formatted using a PatternLayout set to the pattern "%-4r [%t] %-5p %c %x - %m%n".
Note that by default, the root category is assigned to
Priority.DEBUG
.
The output of MyApp is:
0 [main] INFO MyApp - Entering application. 36 [main] DEBUG com.foo.Bar - Did it again! 51 [main] INFO MyApp - Exiting application.
The figure below depicts the object diagram of MyApp
after just having called the BasicConfigurator.configure
method.
The MyApp
class configures log4j by invoking
BasicConfigurator.configure
method. Other classes only
need to import the org.apache.log4j.Category
class, retrieve the
categories they wish to use, and log away.
The previous example always outputs the same log information.
Fortunately, it is easy to modify MyApp
so that the log
output can be controlled at run-time. Here is a slightly modified
version.
import com.foo.Bar; import org.apache.log4j.Category; import org.apache.log4j.PropertyConfigurator; public class MyApp { static Category cat = Category.getInstance(MyApp.class.getName()); public static void main(String[] args) { // BasicConfigurator replaced with PropertyConfigurator. PropertyConfigurator.configure(args[0]); cat.info("Entering application."); Bar bar = new Bar(); bar.doIt(); cat.info("Exiting application."); } }This version of
MyApp
instructs
PropertyConfigurator
to parse a configuration file and
set up logging accordingly.
Here is a sample configuration file that results in exactly same
output as the previous BasicConfigurator
based example.
# Set root category priority to DEBUG and its only appender to A1. log4j.rootCategory=DEBUG, A1 # A1 is set to be a FileAppender which outputs to System.out. log4j.appender.A1=org.apache.log4j.FileAppender log4j.appender.A1.File=System.out # A1 uses PatternLayout. log4j.appender.A1.layout=org.apache.log4j.PatternLayout log4j.appender.A1.layout.ConversionPattern=%-4r [%t] %-5p %c %x - %m%n
Please note that if you copy and paste the examples, then result is likely to include trailing spaces on some lines. These trailing spaces are not trimmed out but interpreted by the PropertyConfigurator. By the time you read this article the problem should be corrected.
Suppose we are no longer interested in seeing the output of any
component belonging to the com.foo
package. The following
configuration file shows one possible way of achieving this.
log4j.rootCategory=DEBUG, A1 log4j.appender.A1=org.apache.log4j.FileAppender log4j.appender.A1.File=System.out log4j.appender.A1.layout=org.apache.log4j.PatternLayout # Print the date in ISO 8601 format log4j.appender.A1.layout.ConversionPattern=%d [%t] %-5p %c - %m%n # Print only messages of priority WARN or above in the package com.foo. log4j.category.com.foo=WARNThe output of
MyApp
configured with this file is shown below.
2000-09-07 14:07:41,508 [main] INFO MyApp - Entering application. 2000-09-07 14:07:41,529 [main] INFO MyApp - Exiting application.
As the category com.foo.Bar
does not have an assigned
priority, it inherits its priority from com.foo
, which
was set to WARN in the configuration file. The log statement from the
Bar.doIt
method has the priority DEBUG, lower than the
category priority WARN. Consequently, doIt
's log request
is suppressed.
Here is another configuration file that uses multiple appenders.
log4j.rootCategory=debug, stdout, R log4j.appender.stdout=org.apache.log4j.FileAppender log4j.appender.stdout.File=System.out log4j.appender.stdout.layout=org.apache.log4j.PatternLayout # Pattern to output the caller's file name and line number. log4j.appender.stdout.layout.ConversionPattern=%5p [%t] (%F:%L) - %m%n log4j.appender.R=org.apache.log4j.RollingFileAppender log4j.appender.R.File=example.log log4j.appender.R.MaxFileSize=100KB # Keep one backup file log4j.appender.R.MaxBackupIndex=1 log4j.appender.R.layout=org.apache.log4j.PatternLayout log4j.appender.R.layout.ConversionPattern=%p %t %c - %m%nCalling the enhanced MyApp with the this configuration file will output the following on the console.
INFO [main] (MyApp2.java:12) - Entering application. DEBUG [main] (Bar.java:8) - Doing it again! INFO [main] (MyApp2.java:15) - Exiting application.
In addition, as the root category has been allocated a second
appender, output will also be directed to the example.log
file. This file will be rolled over when it reaches 100KB. When
roll-over occurs, the old version of example.log
is
automatically moved to example.log.1
.
Note that to obtain these different logging behaviors we did not
need to recompile code. We could just as easily have logged to a UNIX
Syslog daemon, redirected all com.foo
output to an NT
Event logger, or forwarded logging events to a remote log4j server,
which would log according to local server policy, for example by
forwarding the log event to a second log4j server.
Most real-world systems have to deal with multiple clients simultaneously. In a typical multi-threaded implementation of such a system, different threads will handle different clients. Logging is especially well suited to trace and debug complex distributed applications. A common approach to differentiate the logging output of one client from another is to instantiate a new separate category for each client. This promotes the proliferation of categories and increases the management overhead of logging.
A lighter technique is to uniquely stamp each log request initiated from the same client interaction. Neil Harrison described this method in the book "Patterns for Logging Diagnostic Messages," in Pattern Languages of Program Design 3, edited by R. Martin, D. Riehle, and F. Buschmann (Addison-Wesley, 1997).
To uniquely stamp each request, the user pushes contextual information into the NDC, the abbreviation of Nested Diagnostic Context. The NDC class is shown below.
public class NDC { // Used when printing the diagnostic public static String get(); // Remove the top of the context from the NDC. public static String pop(); // Add diagnostic context for the current thread. public static void push(String message); // Remove the diagnostic context for this thread. public static void remove(); }
The NDC is managed per thread as a stack of contextual
information. Note that all methods of the org.apache.log4j.NDC
class are static. Assuming that NDC printing is turned on, every time
a log request is made, the appropriate log4j component will include
the entire NDC stack for the current thread in the log
output. This is done without the intervention of the user, who is
responsible only for placing the correct information in the NDC by
using the push
and pop
methods at a few
well-defined points in the code. In contrast, the per-client category
approach commands extensive changes in the code.
To illustrate this point, let us take the example of a servlet delivering content to numerous clients. The servlet can build the NDC at the very beginning of the request before executing other code. The contextual information can be the client's host name and other information inherent to the request, typically information contained in cookies. Hence, even if the servlet is serving multiple clients simultaneously, the logs initiated by the same code, i.e. belonging to the same category, can still be distinguished because each client request will have a different NDC stack. Contrast this with the complexity of passing a freshly instantiated category to all code exercised during the client's request.
Nevertheless, some sophisticated applications, such as virtual hosting web servers, must log differently depending on the virtual host context and also depending on the software component issuing the request. Recent log4j releases support multiple hierarchy trees. This enhancement allows each virtual host to possess its own copy of the category hierarchy.
One of the often-cited arguments against logging is its computational cost. This is a legitimate concern as even moderately sized applications can generate thousands of log requests. Much effort was spent measuring and tweaking logging performance. Log4j claims to be fast and flexible: speed first, flexibility second.
The user should be aware of the following performance issues.
When logging is turned off entirely or just for a set of priorities, the cost of a log request consists of a method invocation plus an integer comparison. On a 233 MHz Pentium II machine this cost is typically in the 5 to 50 nano-second range.
However, The method invocation involves the "hidden" cost of parameter construction.
For example, for some category cat
, writing,
cat.debug("Entry number: " + i + " is " + String.valueOf(entry[i]));incurs the cost of constructing the message parameter, i.e. converting both integer
i
and entry[i]
to a String, and concatenating intermediate strings,
regardless of whether the message will be logged or not.
This cost of parameter construction can be quite high and it
depends on the size of the parameters involved.
To avoid the parameter construction cost write:
if(cat.isDebugEnabled() { cat.debug("Entry number: " + i + " is " + String.valueOf(entry[i])); }
This will not incur the cost of parameter
construction if debugging is disabled. On the other hand, if
the category is debug-enabled, it will incur twice the cost of
evaluating whether the category is enabled or not: once
in debugEnabled
and once in
debug
. This is an insignificant
overhead because evaluating a category takes about 1%
of the time it takes to actually log.
In log4j, logging requests are made to instances of the Category class. Category is a class and not an interface. This measurably reduces the cost of method invocation at the cost of some flexibility.
Certain users resort to pre-processing or compile-time techinques to compile out all log statements. This leads to perfect performance efficiency with respect to logging. However, since the resulting application binary does not contain any log statements, logging cannot be turned on for that binary. In my opinion this is a disproportionate price to pay in exchange for a small performance gain.
This is essentially the performance of walking the category hierarchy. When logging is turned on, log4j still needs to compare the priority of the log request with the priority of the request category. However, categories may not have an assigned priority; they can inherit them from the category hierarchy. Thus, before inheriting a priority, the category may need to search its ancestors.
There has been a serious effort to make this hierarchy walk to
be as fast as possible. For example, child categories link only to
their existing ancestors. In the BasicConfigurator
example shown earlier, the category named com.foo.Bar
is
linked directly to the root category, thereby circumventing the
nonexistent com
or com.foo
categories. This
significantly improves the speed of the walk, especially in "sparse"
hierarchies.
The typical cost of walking the hierarchy is in the range of 5 to 15 microseconds, again on a 233MHz Pentium II machine.
This is the cost of formatting the log output and sending it to its target destination. Here again, a serious effort was made to make layouts (formatters) perform as quickly as possible. The same is true for appenders. The typical cost of actually logging is about 100 to 300 microseconds. See org.apache.log4.performance.Logging for actual figures.
Although log4j has many features, its first design goal was speed.
Some log4j components have been rewritten many times to improve
performance. Nevertheless, contributors frequently come up with new
optimizations. You should be pleased to know that when configured with
the SimpleLayout
performance tests have shown log4j to log as quickly as
System.out.println
.
Sun has begun a community process, JSR47, in order to define a logging API for java. This API will require JDK version 1.4. The JSR47 specification has recently became ready for public review.
The JSR47 API and log4j are quite similar at the architectural level. JSR47 API will support a hierarchical name space, one of the central features of log4j. On the other hand, log4j has many useful features that are missing in JSR47. Given the momentum behind log4j, in my partisan opinion, JSR47 is likely to be obsolete by the time it is launced. Log4j is written by the people who use it daily, not by a committee.
Log4j is a popular logging package written in Java. One of its distinctive features is the notion of inheritance in categories. Using a category hierarchy it is possible to control which log statements are output at arbitrary granularity. This helps reduce the volume of logged output and minimize the cost of logging.
One of the advantages of the log4j API is its manageability. Once the log statements have been inserted into the code, they can be controlled with configuration files. They can be selectively enabled or disabled, and sent to different and multiple output targets in user-chosen formats. The log4j package is designed so that log statements can remain in shipped code without incurring a heavy performance cost.
Log4j is the result of a collective effort. My special thanks go to all the authors who have contributed to the project. Without exception, the best features in the package have all originated in the user community.