Short introduction to log4j

"Ceki Gülcü" December 2000

This manual is based on the article "Log4j delivers control over logging" published in November 2000 edition of JavaWorld. However, the present article contains more detailed and up to date information.

Abstract

This article describes the log4j API, its unique features and design rationale. Log4j is an open source project based on the work of many authors. It allows the developer to control which log statements are output with arbitrary granularity. It is fully configurable at runtime using external configuration files. Best of all, log4j has a gentle learning curve. Beware: judging from user feedback, it is also quite addictive.

Introduction

Almost every large application includes its own logging or tracing API. In conformance with this rule, the E.U. SEMPER project decided to write its own tracing API. This was in early 1996. After countless enhancements, several incarnations and much work that API has evolved to become log4j, a popular logging package for Java. The package is distributed under the Apache Public License, a fully-fledged open source license certified by the open source initiative. The latest log4j version, including full-source code, class files and documentation can be found at http://jakarta.apache.org/log4j/.

Igor Poteryaev, an independent author, has ported log4j to the Python language. Bastiaan Bakker has initiated a C++ port. Their projects are unsurprisingly called log4p and log4cpp.

Inserting log statements into code is a low-tech method for debugging it. It may also be the only way because debuggers are not always available or applicable. This is usually the case for multi-threaded applications and distributed applications at large.

Experience indicated that logging was an important component of the development cycle. It offered several advantages. It could provide precise context about a run of the application. Once inserted into the code, the generation of logging output required no human intervention. Moreover, log output could be saved in persistent medium to be studied at a later time. In addition to its use in the development cycle, a sufficiently rich logging package could also be viewed as an auditing tool.

As Brian W. Kernigan and Rob Pike put it in their truely excellent book "The Practice of Programming"

  As personal choice, we tend not to use debuggers beyond getting a
  stack trace or the value of a variable or two. One reason is that it
  is easy to get lost in details of complicated data structures and
  control flow; we find stepping through a program less productive
  than thinking harder and adding output statements and self-checking
  code at critical places. Clicking over statements takes longer than
  scanning the output of judiciously-placed displays. It takes less
  time to decide where to put print statements than to single-step to
  the critical section of code, even assuming we know where that
  is. More important, debugging statements stay with the program;
  debugging sessions are transient.

Logging does have its drawbacks. It can slow down an application. If too verbose, it can cause scrolling blindness. To alleviate these concerns, log4j is designed to be fast and flexible. Since logging is rarely the main focus of an application, log4j API strives to be simple to understand and to use.

Categories, Appenders and Layouts

Log4j has three main components: categories, appenders and layouts. These three types of componets work together to enable developers to log messages according to message type and priority, and to control at runtime how these messages are formatted and where they are reported.

Category hierarchy

The first and foremost advantage of any logging API over plain System.out.println resides in its ability to disable certain log statements while allowing others to print unhindered. This capability assumes that the logging space, that is, the space of all possible logging statements, is categorized according to some developer-chosen criteria.

This observation led us to choose category as the central concept of the package. The category notion is embodied in the org.apache.log4j.Category class. Categories are named entities. Category names are case-sensitive. They follow the hierarchical naming rule:

Named Hierarchy
A category is said to be a parent of another category if its name followed by a dot is a prefix of the child category name.

For example, the category named "com.foo" is a parent of the category named "com.foo.Bar". Similarly, "java" is a parent of "java.util" and an ancestor of "java.util.Vector". This naming scheme should be familiar to Java developers.

The root category resides at the top of the category hierarchy. It is exceptional in two ways:

it always exists,
it cannot be retrieved by name.

Invoking the class static Category.getRoot method retrieves it. All other categories are instantiated and retrieved with the class static Category.getInstance method. This method takes the name of the desired category as a parameter. Some of the basic methods in the Category class are listed below.

  package org.apache.log4j;
  
  public Category class {
  
    // Creation & retrieval methods:
    public static Category getRoot();
    public static Category getInstance(String name);
    
    // printing methods:
    public void debug(Object message);
    public void info(Object message);
    public void warn(Object message);
    public void error(Object message);

    // generic printing method:
    public void log(Priority p, Object message);
}

Categories may be assigned priorities. The set of possible priorities, that is DEBUG, INFO, WARN, ERROR and FATAL are defined in the org.apache.log4j.Priority class. The rationale behind this seemingly restricted set is to promote the use of the more flexible category hierarchy rather than a static (even if large) set of priorities. One may however define one's own priorities by sub-classing the Priority class.

If a given category is not assigned a priority, then it inherits one from its closest ancestor with an assigned priority. More formally:

Priority Inheritence
The inherited priority for a given category C, is equal to the first non-null priority in the category hierarchy, starting at C.

To ensure that all categories can eventually inherit a priority, the root category always has an assigned priority.

Below are four tables with various assigned priority values and the resulting inherited priorities according to the above rule.

Category
name Assigned
priority Inherited
priority

root Proot Proot

X none Proot

X.Y none Proot

X.Y.Z none Proot

Example 1

Category
name Assigned
priority Inherited
priority

root Proot Proot

X Px Px

X.Y Pxy Pxy

X.Y.Z Pxyz Pxyz

Example 2

Category
name Assigned
priority Inherited
priority

root Proot Proot

X Px Px

X.Y none Px

X.Y.Z Pxyz Pxyz

Example 3

Category
name Assigned
priority Inherited
priority

root Proot Proot

X Px Px

X.Y none Px

X.Y.Z none Px

Example 4

Logging requests are made by invoking one of the printing methods of a category instance. These printing methods are debug, info, warn, error, fatal, and log. By definition, the printing method determines the priority of a logging request. For example, if c is a category instance, then the statement c.info("..") is a logging request of priority INFO.

A logging request is said to be enabled if its priority is higher than or equal to the priority of its category. Otherwise, the request is said to be disabled. A category without an assigned priority will inherit one from the hierarchy. This rule is summarized below.

Basic Selection Rule
A log statement of priority p in a category with inherited priority q, is enabled if p >= q.

Here is an example of this rule.

   // get a category instance named "com.foo"
   Category  cat = Category.getInstance("com.foo");

   // Now set its priority.
   cat.setPriority(Priority.INFO);

   Category barcat = Category.getInstance("com.foo.Bar");
 
   // This request is enabled, because WARN >= INFO.
   cat.warn("Low fuel level.");
  
   // This request is disabled, because DEBUG < INFO.
   cat.debug("Starting search for nearest gas station."); 
 
   // The category instance barcat, named "com.foo.Bar",
   // will inherit its priority from the category named 
   // "com.foo" Thus, the following request is enabled 
   // because INFO >= INFO.  
   barcat.info("Located nearest gas station."); 

   // This request is disabled, because DEBUG < INFO.
   barcat.debug("Exiting gas station search");

Calling the getInstance method with the same name will always return a reference to the exact same category object. Thus, it is possible to configure a category and then to retrieve the same instance somewhere else in the code without passing around references. Categories can be created and configured in any order. In particular, a parent category will find and link to its children even if it is instantiated after them.

Configuration of the log4j environment is typically done at application initialization. The preferred way is by reading a configuration file. This approach will be discussed shortly.

Log4j makes it easy to name categories by software component. This can be accomplished by statically instantiating a category in each class, with the category name equal to the fully qualified name of the class. This is a useful and straightforward method of defining categories. As the log output bears the name of the generating category, this naming strategy makes it easy to identify the origin of a log message. However, this is only one possible, albeit common, strategy for naming categories. Log4j does not restrict the possible set of categories. The developer is free to name the categories as desired.

Nevertheless, naming categories after the class where they are defined seems to be the best strategy known so far.

Appenders and Layouts

The ability to selectively enable or disable logging requests based on their category is only part of the picture. Log4j allows logging requests to print to multiple destinations. In log4j speak, an output destination is called an appender. Currently, appenders exist for the console, files, GUI components, remote socket servers, NT Event Loggers, and remote UNIX Syslog daemons. It is also possible to log asyncronously.

A category may refer to multiple appenders. The addAppender method adds an appender to a given category. Each enabled logging request for a given category will be forwarded to all the appenders in that category as well as the appenders higher in the hierarchy. In other words, appenders are inherited additively from the category hierarchy. For example, if a console appender is added to the root category, then all enabled logging requests will at least print on the console. If in addition a file appender is added to a category, say C, then enabled logging requests for C and C's children will print on a file and on the console. It is possible to override this default behavior so that appender accumulation is no longer additive by setting the additivity flag to false.

The rules govening appender addivity are sumarized below.

Appender Additivity
The output of a log statement of category C will go to all the appenders in C and its parents. This is the meaning of the term "appender additivity".
However, if a parent category of C, say P, has the additivity flag set to false, then C's output will be directed to all the appenders in C and it's parents upto and including P but not the appenders in any of the parents of P.
Categories have their additivity flag set to true by default.

The table below shows an example:

Category
Name Added
Appenders Additivity
Flag Output Targets Comment
root A1 not applicable A1 The root category is anonymous but can be accessed with the Category.getRoot() method.
x A-x1, A-x2 true A1, A-x1, A-x2 Appenders in root are added to appenders in "x".
x.y none true A1, A-x1, A-x2 Appenders of "x" and root.
x.y.z A-xyz1 true A1, A-x1, A-x2, A-xyz1 Appenders in "x.y.z", "x" and root.
security A-sec false A-sec No appender accumulation as the additivity flag is set to false. flag.
security.access none true A-sec Only appenders of "security" as the additivity flag in "security" is set to false.

Category Name	Added Appenders	Additivity Flag	Output Targets	Comment
root	A1	not applicable	A1	The root category is anonymous but can be accessed with the Category.getRoot() method.
x	A-x1, A-x2	true	A1, A-x1, A-x2	Appenders in root are added to appenders in "x".
x.y	none	true	A1, A-x1, A-x2	Appenders of "x" and root.
x.y.z	A-xyz1	true	A1, A-x1, A-x2, A-xyz1	Appenders in "x.y.z", "x" and root.
security	A-sec	false	A-sec	No appender accumulation as the additivity flag is set to `false`. flag.
security.access	none	true	A-sec	Only appenders of "security" as the additivity flag in "security" is set to `false`.

More often than not, users wish to customize not only the output destination but also the output format. This is accomplished by associating a layout with an appender. The layout is responsible for formatting the logging request according to the user's wishes, whereas an appender takes care of sending the formatted output to its destination. The PatternLayout, part of the standard log4j distribution, lets the user specify the output format according to conversion patterns similar to the C language printf function.

For example, the PatternLayout with the conversion pattern "%r [%t] %-5p %c - %m%n" will output something akin to:

176 [main] INFO  org.foo.Bar - Located nearest gas station.

The first field is the number of milliseconds elapsed since the start of the program. The second field is the thread making the log request. The third field is the priority of the log statement. The fourth field is the name of the category associated with the log request. The text after the '-' is the message of the statement.

Just as importantly, log4j will render the content of the log message according to user specified criteria. For example, if you frequently need to log Oranges, an object type used in your current project, then you can register an OrangeRenderer that will be invoked whenever an orange needs to be logged.

Object rendering follows the class hierarchy. For example, assuming oranges are fruits, if you register an FruitRenderer, all fruits including oranges will be rendered by the FruitRenderer, unless of course you registered an orange specific OrangeRenderer.

Object renderers have to implement the ObjectRenderer interface.

Configuration

Inserting log requests into the application code requires a fair amount of planning and effort. Observation shows that approximately 4 percent of code is dedicated to logging. Consequently, even moderately sized applications will have thousands of logging statements embedded within their code. Given their number, it becomes imperative to manage these log statements without the need to modify them manually.

The log4j environment is fully configurable programatically. However, it is far more flexible to configure log4j using configuration files. Currently, configuration files can be written in XML or in Java properties (key=value) format.

Let us give a taste of how this is done with the help of an imaginary application MyApp that uses log4j.

 import com.foo.Bar;

 // Import log4j classes.
 import org.apache.log4j.Category;
 import org.apache.log4j.BasicConfigurator;
 
 public class MyApp {

   // Define a static category variable so that it references the
   // Category instance named "MyApp".
   static Category cat = Category.getInstance(MyApp.class.getName());

   public static void main(String[] args) {

     // Set up a simple configuration that logs on the console.
     BasicConfigurator.configure();

     cat.info("Entering application.");
     Bar bar = new Bar();
     bar.doIt();
     cat.info("Exiting application.");    
   }
 }

MyApp begins by importing log4j related classes. It then defines a static category variable with the name MyApp which happens to be the fully qualified name of the class.

MyApp uses the Bar class defined in the package com.foo.

 package com.foo;
 import org.apache.log4j.Category;
 
 public class Bar {
   static Category cat = Category.getInstance(Bar.class.getName());
  
   public void doIt() {
     cat.debug("Did it again!");    
   }
 }

The invocation of the BasicConfigurator.configure method creates a rather simple log4j setup. This method is hardwired to add to the root category a FileAppender printing on the console. The output will be formatted using a PatternLayout set to the pattern "%-4r [%t] %-5p %c %x - %m%n".

Note that by default, the root category is assigned to Priority.DEBUG.

The output of MyApp is:

0    [main] INFO  MyApp  - Entering application.
36   [main] DEBUG com.foo.Bar  - Did it again!
51   [main] INFO  MyApp  - Exiting application.

The figure below depicts the object diagram of MyApp after just having called the BasicConfigurator.configure method.

The MyApp class configures log4j by invoking BasicConfigurator.configure method. Other classes only need to import the org.apache.log4j.Category class, retrieve the categories they wish to use, and log away.

The previous example always outputs the same log information. Fortunately, it is easy to modify MyApp so that the log output can be controlled at run-time. Here is a slightly modified version.

 import com.foo.Bar;

 import org.apache.log4j.Category;
 import org.apache.log4j.PropertyConfigurator;
 
 public class MyApp {

   static Category cat = Category.getInstance(MyApp.class.getName());

   public static void main(String[] args) {

     
     // BasicConfigurator replaced with PropertyConfigurator.
     PropertyConfigurator.configure(args[0]);

     cat.info("Entering application.");
     Bar bar = new Bar();
     bar.doIt();
     cat.info("Exiting application.");    
   }
 }

This version of MyApp instructs PropertyConfigurator to parse a configuration file and set up logging accordingly.

Here is a sample configuration file that results in exactly same output as the previous BasicConfigurator based example.

# Set root category priority to DEBUG and its only appender to A1.
log4j.rootCategory=DEBUG, A1
  
# A1 is set to be a FileAppender which outputs to System.out. 
log4j.appender.A1=org.apache.log4j.FileAppender
log4j.appender.A1.File=System.out
  
# A1 uses PatternLayout.
log4j.appender.A1.layout=org.apache.log4j.PatternLayout
log4j.appender.A1.layout.ConversionPattern=%-4r [%t] %-5p %c %x - %m%n

Please note that if you copy and paste the examples, then result is likely to include trailing spaces on some lines. These trailing spaces are not trimmed out but interpreted by the PropertyConfigurator. By the time you read this article the problem should be corrected.

Suppose we are no longer interested in seeing the output of any component belonging to the com.foo package. The following configuration file shows one possible way of achieving this.

log4j.rootCategory=DEBUG, A1
log4j.appender.A1=org.apache.log4j.FileAppender
log4j.appender.A1.File=System.out  
log4j.appender.A1.layout=org.apache.log4j.PatternLayout
  
# Print the date in ISO 8601 format
log4j.appender.A1.layout.ConversionPattern=%d [%t] %-5p %c - %m%n
  
# Print only messages of priority WARN or above in the package com.foo.
log4j.category.com.foo=WARN

The output of MyApp configured with this file is shown below.

2000-09-07 14:07:41,508 [main] INFO  MyApp - Entering application.
2000-09-07 14:07:41,529 [main] INFO  MyApp - Exiting application.

As the category com.foo.Bar does not have an assigned priority, it inherits its priority from com.foo, which was set to WARN in the configuration file. The log statement from the Bar.doIt method has the priority DEBUG, lower than the category priority WARN. Consequently, doIt's log request is suppressed.

Here is another configuration file that uses multiple appenders.

log4j.rootCategory=debug, stdout, R

log4j.appender.stdout=org.apache.log4j.FileAppender
log4j.appender.stdout.File=System.out
log4j.appender.stdout.layout=org.apache.log4j.PatternLayout

# Pattern to output the caller's file name and line number.
log4j.appender.stdout.layout.ConversionPattern=%5p [%t] (%F:%L) - %m%n

log4j.appender.R=org.apache.log4j.RollingFileAppender
log4j.appender.R.File=example.log

log4j.appender.R.MaxFileSize=100KB
# Keep one backup file
log4j.appender.R.MaxBackupIndex=1

log4j.appender.R.layout=org.apache.log4j.PatternLayout
log4j.appender.R.layout.ConversionPattern=%p %t %c - %m%n

Calling the enhanced MyApp with the this configuration file will output the following on the console.

 INFO [main] (MyApp2.java:12) - Entering application.
DEBUG [main] (Bar.java:8) - Doing it again!
 INFO [main] (MyApp2.java:15) - Exiting application.

In addition, as the root category has been allocated a second appender, output will also be directed to the example.log file. This file will be rolled over when it reaches 100KB. When roll-over occurs, the old version of example.log is automatically moved to example.log.1.

Note that to obtain these different logging behaviors we did not need to recompile code. We could just as easily have logged to a UNIX Syslog daemon, redirected all com.foo output to an NT Event logger, or forwarded logging events to a remote log4j server, which would log according to local server policy, for example by forwarding the log event to a second log4j server.

Nested Diagnostic Contexts

Most real-world systems have to deal with multiple clients simultaneously. In a typical multi-threaded implementation of such a system, different threads will handle different clients. Logging is especially well suited to trace and debug complex distributed applications. A common approach to differentiate the logging output of one client from another is to instantiate a new separate category for each client. This promotes the proliferation of categories and increases the management overhead of logging.

A lighter technique is to uniquely stamp each log request initiated from the same client interaction. Neil Harrison described this method in the book "Patterns for Logging Diagnostic Messages," in Pattern Languages of Program Design 3, edited by R. Martin, D. Riehle, and F. Buschmann (Addison-Wesley, 1997).

To uniquely stamp each request, the user pushes contextual information into the NDC, the abbreviation of Nested Diagnostic Context. The NDC class is shown below.

  public class NDC { 
    // Used when printing the diagnostic 
    public static String get(); 
 
    // Remove the top of the context from the NDC.
    public static String pop(); 
 
    // Add diagnostic context for the current thread.
    public static void push(String message); 
 
    // Remove the diagnostic context for this thread.                      
    public static void remove(); 
  }

The NDC is managed per thread as a stack of contextual information. Note that all methods of the org.apache.log4j.NDC class are static. Assuming that NDC printing is turned on, every time a log request is made, the appropriate log4j component will include the entire NDC stack for the current thread in the log output. This is done without the intervention of the user, who is responsible only for placing the correct information in the NDC by using the push and pop methods at a few well-defined points in the code. In contrast, the per-client category approach commands extensive changes in the code.

To illustrate this point, let us take the example of a servlet delivering content to numerous clients. The servlet can build the NDC at the very beginning of the request before executing other code. The contextual information can be the client's host name and other information inherent to the request, typically information contained in cookies. Hence, even if the servlet is serving multiple clients simultaneously, the logs initiated by the same code, i.e. belonging to the same category, can still be distinguished because each client request will have a different NDC stack. Contrast this with the complexity of passing a freshly instantiated category to all code exercised during the client's request.

Nevertheless, some sophisticated applications, such as virtual hosting web servers, must log differently depending on the virtual host context and also depending on the software component issuing the request. Recent log4j releases support multiple hierarchy trees. This enhancement allows each virtual host to possess its own copy of the category hierarchy.

Performance

One of the often-cited arguments against logging is its computational cost. This is a legitimate concern as even moderately sized applications can generate thousands of log requests. Much effort was spent measuring and tweaking logging performance. Log4j claims to be fast and flexible: speed first, flexibility second.

The user should be aware of the following performance issues.

Logging performance when logging is turned off.
When logging is turned off entirely or just for a set of priorities, the cost of a log request consists of a method invocation plus an integer comparison. On a 233 MHz Pentium II machine this cost is typically in the 5 to 50 nano-second range.
However, The method invocation involves the "hidden" cost of parameter construction.
For example, for some category cat, writing,
```
     cat.debug("Entry number: " + i + " is " + String.valueOf(entry[i]));
    
```
incurs the cost of constructing the message parameter, i.e. converting both integer i and entry[i] to a String, and concatenating intermediate strings, regardless of whether the message will be logged or not. This cost of parameter construction can be quite high and it depends on the size of the parameters involved.
To avoid the parameter construction cost write:
```
   
      if(cat.isDebugEnabled() {
        cat.debug("Entry number: " + i + " is " + String.valueOf(entry[i]));
      }
   
```
This will not incur the cost of parameter construction if debugging is disabled. On the other hand, if the category is debug-enabled, it will incur twice the cost of evaluating whether the category is enabled or not: once in debugEnabled and once in debug. This is an insignificant overhead because evaluating a category takes about 1% of the time it takes to actually log.
In log4j, logging requests are made to instances of the Category class. Category is a class and not an interface. This measurably reduces the cost of method invocation at the cost of some flexibility.
Certain users resort to pre-processing or compile-time techinques to compile out all log statements. This leads to perfect performance efficiency with respect to logging. However, since the resulting application binary does not contain any log statements, logging cannot be turned on for that binary. In my opinion this is a disproportionate price to pay in exchange for a small performance gain.
The performance of deciding whether to log or not to log when logging is turned on.
This is essentially the performance of walking the category hierarchy. When logging is turned on, log4j still needs to compare the priority of the log request with the priority of the request category. However, categories may not have an assigned priority; they can inherit them from the category hierarchy. Thus, before inheriting a priority, the category may need to search its ancestors.
There has been a serious effort to make this hierarchy walk to be as fast as possible. For example, child categories link only to their existing ancestors. In the BasicConfigurator example shown earlier, the category named com.foo.Bar is linked directly to the root category, thereby circumventing the nonexistent com or com.foo categories. This significantly improves the speed of the walk, especially in "sparse" hierarchies.
The typical cost of walking the hierarchy is in the range of 5 to 15 microseconds, again on a 233MHz Pentium II machine.
Actual logging.
This is the cost of formatting the log output and sending it to its target destination. Here again, a serious effort was made to make layouts (formatters) perform as quickly as possible. The same is true for appenders. The typical cost of actually logging is about 100 to 300 microseconds. See org.apache.log4.performance.Logging for actual figures.

Although log4j has many features, its first design goal was speed. Some log4j components have been rewritten many times to improve performance. Nevertheless, contributors frequently come up with new optimizations. You should be pleased to know that when configured with the SimpleLayout performance tests have shown log4j to log as quickly as System.out.println.

Upcoming logging API from Sun

Sun has begun a community process, JSR47, in order to define a logging API for java. This API will require JDK version 1.4. The JSR47 specification has recently became ready for public review.

The JSR47 API and log4j are quite similar at the architectural level. JSR47 API will support a hierarchical name space, one of the central features of log4j. On the other hand, log4j has many useful features that are missing in JSR47. Given the momentum behind log4j, in my partisan opinion, JSR47 is likely to be obsolete by the time it is launced. Log4j is written by the people who use it daily, not by a committee.

Conclusions

Log4j is a popular logging package written in Java. One of its distinctive features is the notion of inheritance in categories. Using a category hierarchy it is possible to control which log statements are output at arbitrary granularity. This helps reduce the volume of logged output and minimize the cost of logging.

One of the advantages of the log4j API is its manageability. Once the log statements have been inserted into the code, they can be controlled with configuration files. They can be selectively enabled or disabled, and sent to different and multiple output targets in user-chosen formats. The log4j package is designed so that log statements can remain in shipped code without incurring a heavy performance cost.

Acknowledgements

Many thanks to N. Asokan for reviewing the article. He is also one of the originators of the category concept. I am indebted to Nelson Minar of PopularPower for encouraging me to write this article. He has also made many useful suggestions and corrections to this article.

Log4j is the result of a collective effort. My special thanks go to all the authors who have contributed to the project. Without exception, the best features in the package have all originated in the user community.