Chapter 5. Common mistakes, problems and anti-patterns

Table of Contents

Violating the javax.activation.DataSource contract
Issues that magically disappear
The OM-inside-OMDataSource anti-pattern
Weak version
Strong version

This chapter presents some of the common mistakes and problems people face when writing code using Axiom, as well as anti-patterns that should be avoided.

Violating the javax.activation.DataSource contract

When working with binary (base64) content, it is sometimes necessary to write a custom DataSource implementation to wrap binary data that is available in a different form (and for which Axiom or the Java Activation Framework has no out-of-the-box data source implementation). Data sources are also sometimes (but less frequently) used in conjunction with OMSourcedElement and OMDataSource.

The documentation of the DataSource interface is very clear on the expected behavior of the getInputStream method:

/**
 * This method returns an InputStream representing
 * the data and throws the appropriate exception if it can
 * not do so. Note that a new InputStream object must be
 * returned each time this method is called, and the stream must be
 * positioned at the beginning of the data.
 *
 * @return an InputStream
 */
public InputStream getInputStream() throws IOException;

A common mistake is to implement the data source in a way that makes getInputStream destructive. Consider the implementation shown in Example 5.1, “DataSource implementation that violates the interface contract”[2]. This data source can only be read once: every subsequent call to getInputStream returns the same stream, which by then has already been consumed or closed.

Example 5.1. DataSource implementation that violates the interface contract

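import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

import javax.activation.DataSource;

public class InputStreamDataSource implements DataSource {
    private final InputStream is;

    public InputStreamDataSource(InputStream is) {
        this.is = is;
    }

    public String getContentType() {
        return "application/octet-stream";
    }

    public InputStream getInputStream() throws IOException {
        // Contract violation: every call returns the same stream instance
        // instead of a new stream positioned at the beginning of the data.
        return is;
    }

    public String getName() {
        return null;
    }

    public OutputStream getOutputStream() throws IOException {
        throw new UnsupportedOperationException();
    }
}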

What makes this mistake so insidious is that it will very likely not cause problems immediately. The reason is that Axiom is optimized to read the data only when necessary, which in most cases means only once! However, in some cases it is unavoidable to read the data several times. When that happens, the broken DataSource implementation will cause problems that may be extremely hard to debug.

Imagine for example[3] that the implementation shown in Example 5.1 is used to produce an MTOM message. At first this will work without any problems because the data source is read only once when serializing the message. If the MTOM threshold feature is enabled later on, the broken implementation will (in the worst case) cause the corresponding MIME parts to be empty or (in the best case) trigger an I/O error because Axiom attempts to read from an already closed stream. The reason for this is that when an MTOM threshold is set, Axiom reads the data source twice: once to determine if its size exceeds the threshold[4] and once during serialization of the message.
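The double read can be reproduced without Axiom. The following is a minimal sketch of the access pattern only, not Axiom's actual code: StreamSource is a local stand-in for the getInputStream method of javax.activation.DataSource, and OneShotSource, RepeatableSource, exceedsThreshold and readFully are hypothetical names. The sketch probes a source up to a threshold and then reads it again in full, first with a source that behaves like Example 5.1 and then with one that opens a new stream on every call.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

// Stand-in for DataSource#getInputStream() (assumption: declared locally so
// that the sketch compiles without the activation API on the classpath).
interface StreamSource {
    InputStream getInputStream() throws IOException;
}

// Broken: hands out the same stream on every call, as in Example 5.1.
class OneShotSource implements StreamSource {
    private final InputStream is;
    OneShotSource(InputStream is) { this.is = is; }
    public InputStream getInputStream() { return is; }
}

// Correct: keeps the data and opens a new stream on every call.
class RepeatableSource implements StreamSource {
    private final byte[] data;
    RepeatableSource(byte[] data) { this.data = data.clone(); }
    public InputStream getInputStream() { return new ByteArrayInputStream(data); }
}

class ThresholdReadSketch {
    // First pass: read at most threshold + 1 bytes to decide whether the
    // content is larger than the threshold.
    static boolean exceedsThreshold(StreamSource source, int threshold) throws IOException {
        try (InputStream in = source.getInputStream()) {
            int count = 0;
            while (count <= threshold && in.read() != -1) {
                count++;
            }
            return count > threshold;
        }
    }

    // Second pass: read everything, as serialization would.
    static byte[] readFully(StreamSource source) throws IOException {
        try (InputStream in = source.getInputStream()) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[4096];
            int n;
            while ((n = in.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
            return out.toByteArray();
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] data = new byte[10];

        StreamSource broken = new OneShotSource(new ByteArrayInputStream(data));
        exceedsThreshold(broken, 4);
        System.out.println("broken:  " + readFully(broken).length + " bytes");

        StreamSource correct = new RepeatableSource(data);
        exceedsThreshold(correct, 4);
        System.out.println("correct: " + readFully(correct).length + " bytes");
    }
}
```

With 10 bytes of data and a threshold of 4, the probe consumes 5 bytes, so the broken source delivers only the remaining 5 bytes in the second pass while the repeatable source delivers all 10. Because closing a ByteArrayInputStream has no effect, the failure here is silent data loss; with a file or socket stream the second read would typically fail with an IOException instead. For data that is too large to keep in memory, the same idea applies to any source that can be reopened, such as a file.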

Issues that magically disappear

Users quite frequently post messages on the Axiom-related mailing lists about issues that seem to disappear by magic when they try to debug them. The reason why this can happen is simple. As explained earlier, Axiom uses deferred building, but at the same time does its best to hide that from users, so that they don't need to worry about whether the object model has already been built or not. On the other hand, when serializing the object model to XML or when requesting a pull parser (XMLStreamReader) from a node, the code paths taken may be radically different depending on whether or not the corresponding part of the tree has already been built. This is especially true when caching is disabled.

While the end result should be the same in all cases, it is also clear that in some circumstances an issue that occurs with an incompletely built tree may disappear if there is something that causes Axiom to build the rest of the object model. What is important to understand is that the something may be as trivial as a call to the toString method of an OMNode! The fact that adding System.out.println statements or logging instructions is a common debugging technique then explains why issues sometimes seem to magically disappear during debugging.

Finally, it should be noted that inspecting an OMNode in a debugger also causes a call to the toString method on that object. This means that by just clicking on something in the Variables window of your debugger, you may completely change the state of the process that is being debugged!
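The effect is easy to reproduce with a toy model. The LazyNode class below is purely hypothetical and far simpler than Axiom's deferred building, but it shares the relevant property: toString forces the remaining input to be consumed, so a harmless-looking log statement changes what later code observes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Toy node that pulls its children from an underlying event source on demand.
class LazyNode {
    private final Iterator<String> source;
    private final List<String> children = new ArrayList<>();

    LazyNode(Iterator<String> source) {
        this.source = source;
    }

    // True once the underlying source has been fully consumed.
    boolean isComplete() {
        return !source.hasNext();
    }

    // Builds only as much of the node as is needed to return the requested child.
    String getChild(int index) {
        while (children.size() <= index && source.hasNext()) {
            children.add(source.next());
        }
        return children.get(index);
    }

    // Needs every child and therefore builds the node completely.
    @Override
    public String toString() {
        while (source.hasNext()) {
            children.add(source.next());
        }
        return children.toString();
    }
}

class LazyNodeDemo {
    public static void main(String[] args) {
        LazyNode node = new LazyNode(Arrays.asList("a", "b", "c").iterator());
        node.getChild(0);
        System.out.println(node.isComplete()); // false: only "a" has been built

        System.out.println("debug: " + node);  // innocent-looking log statement
        System.out.println(node.isComplete()); // true: logging built the whole node
    }
}
```

Any code path that behaves differently for complete and incomplete nodes will take the other branch as soon as the log statement is added, which is exactly how a bug can vanish while it is being investigated.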

The OM-inside-OMDataSource anti-pattern

Weak version

OMDataSource objects are used in conjunction with OMSourcedElement to build Axiom object model instances that contain information items that are represented using a framework or API other than Axiom. Wrapping this foreign data in an OMDataSource and adding it to the Axiom object model using an OMSourcedElement in most cases avoids the conversion of the data to the native Axiom object model[5]. The OMDataSource contract requires the implementation to support two different ways of providing the data, both relying on StAX:

  • The implementation must be able to provide a pull parser (XMLStreamReader) from which the infoset can be read.

  • The data source must be able to serialize the infoset to an XMLStreamWriter (push).

For the consumer of an event-based representation of an XML infoset, it is in general easier to work in pull mode. That is the reason why StAX has gained popularity over push-based approaches such as SAX. For a producer such as an OMDataSource implementation, on the other hand, it is exactly the other way round: it is far easier to serialize an infoset to an XMLStreamWriter (push) than to build an XMLStreamReader from which a consumer can read (pull) events.
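The asymmetry is visible even for a trivial payload. The sketch below uses only the StAX API from the JDK, not Axiom; the Person class, the PushPullSketch class and the element names are made up for illustration. Producing the infoset is a straight sequence of write calls that follows the structure of the data, and consuming it is a simple loop over events. Offering the same data as an XMLStreamReader would instead require the producer to implement a state machine that answers next(), getLocalName(), getText() and so on, one event at a time.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;
import javax.xml.stream.XMLStreamWriter;

// Hypothetical "foreign" data, standing in for a bean from a data binding framework.
class Person {
    final String name;
    final int age;

    Person(String name, int age) {
        this.name = name;
        this.age = age;
    }

    // Producer side (push): the control flow simply follows the data.
    void serialize(XMLStreamWriter writer) throws XMLStreamException {
        writer.writeStartElement("person");
        writer.writeStartElement("name");
        writer.writeCharacters(name);
        writer.writeEndElement();
        writer.writeStartElement("age");
        writer.writeCharacters(Integer.toString(age));
        writer.writeEndElement();
        writer.writeEndElement();
    }
}

class PushPullSketch {
    // Serializes the bean to a string through an XMLStreamWriter.
    static String toXml(Person person) throws XMLStreamException {
        StringWriter out = new StringWriter();
        XMLStreamWriter writer = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
        person.serialize(writer);
        writer.flush();
        writer.close();
        return out.toString();
    }

    // Consumer side (pull): the consumer decides when to ask for the next event.
    static int countStartElements(String xml) throws XMLStreamException {
        XMLStreamReader reader =
                XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
        int count = 0;
        while (reader.hasNext()) {
            if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                count++;
            }
        }
        reader.close();
        return count;
    }
}
```

For a Person, countStartElements(toXml(person)) finds three start elements: person, name and age.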

Experience indeed shows that the most challenging part in creating an OMDataSource implementation is to write the getReader method. To avoid that difficulty some implementations simply build an Axiom tree and return the XMLStreamReader provided by OMElement#getXMLStreamReader(). For example, some ADB (Axis2 Data Binding) versions use the following code[6]:

Example 5.2. OMDataSource#getReader() implementation used by ADB

public XMLStreamReader getReader() throws XMLStreamException {
    MTOMAwareOMBuilder mtomAwareOMBuilder = new MTOMAwareOMBuilder();
    serialize(mtomAwareOMBuilder);
    return mtomAwareOMBuilder.getOMElement().getXMLStreamReader();
}

The MTOMAwareOMBuilder class referenced by this code is a special implementation of XMLStreamWriter that builds an Axiom tree from the sequence of events sent to it. The code then uses this Axiom tree to get the XMLStreamReader implementation. While this is a functionally correct implementation of the getReader method, it is not a good solution from a performance perspective and also contradicts some of the ideas on which Axiom is based, namely that the object model should only be built when necessary.

Indeed, it should not be necessary to build an intermediary tree when requesting a pull parser from the OMDataSource because all the required information is already present in the ADB beans. Worse, if the OMSourcedElement is expanded, the object model instance will be built twice: once by the getReader method and once by Axiom itself!

While constructing an Axiom tree inside the getReader method is clearly an anti-pattern, at least in the case of ADB it is not as bad as it seems at first glance. The reason is that in the case which is the most relevant for performance (which is sending a Web Service response prepared using ADB), Axiom will only invoke the serialize method and not make use of getReader.

[Note]

At the time of writing there is no general solution available to avoid the weak version of the OM-inside-OMDataSource anti-pattern in cases where it would be far too difficult to build a proper XMLStreamReader implementation. Future versions of Axiom may implement a solution that avoids the complexity of implementing XMLStreamReader without too great a performance penalty.

Strong version

There is also a stronger version of the anti-pattern which consists in implementing the serialize method by building an Axiom tree and then serializing the tree to the XMLStreamWriter. Except for very special cases, there is no valid reason whatsoever to do this! To see why this is so, consider the two possible cases:

  1. The OMDataSource already implements the getReader method in a proper way, i.e. without building an intermediary Axiom tree. To properly implement serialize, it is then sufficient to pull the events from the reader returned by a call to getReader and copy them to the XMLStreamWriter. The easiest and most efficient way to do this is using StreamingOMSerializer:

    Example 5.3. Proper implementation of the OMDataSource#serialize method

    public void serialize(XMLStreamWriter xmlWriter)
            throws XMLStreamException {
        StreamingOMSerializer serializer = new StreamingOMSerializer();
        serializer.serialize(getReader(), xmlWriter);
    }

    There is thus no need to build an intermediary object model in this case.

  2. The getReader method also uses an intermediary Axiom tree[7]. In that case it doesn't make sense to use an OMSourcedElement in the first place! At least it doesn't make sense if one assumes that in general the OMSourcedElement will either be serialized or its content accessed after being added to the tree. Indeed, in this case the Axiom tree will be built at least once (if not multiple times), so that the code might as well use a normal OMElement.

    This only leaves the very special case where the OMSourcedElement is in general neither accessed nor serialized, either because it will usually be somehow discarded or because the code uses OMDataSourceExt#getObject() to retrieve the raw data. Even in that case one can argue that in general it should not be too hard to implement at least the serialize method properly by transforming the raw or foreign data directly to StAX events written to the XMLStreamWriter.

    [Note]

    Implementing the serialize method to serialize directly to an XMLStreamWriter instead of using an intermediary Axiom tree of course still leaves the question about the getReader method open. Since we are assuming that implementing getReader properly would be too complex (otherwise one could use the code shown in Example 5.3, “Proper implementation of the OMDataSource#serialize method” to avoid the OM-inside-OMDataSource anti-pattern entirely), one is forced to use the code shown in Example 5.2, “OMDataSource#getReader() implementation used by ADB” (and thus the weaker version of the anti-pattern). However this code depends on the MTOMAwareOMBuilder class which is part of axis2-adb. In some cases, depending on that library may not be an option. Therefore this class should probably be moved to Axiom.

QED



[2] The example shown is actually a simplified version of code that is part of Axis2 1.5.

[4] To do this, Axiom doesn't read the entire data source, but only reads up to the threshold.

[5] An exception is when code tries to access the children of the OMSourcedElement. In this case, the OMSourcedElement will be expanded, i.e. the data will be converted to the native Axiom object model.