This chapter presents some of the common mistakes and problems people face when writing code using Axiom, as well as anti-patterns that should be avoided.
When working with binary (base64) content, it is sometimes necessary to write a custom DataSource implementation to wrap binary data that is available in a different form (and for which Axiom or the Java Activation Framework has no out-of-the-box data source implementation). Data sources are also sometimes (but less frequently) used in conjunction with OMSourcedElement and OMDataSource.
The documentation of the DataSource interface is very clear about the expected behavior of the getInputStream method:
/**
 * This method returns an InputStream representing
 * the data and throws the appropriate exception if it can
 * not do so. Note that a new InputStream object must be
 * returned each time this method is called, and the stream must be
 * positioned at the beginning of the data.
 *
 * @return an InputStream
 */
public InputStream getInputStream() throws IOException;
A common mistake is to implement the data source in a way that makes getInputStream “destructive”. Consider the implementation shown in Example 5.1, “DataSource implementation that violates the interface contract”[2].
It is clear that this data source can only be read once. Every call to getInputStream returns the same stream, so any subsequent call will return a stream that has already been read to the end and, typically, closed.
Example 5.1. DataSource implementation that violates the interface contract
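import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import javax.activation.DataSource;

public class InputStreamDataSource implements DataSource {
    private final InputStream is;

    public InputStreamDataSource(InputStream is) {
        this.is = is;
    }

    public String getContentType() {
        return "application/octet-stream";
    }

    // Violates the contract: every call returns the same stream instead of
    // a new one positioned at the beginning of the data.
    public InputStream getInputStream() throws IOException {
        return is;
    }

    public String getName() {
        return null;
    }

    public OutputStream getOutputStream() throws IOException {
        throw new UnsupportedOperationException();
    }
}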
What makes this mistake so insidious is that it will very likely not cause problems immediately. The reason is that Axiom is optimized to read the data only when necessary, which in most cases means only once! However, in some cases it is unavoidable to read the data several times. When that happens, the broken DataSource implementation will cause problems that may be extremely hard to debug.
Imagine for example[3] that the implementation shown above is used to produce an MTOM message. At first this will work without any problems because the data source is read only once when serializing the message. If later on the MTOM threshold feature is enabled, the broken implementation will (in the worst case) cause the corresponding MIME parts to be empty or (in the best case) trigger an I/O error because Axiom attempts to read from an already closed stream. The reason for this is that when an MTOM threshold is set, Axiom reads the data source twice: once to determine if its size exceeds the threshold[4] and once during serialization of the message.
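A contract-abiding implementation must hand out a fresh stream on every call. The following minimal sketch is not taken from Axiom or Axis2. It contrasts the two behaviors using only the JDK. The class names are hypothetical. To keep the sketch compilable without the Java Activation Framework on the classpath, the repaired class omits the "implements javax.activation.DataSource" clause, and a code comment notes this. Reading each source twice in a row mimics what happens when an MTOM threshold is set.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

public class DataSourceContractDemo {

    // Mirrors the broken Example 5.1: always hands out the same stream.
    static class SingleShotSource {
        private final InputStream is;
        SingleShotSource(InputStream is) { this.is = is; }
        public InputStream getInputStream() { return is; }
    }

    // Contract-abiding variant: buffers the bytes once, then returns a new stream
    // positioned at the beginning of the data on every call.
    // In real code this class would declare "implements javax.activation.DataSource";
    // the clause is omitted only so that the sketch compiles without JAF.
    static class RepeatableSource {
        private final byte[] data;
        RepeatableSource(InputStream in) throws IOException { this.data = readFully(in); }
        public String getContentType() { return "application/octet-stream"; }
        public InputStream getInputStream() { return new ByteArrayInputStream(data); }
        public String getName() { return null; }
        public OutputStream getOutputStream() throws IOException {
            throw new UnsupportedOperationException();
        }
    }

    // Reads a stream to the end, then closes it (as a typical consumer would).
    static byte[] readFully(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
        }
        in.close();
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] payload = "binary payload".getBytes(StandardCharsets.UTF_8);
        SingleShotSource broken = new SingleShotSource(new ByteArrayInputStream(payload));
        RepeatableSource fixed = new RepeatableSource(new ByteArrayInputStream(payload));

        // Two consecutive reads of each source.
        System.out.println(readFully(broken.getInputStream()).length); // 14
        System.out.println(readFully(broken.getInputStream()).length); // 0: nothing left
        System.out.println(readFully(fixed.getInputStream()).length);  // 14
        System.out.println(readFully(fixed.getInputStream()).length);  // 14
    }
}
```

Buffering the whole content in memory is the simplest way to make reads repeatable. It is only reasonable for moderately sized data, though. For large content it is preferable to reopen the underlying resource (a file, for instance) on each call. Note also that ByteArrayInputStream#close has no effect. This is why the broken variant silently returns empty data here rather than failing, which corresponds to the "worst case" described above.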
Quite frequently, users post messages on the Axiom-related mailing lists about issues that seem to disappear by “magic” when they try to debug them. The reason why this can happen is simple. As explained earlier, Axiom uses deferred building, but at the same time does its best to hide that from the user, so that users don't need to worry about whether the object model has already been built or not. On the other hand, when serializing the object model to XML or when requesting a pull parser (XMLStreamReader) from a node, the code paths taken may be radically different depending on whether or not the corresponding part of the tree has already been built. This is especially true when caching is disabled.
While the end result should be the same in all cases, it is also clear that in some circumstances an issue that occurs with an incompletely built tree may disappear if something causes Axiom to build the rest of the object model. What is important to understand is that the “something” may be as trivial as a call to the toString method of an OMNode! Since adding System.out.println statements or logging instructions is a common debugging technique, this explains why issues sometimes seem to “magically” disappear during debugging.
Finally, it should be noted that inspecting an OMNode in a debugger also causes a call to the toString method on that object. This means that by just clicking on something in the “Variables” window of your debugger, you may completely change the state of the process that is being debugged!
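The effect is easy to reproduce with a toy model. The sketch below is plain Java and does not use Axiom. LazyNode is a hypothetical class that pulls its children from an iterator (standing in for the underlying parser) only on demand. Its toString method, like OMNode#toString() as described above, forces the rest of the node to be built. A single logging statement is enough to flip isComplete() from false to true. Any later code that branches on that state then takes a different path.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class DeferredBuildingDemo {

    // Toy model of deferred building; this is NOT how Axiom is implemented.
    // Children are pulled from 'source' (standing in for the parser) only on demand.
    static final class LazyNode {
        private final Iterator<String> source;
        private final List<String> children = new ArrayList<>();

        LazyNode(Iterator<String> source) {
            this.source = source;
        }

        // True once everything has been pulled from the underlying source.
        boolean isComplete() {
            return !source.hasNext();
        }

        private void buildRemaining() {
            while (source.hasNext()) {
                children.add(source.next());
            }
        }

        // As with OMNode#toString(), rendering the node forces the rest to be built.
        @Override
        public String toString() {
            buildRemaining();
            return children.toString();
        }
    }

    public static void main(String[] args) {
        LazyNode node = new LazyNode(Arrays.asList("a", "b", "c").iterator());
        System.out.println(node.isComplete()); // false
        System.out.println("debug: " + node);  // debug: [a, b, c]
        System.out.println(node.isComplete()); // true
    }
}
```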
OMDataSource objects are used in conjunction with OMSourcedElement to build Axiom object model instances that contain information items that are represented using a framework or API other than Axiom. Wrapping this “foreign” data in an OMDataSource and adding it to the Axiom object model using an OMSourcedElement in most cases avoids the conversion of the data to the “native” Axiom object model[5].
The OMDataSource contract requires the implementation to support two different ways of providing the data, both relying on StAX:
The implementation must be able to provide a pull parser (XMLStreamReader) from which the infoset can be read.
The data source must be able to serialize the infoset to an XMLStreamWriter (push).
For the consumer of an event-based representation of an XML infoset, it is in general easier to work in pull mode. That is the reason why StAX has gained popularity over push-based approaches such as SAX. On the other hand, for a producer such as an OMDataSource implementation, it's exactly the other way round: it is far easier to serialize an infoset to an XMLStreamWriter (push) than to build an XMLStreamReader from which a consumer can read (pull) events.
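The following minimal sketch illustrates why the push side is the easy one. It uses only the JDK's StAX API. The Order class is a hypothetical stand-in for the foreign data (an ADB bean, for instance) that a data source might wrap. It is not part of Axiom. Producing the events is a straight sequence of calls, and the code structure tracks the nesting of the XML implicitly. A pull-mode XMLStreamReader over the same data would instead have to remember its position between next() calls. In other words, it would have to be written as an explicit state machine.

```java
import java.io.StringWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

public class PushSerializationDemo {

    // Hypothetical "foreign" data that a data source might wrap (not an Axiom or ADB type).
    static final class Order {
        final String id;
        final int quantity;

        Order(String id, int quantity) {
            this.id = id;
            this.quantity = quantity;
        }
    }

    // Push mode: events are emitted in document order; the nesting of the XML
    // is tracked implicitly by the structure of this method.
    static void serialize(Order order, XMLStreamWriter w) throws XMLStreamException {
        w.writeStartElement("order");
        w.writeAttribute("id", order.id);
        w.writeStartElement("quantity");
        w.writeCharacters(Integer.toString(order.quantity));
        w.writeEndElement(); // </quantity>
        w.writeEndElement(); // </order>
    }

    static String toXml(Order order) throws XMLStreamException {
        StringWriter out = new StringWriter();
        XMLStreamWriter w = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
        serialize(order, w);
        w.flush();
        w.close();
        return out.toString();
    }

    public static void main(String[] args) throws XMLStreamException {
        System.out.println(toXml(new Order("42", 3)));
    }
}
```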
Experience indeed shows that the most challenging part in creating an OMDataSource implementation is to write the getReader method. To avoid that difficulty, some implementations simply build an Axiom tree and return the XMLStreamReader provided by OMElement#getXMLStreamReader(). For example, some ADB (Axis2 Data Binding) versions use the following code[6]:
Example 5.2. OMDataSource#getReader() implementation used by ADB
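public XMLStreamReader getReader() throws XMLStreamException {
    // Serialize the ADB bean into a writer that builds an Axiom tree...
    MTOMAwareOMBuilder mtomAwareOMBuilder = new MTOMAwareOMBuilder();
    serialize(mtomAwareOMBuilder);
    // ...then return a pull parser over that intermediary tree.
    return mtomAwareOMBuilder.getOMElement().getXMLStreamReader();
}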
The MTOMAwareOMBuilder class referenced by this code is a special implementation of XMLStreamWriter that builds an Axiom tree from the sequence of events sent to it. The code then uses this Axiom tree to get the XMLStreamReader implementation. This is a functionally correct implementation of the getReader method. However, it is not a good solution from a performance perspective. It also contradicts one of the ideas on which Axiom is based, namely that the object model should only be built when necessary. Indeed, it should not be necessary to build an intermediary tree when requesting a pull parser from the OMDataSource, because all the required information is already present in the ADB beans. Worse, if the OMSourcedElement is expanded, the object model instance will be built twice: once by the getReader method and once by Axiom itself!
While constructing an Axiom tree inside the getReader method is clearly an anti-pattern, at least in the case of ADB it is not as bad as it seems at first glance. The reason is that in the case that is most relevant for performance (sending a Web Service response prepared using ADB), Axiom will only invoke the serialize method and will not use getReader.
At the time of writing there is no general solution available to avoid the
weak version of the OM-inside-OMDataSource anti-pattern in cases where it would be far
too difficult to build a proper |
There is also a stronger version of the anti-pattern, which consists in implementing the serialize method by building an Axiom tree and then serializing the tree to the XMLStreamWriter.
Except for very special cases, there is no valid reason whatsoever to do this! To see why this is so, consider the two possible cases:
The OMDataSource already implements the getReader method in a proper way, i.e. without building an intermediary Axiom tree. To properly implement serialize, it is then sufficient to pull the events from the reader returned by a call to getReader and copy them to the XMLStreamWriter. The easiest and most efficient way to do this is using StreamingOMSerializer:
Example 5.3. Proper implementation of the OMDataSource#serialize method
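public void serialize(XMLStreamWriter xmlWriter) throws XMLStreamException {
    // Pull events from getReader() and write them straight to xmlWriter;
    // no intermediary object model is built.
    StreamingOMSerializer serializer = new StreamingOMSerializer();
    serializer.serialize(getReader(), xmlWriter);
}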
There is thus no need to build an intermediary object model in this case.
The getReader method also uses an intermediary Axiom tree[7]. In that case it doesn't make sense to use an OMSourcedElement in the first place! At least it doesn't make sense if one assumes that, in general, the OMSourcedElement will either be serialized or have its content accessed after being added to the tree. Indeed, in this case the Axiom tree will be built at least once (if not multiple times), so the code might as well use a normal OMElement.
This only leaves the very special case where the OMSourcedElement is in general neither accessed nor serialized. This happens either because it will usually be discarded somehow, or because the code uses OMDataSourceExt#getObject() to retrieve the raw data. Even in that case, one can argue that it should generally not be too hard to implement at least the serialize method properly by transforming the raw or foreign data directly into StAX events written to the XMLStreamWriter.
Implementing the |
QED
[2] The example shown is actually a simplified version of code that is part of Axis2 1.5.
[3] For another example, see http://markmail.org/thread/omx7umk5fnpb6dnc.
[4] To do this, Axiom doesn't read the entire data source, but only reads up to the threshold.
[5] An exception is when code tries to access the children of the OMSourcedElement. In this case, the OMSourcedElement will be expanded, i.e. the data will be converted to the native Axiom object model.