Chapter 4. Integrating Axiom into your project

Table of Contents

Using Axiom in a Maven 2 project
Adding Axiom as a dependency
Managing the JAF and JavaMail dependencies
Applying application wide configuration
Changing the default StAX factory settings
Migrating from older Axiom versions
Changes in Axiom 1.2.9
Changes in Axiom 1.2.11
Changes in Axiom 1.2.13

Using Axiom in a Maven 2 project

Adding Axiom as a dependency

If your project uses Maven 2, it is fairly easy to add Axiom to your project. Simply add the following entries to the dependencies section of pom.xml:

<dependency>
    <groupId>org.apache.ws.commons.axiom</groupId>
    <artifactId>axiom-api</artifactId>
    <version>1.2.13</version>
</dependency>
<dependency>
    <groupId>org.apache.ws.commons.axiom</groupId>
    <artifactId>axiom-impl</artifactId>
    <version>1.2.13</version>
</dependency>

All Axiom releases are deployed to the Maven central repository and there is no need to add an entry to the repositories section. However, if you want to work with the development (snapshot) version of Axiom, it is necessary to add the Apache Snapshot Repository:

<repository>
    <id>apache.snapshots</id>
    <name>Apache Snapshot Repository</name>
    <url>http://repository.apache.org/snapshots/</url>
    <releases>
        <enabled>false</enabled>
    </releases>
</repository>

[Tip]

If you are working on another Apache project, you don't need to add the snapshot repository in the POM file since it is already declared in the org.apache:apache parent POM.

Managing the JAF and JavaMail dependencies

Axiom requires the Java Activation Framework (JAF) and the JavaMail API to work. There are two commonly used incarnations of these libraries: one is Sun's reference implementation, the other is part of the Geronimo project. Axiom declares dependencies on the Geronimo versions (though that might change in the future). If your project uses another library that depends on JAF and/or JavaMail, but that refers to Sun's implementation, your project will end up with dependencies on two different artifacts implementing the same API.

If you prefer Sun's implementations, then you should change the declaration of the Axiom dependencies in your POM file as follows:

<dependency>
    <groupId>org.apache.ws.commons.axiom</groupId>
    <artifactId>axiom-xxx</artifactId>
    <version>1.2.13</version>
    <exclusions>
        <exclusion>
            <groupId>org.apache.geronimo.specs</groupId>
            <artifactId>geronimo-activation_1.1_spec</artifactId>
        </exclusion>
        <exclusion>
            <groupId>org.apache.geronimo.specs</groupId>
            <artifactId>geronimo-javamail_1.4_spec</artifactId>
        </exclusion>
    </exclusions>
</dependency>

If you prefer Geronimo's implementation, then you need to identify the libraries depending on Sun's artifacts (javax.activation:activation and javax.mail:mail) and add the relevant exclusions. You can use mvn dependency:tree to easily identify where a transitive dependency comes from.

The choice between Sun's and Geronimo's implementation is to a large extent a matter of preference. Note however that the geronimo-javamail_1.4_spec artifact used by Axiom only contains the JavaMail API, while Sun's library bundles the API together with the providers for IMAP and POP3. Depending on your use case that might be an advantage or a disadvantage.

Applying application wide configuration

Sometimes it is necessary to customize some particular aspects of Axiom for an entire application. There are several things that can be configured through system properties and/or property files. This is also important when using third party applications or libraries that depend on Axiom.

Changing the default StAX factory settings

[Note]

The information in this section only applies to XMLStreamReader or XMLStreamWriter instances created using StAXUtils (see the section called “Creating stream readers and writers using StAXUtils”). Readers and writers created using the standard StAX APIs will keep their default settings as defined by the implementation (or dictated by the StAX specifications).

[Note]

The feature described in this section was introduced in Axiom 1.2.9.

When creating a new XMLInputFactory (resp. XMLOutputFactory), StAXUtils looks for a property file named XMLInputFactory.properties (resp. XMLOutputFactory.properties) in the classpath, using the same class loader as the one from which the factory is loaded (by default this is the context class loader). If a corresponding resource is found, the properties in that file are applied to the factory using the XMLInputFactory#setProperty (resp. XMLOutputFactory#setProperty) method.

This feature can be used to set factory properties of type Boolean, Integer and String. The following sections present some sample use cases.
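The mechanism described above can be approximated with plain JDK classes. The following is only a sketch and not Axiom's actual implementation: the class name FactoryPropertiesSketch, its helper methods and the rule used to convert raw values to Boolean, Integer or String are assumptions made for illustration.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.Properties;
import javax.xml.stream.XMLInputFactory;

// Hypothetical helper; StAXUtils does this internally and its code may differ.
class FactoryPropertiesSketch {
    /** Converts a raw value to Boolean, Integer or String (assumed conversion rule). */
    static Object convert(String raw) {
        if ("true".equalsIgnoreCase(raw) || "false".equalsIgnoreCase(raw)) {
            return Boolean.valueOf(raw);
        }
        try {
            return Integer.valueOf(raw);
        } catch (NumberFormatException e) {
            return raw; // anything else is passed on as a String
        }
    }

    /** Loads the named resource from the class loader; returns empty properties if it is absent. */
    static Properties load(ClassLoader loader, String resource) {
        Properties props = new Properties();
        try (InputStream in = loader.getResourceAsStream(resource)) {
            if (in != null) {
                props.load(in);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    /** Applies every property to the factory; unsupported names raise IllegalArgumentException. */
    static XMLInputFactory configure(XMLInputFactory factory, Properties props) {
        for (String name : props.stringPropertyNames()) {
            factory.setProperty(name, convert(props.getProperty(name)));
        }
        return factory;
    }

    public static void main(String[] args) {
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        if (loader == null) {
            loader = FactoryPropertiesSketch.class.getClassLoader();
        }
        XMLInputFactory factory = configure(XMLInputFactory.newInstance(),
                load(loader, "XMLInputFactory.properties"));
        System.out.println(factory.getProperty(XMLInputFactory.IS_COALESCING));
    }
}
```

With an XMLInputFactory.properties file on the classpath, the printed value reflects the setting from that file; without one, the parser's own default is shown.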

Changing the serialization of the CR-LF character sequence

Section 2.11 of [XML] specifies that an XML processor must behave as if it normalized all line breaks in external parsed entities (including the document entity) on input, before parsing, by translating both the two-character sequence #xD #xA and any #xD that is not followed by #xA to a single #xA character. This implies that when a Windows style line ending, i.e. a CR-LF character sequence is serialized literally into an XML document, the CR character will be lost during deserialization. Depending on the use case this may or may not be desirable.

The only way to strictly preserve CR characters is to serialize them as character entities, i.e. &#xD;. This is the default behavior of Woodstox. This can be easily checked using the following Java snippet:

OMFactory factory = OMAbstractFactory.getOMFactory();
OMElement element = factory.createOMElement("root", null);
element.setText("Test\r\nwith CRLF");
element.serialize(System.out);

This code produces the following output:

<root>Test&#xd;
with CRLF</root>

[Note]

From Axiom's point of view this is actually a reasonable behavior. The reason is that when creating an OMText node programmatically, it is easy for the user code to normalize the text content to avoid the appearance of the character entity. On the other hand, if the default behavior was to serialize CR-LF literally (implying that the CR character will be lost during deserialization), it would be difficult (if not impossible) for user code that needs to strictly preserve the text data to construct the object model in such a way as to force serialization of the CR as character entity.

In some cases this behavior may be undesirable[1]. Fortunately, Woodstox allows this behavior to be changed through the com.ctc.wstx.outputEscapeCr property on the XMLOutputFactory. If Axiom is used (and in particular StAXUtils), then this can be achieved by adding an XMLOutputFactory.properties file with the following content to the classpath (in the default package):

com.ctc.wstx.outputEscapeCr=false

Now the output of the Java snippet shown above will be:

<root>Test
with CRLF</root>
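To see what the two serialized forms mean for a consumer, both can be parsed back. The sketch below uses the StAX parser bundled with the JDK rather than Axiom or Woodstox; the class name LineBreakNormalizationDemo and the helper rootText are made up for this illustration.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical demo class; it only relies on the StAX API shipped with the JDK.
class LineBreakNormalizationDemo {
    /** Parses the document and returns the text content of its root element. */
    static String rootText(String xml) {
        try {
            XMLStreamReader reader =
                    XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
            reader.nextTag(); // advance to the start tag of the root element
            return reader.getElementText();
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // CR-LF written literally: the parser normalizes it to a single LF.
        String literal = rootText("<root>Test\r\nwith CRLF</root>");
        // CR written as a character reference: it is not subject to normalization.
        String escaped = rootText("<root>Test&#xd;\nwith CRLF</root>");
        System.out.println("literal form keeps CR: " + literal.contains("\r"));
        System.out.println("escaped form keeps CR: " + escaped.contains("\r"));
    }
}
```

Only the form that uses the character reference round-trips the original text unchanged, which is why escaping is the conservative default.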

Preserving CDATA sections during parsing

By default, StAXUtils creates StAX parsers in coalescing mode. In this mode, the parser will never return two character data events in sequence, while in non coalescing mode, the parser is allowed to break up character data into smaller chunks and to return multiple consecutive character events, which may improve throughput for documents containing large text nodes. It should be noted that StAXUtils overrides the default settings mandated by the StAX specification, which specifies that by default, a StAX parser must be in non coalescing mode. The primary reason is compatibility: older versions of Woodstox had coalescing switched on by default.

A side effect of the default settings chosen by Axiom is that by default, CDATA sections are not reported by parsers created by StAXUtils. The reason is that in coalescing mode, the parser will not only coalesce adjacent text nodes, but also CDATA sections. Applications that require correct reporting of CDATA sections should therefore disable coalescing. This can be achieved by creating an XMLInputFactory.properties file with the following content:

javax.xml.stream.isCoalescing=false
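The effect of the coalescing flag can be observed directly on a StAX factory. The following sketch sets the property programmatically on the JDK's bundled parser instead of going through StAXUtils; the class name CoalescingDemo and the method eventTypes are assumptions for this example. Whether a separate CDATA event is reported in non coalescing mode may also depend on the parser implementation, so no output is claimed for that case.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical demo class; uses only the standard StAX API.
class CoalescingDemo {
    /** Returns the sequence of StAX event type codes produced for the given document. */
    static List<Integer> eventTypes(String xml, boolean coalescing) {
        try {
            XMLInputFactory factory = XMLInputFactory.newInstance();
            factory.setProperty(XMLInputFactory.IS_COALESCING, coalescing);
            XMLStreamReader reader = factory.createXMLStreamReader(new StringReader(xml));
            List<Integer> types = new ArrayList<>();
            while (reader.hasNext()) {
                types.add(reader.next());
            }
            return types;
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<root>a<![CDATA[b]]>c</root>";
        // Event codes are defined in XMLStreamConstants (CHARACTERS = 4, CDATA = 12).
        System.out.println("coalescing:     " + eventTypes(xml, true));
        System.out.println("non coalescing: " + eventTypes(xml, false));
    }
}
```

In coalescing mode the CDATA section is merged into the surrounding character data, so an application cannot tell that it was there.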

Migrating from older Axiom versions

This section provides information about changes in Axiom that might impact existing code when migrating from an older version of Axiom. Note that this section is not meant as a change log that lists all changes or new features. Also, before upgrading to a newer Axiom version, you should always check if your code uses methods or classes that have been deprecated. You should fix all deprecation warnings before changing the Axiom version. In general the Javadoc of the deprecated class or method gives you a hint on how to change your code.

Changes in Axiom 1.2.9

System properties used by OMAbstractFactory

Prior to Axiom 1.2.9, OMAbstractFactory used system properties as defined in the following table to determine the factory implementations to use:

Object model  Method              System property  Default
Plain XML     getOMFactory()      om.factory       org.apache.axiom.om.impl.llom.factory.OMLinkedListImplFactory
SOAP 1.1      getSOAP11Factory()  soap11.factory   org.apache.axiom.soap.impl.llom.soap11.SOAP11Factory
SOAP 1.2      getSOAP12Factory()  soap12.factory   org.apache.axiom.soap.impl.llom.soap12.SOAP12Factory

In principle, this made it possible to mix default factory implementations from different implementations of the Axiom API (e.g. an OMFactory from the LLOM implementation and SOAP factories from DOOM). This, however, doesn't make sense. The system properties described above are no longer supported in 1.2.9, and the default Axiom implementation is chosen using the new org.apache.axiom.om.OMMetaFactory system property. For LLOM, you should set:

org.apache.axiom.om.OMMetaFactory=org.apache.axiom.om.impl.llom.factory.OMLinkedListMetaFactory

This is the default and is equivalent to the defaults in 1.2.8. For DOOM, you should set:

org.apache.axiom.om.OMMetaFactory=org.apache.axiom.om.impl.dom.factory.OMDOMMetaFactory
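A system property such as the one above is usually passed on the command line with -D, but it can also be set programmatically. The sketch below shows the latter; the class name MetaFactorySelection is hypothetical, and the code assumes that the property is set before Axiom resolves its default implementation for the first time.

```java
// Hypothetical helper; the property name and value are the ones given above.
class MetaFactorySelection {
    static final String PROPERTY = "org.apache.axiom.om.OMMetaFactory";
    static final String DOOM = "org.apache.axiom.om.impl.dom.factory.OMDOMMetaFactory";

    /** Selects DOOM as the default implementation for the whole JVM. */
    static void selectDoom() {
        // Assumption: this runs before any code asks Axiom for its default factories.
        System.setProperty(PROPERTY, DOOM);
    }

    public static void main(String[] args) {
        selectDoom();
        System.out.println(PROPERTY + "=" + System.getProperty(PROPERTY));
    }
}
```

Because the setting is JVM wide, it also affects third party libraries in the same process that rely on Axiom's default implementation.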

Factories returned by StAXUtils

In versions prior to 1.2.9, the XMLInputFactory and XMLOutputFactory instances returned by StAXUtils were mutable, i.e. it was possible to change the properties of these factories. This is obviously an issue since the factory instances are cached and can be shared among several threads. To avoid programming errors, starting from 1.2.9, the factories are immutable and any attempt to change their state will result in an IllegalStateException.

Note that the possibility to change the properties of these factories could be used to apply application wide settings. Starting with 1.2.9, Axiom has a proper mechanism to allow this. This feature is described in the section called “Changing the default StAX factory settings”.

Changes in XOP/MTOM handling

In Axiom 1.2.8, XMLStreamReader instances provided by Axiom could belong to one of three different categories:

  1. XMLStreamReader instances delivering plain XML.

  2. XMLStreamReader instances delivering plain XML and implementing a custom extension to retrieve optimized binary data.

  3. XMLStreamReader instances representing XOP encoded data.

As explained in AXIOM-255 and AXIOM-122, in Axiom 1.2.8, the type of stream reader provided by the API was not always well defined. Sometimes the type of the stream reader even depended on the state of the Axiom tree (i.e. whether some part of it has been accessed or not).

In release 1.2.9 the behavior of Axiom was changed such that it never delivers XOP encoded data unless explicitly requested to do so. By default, any XMLStreamReader provided by Axiom now represents plain XML data and optionally implements the DataHandlerReader extension to retrieve optimized binary data. An XOP encoded stream can be requested from the getXOPEncodedStream method in XOPUtils.

Changes in Axiom 1.2.11

Resurrection of the OMXMLBuilderFactory API

Historically, org.apache.axiom.om.impl.llom.factory.OMXMLBuilderFactory was used to create Axiom trees from XML documents. Unfortunately, this class is located in the wrong package and JAR (it is implementation independent but belongs to LLOM). In Axiom 1.2.10, the standard way to create an Axiom tree was therefore to instantiate StAXOMBuilder or one of its subclasses directly. However, this is not optimal for two reasons:

  • It relies on the assumption that every implementation of the Axiom API necessarily uses StAXOMBuilder. This means that an implementation doesn't have the freedom to provide its own builder implementation (e.g. in order to implement some special optimizations).

  • StAXOMBuilder and its subclasses belong to packages which have impl in their names. This tends to blur the distinction between the public API and internal implementation classes.

Therefore, in Axiom 1.2.11, a new abstract API for creating builder instances was introduced. It is again called OMXMLBuilderFactory, but located in the org.apache.axiom.om package. The methods defined by this new API are similar to the ones in the original (now deprecated) OMXMLBuilderFactory, so that migration should be easy.

Changes in the behavior of certain iterators

In Axiom 1.2.10 and previous versions, iterators returned by methods such as OMContainer#getChildren() internally stayed one step ahead of the node returned by the next() method. This meant that sometimes, using such an iterator had the side effect of building elements that were not intended to be built. In Axiom 1.2.11 this behavior was changed such that next() no longer builds the nodes it returns. In a few cases, this change may cause issues in existing code. One known instance is the following construct (which was used internally by Axiom itself):

while (children.hasNext()) { 
    OMNodeEx omNode = (OMNodeEx) children.next(); 
    omNode.internalSerializeAndConsume(writer); 
}

One would expect that the effect of this code is to consume the child nodes. However, in Axiom 1.2.10 this is not the case because next() actually builds the node. Note that the code actually doesn't make sense because once a child node has been consumed, it is no longer possible to retrieve the next sibling. Since in Axiom 1.2.11 the call to next() no longer builds the child node, this code will indeed trigger an exception.

Another example is the following piece of code which removes all child elements with a given name:

Iterator iterator = element.getChildrenWithName(qname);
while (iterator.hasNext()) {
    OMElement child = (OMElement)iterator.next();
    child.detach();
}

In Axiom 1.2.10 this works as expected. Indeed, since the iterator stays one node ahead, the current node can be safely removed using the detach() method. In Axiom 1.2.11, this is no longer the case and the following code (which also works with previous versions) should be used instead:

Iterator iterator = element.getChildrenWithName(qname);
while (iterator.hasNext()) {
    iterator.next();
    iterator.remove();
}

Note that this is actually compatible with the behavior of the Java 2 collection framework, where a ConcurrentModificationException may be thrown if a thread modifies a collection directly while it is iterating over the collection with an iterator.

In Axiom 1.2.12, the iterator implementations have been further improved to detect this situation and to throw a ConcurrentModificationException. This enables early detection of problematic usages of iterators.
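The analogy with the Java collection framework can be reproduced without Axiom. The sketch below uses a plain ArrayList, so it illustrates the general iterator contract rather than Axiom's own iterator classes; the class name IteratorRemovalDemo and its methods are made up for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.ConcurrentModificationException;
import java.util.Iterator;
import java.util.List;

// Hypothetical demo class based on java.util collections only.
class IteratorRemovalDemo {
    /** Removes matches by modifying the list directly; returns true if the iterator detected it. */
    static boolean directRemovalFails(List<String> items, String target) {
        try {
            for (Iterator<String> it = items.iterator(); it.hasNext();) {
                String item = it.next();
                if (item.equals(target)) {
                    items.remove(item); // bypasses the iterator
                }
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    /** Removes matches through the iterator, which is the supported way. */
    static List<String> removeViaIterator(List<String> items, String target) {
        for (Iterator<String> it = items.iterator(); it.hasNext();) {
            if (it.next().equals(target)) {
                it.remove();
            }
        }
        return items;
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("a", "b", "c", "b");
        System.out.println(directRemovalFails(new ArrayList<>(names), "b"));
        System.out.println(removeViaIterator(new ArrayList<>(names), "b"));
    }
}
```

As with the Axiom iterators, removal through the iterator itself is the only variant that is guaranteed to work.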

Changes in Axiom 1.2.13

Handling of illegal namespace declarations

Both XML 1.0 and XML 1.1 forbid binding a namespace prefix to the empty namespace name. Only the default namespace can have an empty namespace name. In XML 1.0, prefixed namespace bindings may not be empty, as explained in section 5 of [XMLNS]:

In a namespace declaration for a prefix (i.e., where the NSAttName is a PrefixedAttName), the attribute value MUST NOT be empty.

In Axiom 1.2.12, the declareNamespace methods in OMElement didn't enforce this constraint and namespace declarations violating this requirement were silently dropped during serialization. This behavior is problematic because it may result in subtle issues such as unbound namespace prefixes. In Axiom 1.2.13 these methods have been changed so that they throw an exception if an attempt is made to bind the empty namespace name to a prefix.
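The XML 1.0 rule quoted above is also enforced by namespace aware parsers, which is why a document carrying such a declaration could not be read back anyway. The following sketch checks this with the DOM parser bundled with the JDK; it does not use Axiom, and the class name EmptyPrefixBindingDemo and the method isAccepted are assumptions for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class; relies on the JAXP parser shipped with the JDK.
class EmptyPrefixBindingDemo {
    /** Returns true if a namespace aware XML 1.0 parser accepts the document. */
    static boolean isAccepted(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // without this, namespace rules are not checked
            DocumentBuilder builder = factory.newDocumentBuilder();
            builder.setErrorHandler(new DefaultHandler()); // rethrows fatal errors, prints nothing
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isAccepted("<root xmlns:p=\"urn:example\"/>")); // regular prefix binding
        System.out.println(isAccepted("<root xmlns=\"\"/>"));              // empty default namespace
        System.out.println(isAccepted("<root xmlns:p=\"\"/>"));            // empty prefixed binding
    }
}
```

Only the last document violates the constraint, since the default namespace is allowed to have an empty namespace name.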

In XML 1.1, prefixed namespace bindings may be empty, but rather than binding the empty namespace name to a prefix, such a namespace declaration "undeclares" the prefix, as explained in section 5 of [XMLNS11]:

The namespace prefix, unless it is xml or xmlns, must have been declared in a namespace declaration attribute in either the start-tag of the element where the prefix is used or in an ancestor element (i.e. an element in whose content the prefixed markup occurs). Furthermore, the attribute value in the innermost such declaration must not be an empty string.

Although the same syntax is used in both cases, adding a namespace declaration to bind a prefix to a (non empty) namespace URI and adding a namespace declaration to undeclare a prefix are two fundamentally different operations from the point of view of the application. Therefore, to support prefix undeclaring for XML 1.1 infosets, a new method undeclarePrefix has been added to OMElement in Axiom 1.2.13.

As a corollary of the above, neither XML 1.0 nor XML 1.1 allows creating prefixed elements or attributes with an empty namespace name. In Axiom 1.2.12, when attempting to create such invalid information items, the behavior was inconsistent: in some cases, the prefix was silently dropped, in other cases the invalid information item was actually created, resulting in problems during serialization. Axiom 1.2.13 consistently throws an exception when an attempt is made to create such an invalid information item.

OMNamespace normalization

Methods that return an OMNamespace object may in principle use two different ways to represent the absence of a namespace: as a null value or as an OMNamespace instance that has both prefix and namespaceURI properties set to the empty string. This applies in particular to OMElement#getNamespace(), OMElement#getDefaultNamespace() and OMAttribute#getNamespace(). The API of Axiom 1.2.12 didn't clearly specify which representation was used, although in most cases a null value was used. As a consequence, application code had to take into account the possibility that such methods returned OMNamespace instances with an empty prefix and namespace URI.

In Axiom 1.2.13 the situation has been clarified and the aforementioned APIs now always return null to indicate the absence of a namespace. Note that this may have an impact on flawed application code that doesn't handle null in the same way as an OMNamespace instance with an empty prefix and namespace URI. Such application code needs to be fixed to work correctly with Axiom 1.2.13.

New abstract APIs

Axiom 1.2.13 introduces a couple of new abstract APIs which give implementations of the Axiom API the freedom to do additional optimizations. Application code should be migrated to take advantage of these new APIs:

  • Instead of instantiating an OMSource object directly, OMContainer#getSAXSource(boolean) should be used.

  • org.apache.axiom.om.impl.dom.DOOMAbstractFactory has been deprecated because it ties application code that requires an object model factory supporting DOM to a particular Axiom implementation (DOOM). Instead use OMAbstractFactory.getMetaFactory(String) with OMAbstractFactory.FEATURE_DOM as parameter value to get a meta factory for an Axiom implementation that supports DOM.

  • The DocumentBuilderFactory implementation provided by DOOM should no longer be instantiated directly. Instead, application code should request a meta factory for DOM (see previous item), cast it to DOMMetaFactory and invoke newDocumentBuilderFactory via that interface.

[Tip]

The last two changes imply that axiom-dom should no longer be used as a compile time dependency, but only as a runtime dependency.

Note that some of the superseded APIs may disappear in Axiom 1.3.

Usage of Apache James Mime4J as MIME parser

Starting with version 1.2.13, Axiom uses Apache James Mime4J as MIME parser implementation instead of its own custom parser implementation. The public API as defined by the Attachments class remains unchanged, with the following exceptions:

  • The getIncomingAttachmentsAsSingleStream method is no longer supported.

  • The fileThreshold specified during the construction of the Attachments object is now interpreted relative to the size of the decoded content of the attachment instead of the size of the encoded content. Note that this only makes a difference if the attachment has a content transfer encoding other than binary.
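The difference between encoded and decoded size matters most for base64, where every 3 bytes of content become 4 characters. The sketch below works through one case with the JDK's Base64 class; the class name FileThresholdArithmetic, the sample sizes and the strict "greater than" comparison are assumptions, since the text above does not state how Axiom compares a size against the threshold.

```java
import java.util.Base64;

// Hypothetical worked example; it does not call any Axiom API.
class FileThresholdArithmetic {
    /** Length in bytes of the base64 encoding (without line breaks) of a payload of the given size. */
    static int encodedLength(int decodedLength) {
        return Base64.getEncoder().encode(new byte[decodedLength]).length;
    }

    /** Assumed comparison rule: a part goes to a temporary file if its size exceeds the threshold. */
    static boolean exceeds(int size, int threshold) {
        return size > threshold;
    }

    public static void main(String[] args) {
        int decoded = 3000;                    // size of the attachment content
        int encoded = encodedLength(decoded);  // 4000 bytes on the wire, before line breaks
        int fileThreshold = 3500;
        System.out.println("encoded size exceeds threshold: " + exceeds(encoded, fileThreshold));
        System.out.println("decoded size exceeds threshold: " + exceeds(decoded, fileThreshold));
    }
}
```

Under these assumptions, an attachment of this size would have crossed the threshold when measured by its encoded form, but no longer does when measured by its decoded content. MIME messages additionally insert line breaks into base64 data, which makes the encoded form slightly larger still.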

Several internal classes related to the old MIME parsing code have been removed, are no longer public or have been changed in an incompatible way:

  • MIMEBodyPartInputStream

  • BoundaryDelimitedStream

  • BoundaryPushbackInputStream

  • MultipartAttachmentStreams

  • PartFactory and related classes

Although these classes were public, they are not considered part of the public API. Application code that depends on these classes needs to be rewritten before upgrading to Axiom 1.2.13.

When upgrading to 1.2.13, projects that use Axiom's XOP/MTOM features must make sure that Apache James Mime4J is added to the dependencies. For projects that use Maven (or tools that support Maven repositories and metadata) this happens automatically. Projects that use other build tools must explicitly add the apache-mime4j-core library to the list of dependencies.

Support for MIME part streaming

Axiom 1.2.13 has support for MIME part streaming. Pre-existing APIs continue to work as documented, but there are some minor changes in behavior that may be visible to code that makes assumptions that are not covered by the API contract:

  • The DataHandler instances returned by Attachments for MIME parts read from a stream now always implement DataHandlerExt, while in 1.2.12 this was only the case for parts buffered using temporary files. For memory buffered MIME parts, a call to purgeDataSource has the effect of releasing the allocated memory.



[1] See WSTX-94 for a discussion about this.