Chapter 5. The StAX specification

Table of Contents

Semantics of the setPrefix method
The three XMLStreamWriter usage patterns

The StAX specification comprises two parts: a specification document titled "Streaming API For XML JSR-173 Specification" and a Javadoc describing the API. Both can be downloaded from the JSR-173 page. Since StAX has been part of the Java platform since Java 6, the Javadoc can also be viewed online.

Semantics of the setPrefix method

One of the more obscure parts of the StAX specification is probably the meaning of the setPrefix[3] method defined by XMLStreamWriter. To understand how this method works, it is necessary to look at several parts of the specification:

  • The Javadoc of the setPrefix method.

  • The table shown in the Javadoc of the XMLStreamWriter class in Java 6[4].

  • Section 5.2.2, "Binding Prefixes", of the specification.

  • The example shown in section 5.3.2, "XMLStreamWriter", of the specification.

In addition, it is important to note the following facts:

  • The terms "defaulting prefixes", used in section 5.2.2 of the specification, and "namespace repairing", used in the Javadoc of XMLStreamWriter, are synonyms.

  • The methods writing namespace-qualified information items, namely writeStartElement, writeEmptyElement and writeAttribute, all come in two variants: one that takes both a namespace URI and a prefix as arguments, and one that takes only a namespace URI, but no prefix.

The purpose of the setPrefix method is simply to define the prefixes that will be used by the variants of the writeStartElement, writeEmptyElement and writeAttribute methods that only take a namespace URI (and the local name). This becomes clear by looking at the table in the XMLStreamWriter Javadoc. Note that a call to setPrefix doesn't produce any output; it is still necessary to call writeNamespace to actually write the corresponding namespace declarations. Otherwise the produced document will not be well formed with respect to namespaces.
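
To illustrate how setPrefix feeds the namespace-URI-only variants, here is a minimal sketch. It assumes whatever implementation XMLOutputFactory.newInstance() picks up (for example the one bundled with the JDK), and the class name SetPrefixVariants is made up for the example. The same element is written once with the explicit-prefix variant of writeEmptyElement and once with the namespace-URI-only variant after a call to setPrefix:

```java
import java.io.StringWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

// Hypothetical example class: compares the two variants of writeEmptyElement.
final class SetPrefixVariants {
    static String write(boolean explicitPrefix) throws XMLStreamException {
        StringWriter out = new StringWriter();
        XMLStreamWriter writer = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
        writer.writeStartElement("root");
        writer.setPrefix("p", "urn:ns1");      // no output: only records p -> urn:ns1
        writer.writeNamespace("p", "urn:ns1"); // the declaration must still be written explicitly
        if (explicitPrefix) {
            // Variant taking an explicit prefix
            writer.writeEmptyElement("p", "element1", "urn:ns1");
        } else {
            // Variant taking only a namespace URI: the prefix comes from setPrefix
            writer.writeEmptyElement("urn:ns1", "element1");
        }
        writer.writeEndElement();
        writer.flush();
        writer.close();
        return out.toString();
    }

    public static void main(String[] args) throws XMLStreamException {
        System.out.println(write(true));
        System.out.println(write(false));
    }
}
```

Both calls should produce the same markup. The xmlns:p declaration only appears because writeNamespace is called explicitly; removing that call leaves the binding in the writer but no declaration in the output.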

The Javadoc of the setPrefix method also clearly defines the scope of the prefix bindings defined using that method: the prefix is bound in the scope of the current START_ELEMENT / END_ELEMENT pair, i.e. a prefix bound using setPrefix remains valid until the call to writeEndElement that closes the innermost element still open at the time setPrefix is called. While not explicitly mentioned in the specification, it is clear that a prefix binding may be masked by another binding for the same prefix defined in a nested element.
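
Both the scoping rule and the masking behavior can be observed through the writer's namespace context. The following sketch (hypothetical class name PrefixMasking, default StAX implementation assumed) binds p to urn:ns1 on an outer element and to urn:ns2 on a nested one, then queries getNamespaceContext().getNamespaceURI("p") inside the nested element and again after it has been closed:

```java
import java.io.StringWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

// Hypothetical example class: shows a nested binding masking an outer one.
final class PrefixMasking {
    // Returns the URI bound to "p" inside the nested element and after it has been closed.
    static String[] run() throws XMLStreamException {
        XMLStreamWriter writer =
                XMLOutputFactory.newInstance().createXMLStreamWriter(new StringWriter());
        writer.writeStartElement("p", "outer", "urn:ns1");
        writer.setPrefix("p", "urn:ns1");      // binding scoped to <p:outer>
        writer.writeNamespace("p", "urn:ns1");
        writer.writeStartElement("p", "inner", "urn:ns2");
        writer.setPrefix("p", "urn:ns2");      // masks the outer binding inside <p:inner>
        writer.writeNamespace("p", "urn:ns2");
        String inside = writer.getNamespaceContext().getNamespaceURI("p");
        writer.writeEndElement();              // ends <p:inner> and, with it, the inner binding
        String after = writer.getNamespaceContext().getNamespaceURI("p");
        writer.writeEndElement();
        writer.close();
        return new String[] {inside, after};
    }

    public static void main(String[] args) throws XMLStreamException {
        String[] uris = run();
        System.out.println("inside inner: " + uris[0]);
        System.out.println("after inner:  " + uris[1]);
    }
}
```

With a conforming implementation, the first value should be urn:ns2 and the second urn:ns1, because the nested binding ends with the writeEndElement call that closes the nested element.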

An aspect that may cause confusion is the fact that in the example shown in section 5.3.2 of the specification, the calls to setPrefix (and setDefaultNamespace) all appear immediately before a call to writeStartElement or writeEmptyElement. This may lead people to incorrectly believe that a prefix binding defined using setPrefix only applies to the next element written[5]. This interpretation clearly contradicts the setPrefix Javadoc, unless one assumes that "the current START_ELEMENT / END_ELEMENT pair" refers to the element opened by the call to writeStartElement immediately following the call to setPrefix. However, this would be a rather arbitrary reading of the Javadoc[6].

The claims made in the previous paragraph can be verified using the following code snippet:

XMLOutputFactory f = XMLOutputFactory.newInstance();
XMLStreamWriter writer = f.createXMLStreamWriter(System.out);
writer.writeStartElement("root");
writer.setPrefix("p", "urn:ns1"); // binds p in the scope of root; produces no output
writer.writeEmptyElement("urn:ns1", "element1");
writer.writeEmptyElement("urn:ns1", "element2");
writer.writeEndElement();
writer.flush();
writer.close();

This produces the following output[7]:

<root><p:element1/><p:element2/></root>

Since the code doesn't call writeNamespace, the output is obviously not well formed with respect to namespaces, but it also clearly shows that the scope of the prefix binding for p extends to the end of the root element and is not limited to element1.
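
For comparison, a well-formed variant of the snippet above can be sketched by adding a writeNamespace call right after setPrefix, so that the binding in the writer and the declaration in the output both cover the root element. In this sketch the class name SetPrefixScope is hypothetical, the output goes to a StringWriter instead of System.out, and the getPrefix check after element1 is only there to confirm that the binding is still in scope at that point:

```java
import java.io.StringWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

// Hypothetical example class: the snippet above, made well formed.
final class SetPrefixScope {
    static String write() throws XMLStreamException {
        StringWriter out = new StringWriter();
        XMLStreamWriter writer = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
        writer.writeStartElement("root");
        writer.setPrefix("p", "urn:ns1");      // binding scoped to root
        writer.writeNamespace("p", "urn:ns1"); // declaration also on root: scopes aligned
        writer.writeEmptyElement("urn:ns1", "element1");
        // The binding is not limited to element1: it is still in scope here
        if (!"p".equals(writer.getPrefix("urn:ns1"))) {
            throw new IllegalStateException("binding for urn:ns1 is no longer in scope");
        }
        writer.writeEmptyElement("urn:ns1", "element2");
        writer.writeEndElement();
        writer.flush();
        writer.close();
        return out.toString();
    }

    public static void main(String[] args) throws XMLStreamException {
        System.out.println(write());
    }
}
```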

To avoid unexpected results and keep the code maintainable, it is generally advisable to keep the calls to setPrefix and writeNamespace aligned, i.e. to make sure that the scope (in XMLStreamWriter) of the prefix binding defined by setPrefix is compatible with the scope (in the produced document) of the namespace declaration written by the corresponding call to writeNamespace. This makes it necessary to write code like this:

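writer.writeStartElement("p", "element1", "urn:ns1"); // explicit prefix: setPrefix has not been called yet
writer.setPrefix("p", "urn:ns1");                     // binding scoped to element1
writer.writeNamespace("p", "urn:ns1");                // declaration written on element1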

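As can be seen from this code snippet, keeping the two scopes in sync makes it necessary to use the writeStartElement variant that takes an explicit prefix: setPrefix must be called after writeStartElement so that the binding is scoped to element1, which means that no binding for urn:ns1 exists yet when writeStartElement is called. Note that this somewhat conflicts with the purpose of the setPrefix method; one may consider this a flaw in the design of the StAX API.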
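
To see what can go wrong when the two scopes are not aligned, consider the following sketch (hypothetical class name MisalignedScopes, default StAX implementation assumed). As in the example from section 5.3.2, setPrefix is called just before writeStartElement, so the binding is scoped to root, while writeNamespace puts the declaration on element1 only:

```java
import java.io.StringWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

// Hypothetical example class: binding scope and declaration scope disagree.
final class MisalignedScopes {
    static String write() throws XMLStreamException {
        StringWriter out = new StringWriter();
        XMLStreamWriter writer = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
        writer.writeStartElement("root");
        // Binding defined while root is the current element: scoped to root
        writer.setPrefix("p", "urn:ns1");
        writer.writeStartElement("urn:ns1", "element1");
        // Declaration written on element1: only in scope within element1
        writer.writeNamespace("p", "urn:ns1");
        writer.writeEndElement();
        // p is still bound in the writer, so element2 gets the prefix p,
        // but no xmlns:p declaration is in scope in the output any more
        writer.writeEmptyElement("urn:ns1", "element2");
        writer.writeEndElement();
        writer.flush();
        writer.close();
        return out.toString();
    }

    public static void main(String[] args) throws XMLStreamException {
        System.out.println(write());
    }
}
```

The writer still uses p for element2, even though the declaration written on element1 is no longer in scope there, so a namespace-aware parser should reject the result.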

The three XMLStreamWriter usage patterns

Drawing conclusions from the previous section, and taking into account that XMLStreamWriter also has a namespace repairing mode, one can see that there are in fact three different ways to use XMLStreamWriter. These usage patterns correspond to the three bullets in section 5.2.2 of the StAX specification[8]:

  1. In the namespace repairing mode (enabled by setting the javax.xml.stream.isRepairingNamespaces property, available as the constant XMLOutputFactory.IS_REPAIRING_NAMESPACES, to true on the XMLOutputFactory), the writer takes care of all namespace bindings and declarations, with minimal help from the calling code. This always produces output that is well formed with respect to namespaces. On the other hand, this adds some overhead, and the result may depend on the particular StAX implementation (although the documents produced by different implementations will be equivalent).

    In repairing mode, the calling code should avoid writing namespace declarations explicitly and leave that job to the writer. There is also no need to call setPrefix, except to suggest a preferred prefix for a namespace URI. All variants of writeStartElement, writeEmptyElement and writeAttribute may be used in this mode, but the implementation is free to choose whatever prefix mappings it wants, as long as the output maps elements and attributes to the correct namespace URIs.

  2. Only use the variants of the writer methods that take an explicit prefix together with the namespace URI. In this usage pattern, setPrefix is not used at all and it is the responsibility of the calling code to keep track of prefix bindings.

    Note that this approach is difficult to implement when different parts of the output document are produced by different components (or even different libraries). Indeed, when passing the XMLStreamWriter from one method or component to another, it is also necessary to pass additional information about the prefix mappings in scope at that moment, unless it is acceptable to let the called method write (potentially redundant) namespace declarations for all namespaces it uses.

  3. Use setPrefix to keep track of prefix bindings and make sure that the bindings are in sync with the namespace declarations that have been written, i.e. always call setPrefix immediately before or immediately after each call to writeNamespace. Note that the code is still free to use all variants of writeStartElement, writeEmptyElement and writeAttribute; it only needs to make sure that its use of these methods is consistent with the prefix bindings in scope.

    The advantage of this approach is that it makes it possible to write modular code: when a method receives an XMLStreamWriter object (to write part of the document), it can use the namespace context of that writer (i.e. getPrefix and getNamespaceContext) to determine which namespace declarations are currently in scope in the output document and to avoid redundant or conflicting namespace declarations. Note that in order to do so, such code has to check for an existing prefix binding before starting to use a namespace.
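
As a sketch of the first pattern, the following code enables namespace repairing on the factory and writes elements in two namespaces without calling setPrefix or writeNamespace. The class name RepairingMode is hypothetical. Which prefixes (or default namespace declarations) appear in the output is up to the implementation, so only the namespace URIs of the resulting elements should be relied upon:

```java
import java.io.StringWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

// Hypothetical example class: pattern 1, namespace repairing mode.
final class RepairingMode {
    static String write() throws XMLStreamException {
        XMLOutputFactory factory = XMLOutputFactory.newInstance();
        // Let the writer manage all namespace bindings and declarations
        factory.setProperty(XMLOutputFactory.IS_REPAIRING_NAMESPACES, Boolean.TRUE);
        StringWriter out = new StringWriter();
        XMLStreamWriter writer = factory.createXMLStreamWriter(out);
        // No setPrefix and no writeNamespace: the writer chooses the prefixes
        // and writes the necessary declarations itself
        writer.writeStartElement("urn:ns1", "root");
        writer.writeStartElement("urn:ns2", "child");
        writer.writeEndElement();
        writer.writeEndElement();
        writer.flush();
        writer.close();
        return out.toString();
    }

    public static void main(String[] args) throws XMLStreamException {
        System.out.println(write());
    }
}
```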
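
The third pattern can be sketched with a small helper that checks the writer's namespace context before using a namespace. The helper startElement and the class ModularWriting are hypothetical names introduced for this example. The helper reuses an existing binding when there is one and otherwise declares the preferred prefix, calling setPrefix and writeNamespace together so that the binding and the declaration have the same scope:

```java
import java.io.StringWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

// Hypothetical example class: pattern 3, setPrefix kept in sync with writeNamespace.
final class ModularWriting {
    // Hypothetical helper: starts an element in the given namespace, reusing a
    // prefix already bound in the writer or declaring preferredPrefix otherwise.
    static void startElement(XMLStreamWriter writer, String preferredPrefix,
                             String uri, String localName) throws XMLStreamException {
        String prefix = writer.getPrefix(uri);
        if (prefix != null) {
            // Namespace already declared in scope: no new declaration needed
            writer.writeStartElement(prefix, localName, uri);
        } else {
            writer.writeStartElement(preferredPrefix, localName, uri);
            writer.setPrefix(preferredPrefix, uri);      // binding scoped to this element
            writer.writeNamespace(preferredPrefix, uri); // declaration on this element
        }
    }

    static String write() throws XMLStreamException {
        StringWriter out = new StringWriter();
        XMLStreamWriter writer = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
        startElement(writer, "p", "urn:ns1", "root");
        startElement(writer, "p", "urn:ns1", "child"); // urn:ns1 already bound: reused
        writer.writeEndElement();
        startElement(writer, "q", "urn:ns2", "child"); // urn:ns2 not bound yet: declared
        writer.writeEndElement();
        writer.writeEndElement();
        writer.flush();
        writer.close();
        return out.toString();
    }

    public static void main(String[] args) throws XMLStreamException {
        System.out.println(write());
    }
}
```

In write(), the second call reuses p without redeclaring it, and only the element in urn:ns2 receives a new declaration. A production version would also have to decide what to do when the preferred prefix is already bound to a different URI; declaring it again on the new element simply masks the outer binding, which is legal in XML.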



[3] For simplicity, we only discuss setPrefix here. The same remarks also apply to setDefaultNamespace.

[4] This table is not included in the Javadoc in the original StAX specification.

[5] Another factor that contributes to the confusion is that in SAX, prefix mappings are always generated before the corresponding startElement event and that their scope ends with the corresponding endElement event. This is so because the ContentHandler interface specifies that all startPrefixMapping events will occur immediately before the corresponding startElement event, and all endPrefixMapping events will occur immediately after the corresponding endElement event.

[6] Early versions of XL XP-J were based on this interpretation of the specification, but this has since been corrected. Versions conforming to the specification support a special property called javax.xml.stream.XMLStreamWriter.isSetPrefixBeforeStartElement, whose value is always Boolean.FALSE. This makes it easy to distinguish the non-conforming versions from the newer ones. Note that, contrary to what the use of the reserved javax.xml.stream prefix suggests, this is a vendor-specific property that is not supported by other implementations.

[7] This has been tested with Woodstox 3.2.9, SJSXP 1.0.1 and version 1.2.0 of the reference implementation.

[8] The content of this section is largely based on a reply posted by Tatu Saloranta on the Axiom mailing list. Tatu is the main developer of the Woodstox project.