OM stands for Object Model (also known as AXIOM, the AXis Object Model) and refers to the XML infoset model developed for Axis2. The XML infoset is the information contained in an XML document, and for programmatic manipulation it is convenient to have a language-specific representation of that infoset. For an object-oriented language the obvious choice is a model made up of objects. DOM and JDOM are two such XML models. OM is conceptually similar to these models in its external behavior, but internally it is very different. The objective of this tutorial is to introduce the basics of OM and explain the best practices to follow when using it. However, before diving into the depths of OM, it is better to skim the surface and see what it is all about!
This tutorial is for anybody who is interested in OM and wants to explore it in more depth. It assumes that the reader has a basic understanding of XML concepts (such as namespaces) and a working knowledge of tools such as Ant. Knowledge of similar object models such as DOM is helpful but not assumed. Several links are listed in the links section of the appendix for readers who lack a basic understanding of XML.
Pull parsing is a recent trend in XML processing. The previously popular XML processing frameworks such as SAX and DOM were "push-based", which means that control of the parsing stayed with the parser itself. This approach is easy to use, but it is not efficient for large XML documents when a complete model of the document is generated in memory. Pull parsing inverts the control, so the parser proceeds only at the user's command. The user can decide to store or discard the events generated by the parser. OM is based on pull parsing. To learn more about XML pull parsing see the XML pull parsing introduction.
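The inversion of control described above can be sketched with the StAX API that ships with the JDK. This is only an illustration of pull parsing in general, not of OM itself; the class name PullParsingSketch and the sample document are made up for the example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical example class; not part of OM.
public class PullParsingSketch {

    /** Pulls events one at a time and keeps only the element names. */
    static List<String> elementNames(String xml) throws XMLStreamException {
        XMLStreamReader reader =
                XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
        List<String> names = new ArrayList<>();
        try {
            // The application drives the loop: nothing is parsed until next() is called.
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    names.add(reader.getLocalName()); // the caller chooses to store this event
                }
                // every other event is simply discarded
            }
        } finally {
            reader.close();
        }
        return names;
    }

    public static void main(String[] args) throws XMLStreamException {
        // prints [order, item, item]
        System.out.println(elementNames("<order><item>pen</item><item>ink</item></order>"));
    }
}
```

Because the caller decides what to keep, no in-memory model of the whole document is ever created here.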
The original OM was proposed at the Axis summit held in Colombo in September 2004 as a store of pull parser events for later processing. However, this approach was soon improved, and OM was pursued as a complete infoset model because of its flexibility. Several implementation techniques were attempted during the initial phases. The two most promising were the table based technique and the linked list based technique. During the intermediate performance tests the linked list based technique proved to be much more memory efficient for small and mid-sized XML documents (the advantage of the table based OM was visible only for large and very large XML documents), and hence it was chosen as the most suitable. Initial efforts focused on implementing the XML infoset items that are relevant to the SOAP specification (DTD support, processing instruction support, etc. were not considered). The advantage of tight integration was evident at this stage, and this resulted in SOAP specific interfaces being part of OM rather than a layer on top of it. OM was deliberately made API centric, which allows implementations to be developed independently and swapped later without affecting the program.
OM is a lightweight, deferred-built XML infoset representation based on StAX (JSR 173), which is the standard streaming pull parser API. The object model can be manipulated as flexibly as any other object model (such as JDOM), but underneath, the objects are created only when they are absolutely required. This leads to much less memory intensive programming. Following is a short feature overview of OM.
OM is tightly bound to the StAX API. To work with OM, a StAX compliant parser and the API must be present in the classpath.
The following image shows how the OM API is viewed by the user.
Figure 1
The OM builder wraps the raw XML character stream through the StAX reader API. Hence the complexities of the pull event stream are hidden from the user.
Since OM is a deferred-built object model, it incorporates the concept of caching. Caching refers to the creation of the objects while parsing the pull stream. This matters because caching can be turned off in certain situations; if it is, the parser proceeds without building the object structure. The user can extract the raw pull stream from OM and use that instead of the OM, and in this case it is sometimes beneficial to switch off caching. The advanced operations section explains more about accessing the raw pull stream and switching caching on and off.
In a nutshell, SOAP is an information exchange protocol based on XML. SOAP has a defined set of XML elements that should be used in messages. Since Axis is a "SOAP Engine" and OM is built for Axis, a set of SOAP specific objects was also defined along with OM. These SOAP objects are extensions of the general OM objects. To learn more on SOAP
OM is not a separate product but part of Axis2. However, since Axis2 has a modular build structure, it is possible to obtain an "OM only" jar.
The easiest way to obtain the OM binary is to download the Axis2 binary distribution. The lib directory will contain the axis2-xml-0.93.jar. However more adventurous users can build the OM from source. The next section describes how to build OM from source.
Detailed information on getting source from Axis2 SVN repository can be found here.
After the source has been downloaded, the OM binary can be built. On both Windows and Linux, move to the project directory and execute the command "maven jar". All other necessary jars will be downloaded automatically. When the build finishes successfully, the axis2-xml-0.93.jar can be found in the newly created "targets" directory in the XML module.
Once the OM binary is obtained by any of the means mentioned above, it should be included in the classpath for any OM based program to work. The subsequent parts of this tutorial assume that this build step is complete and that axis2-xml-0.93.jar is in the classpath along with the StAX API jar file and a StAX implementation.
Creation is the first and foremost action when using an object representation. This part explains how OM can be built from an existing document or purely programmatically. OM provides the notions of a factory and a builder to create objects. The factory helps to keep the code at the interface level and the implementations separate (Figure 2). Since OM is tightly bound to StAX, a StAX compliant reader should be created first with the desired input stream. The reader should then be fed into the OMXMLBuilderFactory to instantiate a suitable builder. The interface provided by the builders is identical, though the internal implementations vary. However, the types of the returned objects depend on the implementation of the builder. For example, the SOAPModelBuilder returns SOAP specific objects (such as the SOAPEnvelope, which is a subclass of OMElement) through its builder methods. The following piece of code shows the correct method of creating an OM document from an input stream. Note that the SOAP builder is used in this example.
//create the parser
XMLStreamReader parser = XMLInputFactory.newInstance().createXMLStreamReader(new FileReader(file));
//create the builder
OMXMLParserWrapper builder = OMXMLBuilderFactory.createStAXSOAPModelBuilder(OMAbstractFactory.getSOAP11Factory(), parser);
//get the root element (in this case the envelope)
SOAPEnvelope envelope = (SOAPEnvelope) builder.getDocumentElement();
As the example shows, creating an OM from an input stream is pretty straightforward. However elements and nodes can be created programmatically to modify the structure as well. The recommended way to create OM objects programmatically is to use the factory. OMAbstractFactory.getOMFactory() will return the proper factory and the creator methods for each type should be called. Currently OM has two builders, namely the OM builder and the SOAP model builder. These builders provide the necessary information to the XML info set model to build itself.
Figure 2
A simple example is shown below.
//create a factory
OMFactory factory = OMAbstractFactory.getOMFactory();
//use the factory to create two namespace objects
OMNamespace ns1 = factory.createOMNamespace("bar","x");
OMNamespace ns2 = factory.createOMNamespace("bar1","y");
//use the factory to create three elements
OMElement root = factory.createOMElement("root",ns1);
OMElement elt11 = factory.createOMElement("foo1",ns1);
OMElement elt12 = factory.createOMElement("foo2",ns1);
The reason for having a set of factory.createXXX methods is to cater for different implementations while keeping the programmer's code intact. It is highly recommended to use the factory for creating OM objects, as this eases switching between different OM implementations. Several differences exist between a programmatically created OMNode and a conventionally built OMNode. The most important difference is that the former has no builder object enclosed, whereas the latter always carries a reference to its builder. As stated earlier in this tutorial, since the object model is built as and when required, each and every built OMNode should have a reference to its builder. If this reference is not available, it is because the object was created without a builder. This difference becomes evident when the user tries to get a non-caching pull parser from the OMElement. This will be discussed in more detail in the advanced operations section.
To understand why each and every OMNode needs the builder reference, consider the following scenario. Assume that the parent element is built but its child elements are not. If the parent is asked to iterate through its children, this information is not readily available to the parent element, and it must build its children first before attempting to iterate over them. To make this possible, each and every node of an OM structure carries a reference to its builder. Each and every OMNode also carries a flag that states its build status. Apart from this restriction there are no other constraints that keep the programmer from mixing programmatically made OMNode objects with OMNode objects built from builders.
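The scenario above can be sketched with a deliberately simplified model. The Element and Builder classes below are hypothetical stand-ins written for this illustration and are not the OM classes; they only show why a node that is not yet complete needs a way to reach its builder, and how a build-status flag fits in.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical sketch of deferred building; none of these types are OM types.
public class DeferredBuildSketch {

    /** Simplified element: knows its builder and whether it is fully built. */
    static final class Element {
        final String name;
        final Element parent;
        final Builder builder;
        final List<Element> children = new ArrayList<>();
        boolean done; // build-status flag

        Element(String name, Element parent, Builder builder) {
            this.name = name;
            this.parent = parent;
            this.builder = builder;
        }

        /** Asks the builder for more events until this element is complete. */
        List<Element> getChildren() throws XMLStreamException {
            while (!done) {
                builder.next();
            }
            return children;
        }
    }

    /** Simplified builder: turns one pull event at a time into tree nodes. */
    static final class Builder {
        private final XMLStreamReader reader;
        private Element root;
        private Element current; // innermost element still being built

        Builder(String xml) throws XMLStreamException {
            reader = XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
        }

        /** Advances the parser by exactly one event and updates the tree. */
        void next() throws XMLStreamException {
            int event = reader.next();
            if (event == XMLStreamConstants.START_ELEMENT) {
                Element e = new Element(reader.getLocalName(), current, this);
                if (current != null) {
                    current.children.add(e);
                } else {
                    root = e;
                }
                current = e;
            } else if (event == XMLStreamConstants.END_ELEMENT) {
                current.done = true;
                current = current.parent;
            }
        }

        /** Builds only as far as the start tag of the root element. */
        Element getDocumentElement() throws XMLStreamException {
            while (root == null) {
                next();
            }
            return root;
        }
    }

    public static void main(String[] args) throws XMLStreamException {
        Builder builder = new Builder("<root><a><x/></a><b/></root>");
        Element root = builder.getDocumentElement();
        System.out.println(root.children.size());      // 0: children are not built yet
        System.out.println(root.getChildren().size()); // 2: asking for them forces the build
    }
}
```

An element created directly with `new Element(...)` and a null builder would have nothing to pull from, which mirrors the difference between programmatically created and builder-created nodes described above.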
The SOAP object hierarchy is designed to feel natural to a programmer. An inspection of the API will show that it is quite close to the SAAJ API but with no bindings to DOM or any other model. The SOAP classes extend the basic OM classes (such as the element), so one can access a SOAP document either through the SOAP abstraction or drill down to the underlying XML object model with a simple cast.
Addition and removal methods are primarily defined in the OMElement interface. The following are the most important in adding nodes.
public void addChild(OMNode omNode);
public void addAttribute(OMAttribute attr);
This code segment shows how the addition takes place. Note that it is related to the code segment shown in the creation section.
//set the children
//(elt21 and elt22 are assumed to have been created with the factory, like elt11 and elt12)
elt11.addChild(elt21);
elt12.addChild(elt22);
root.addChild(elt11);
root.addChild(elt12);
Note that the addChild method will always add the child as the first child of the parent.
Removal of Nodes
A given node can be removed from the tree by calling the detach() method. A node can also be removed from the tree by calling the remove method of the returned iterator, which will also call the detach method of the particular node internally.
Handling namespaces
Namespaces are a tricky part of any XML object model, and OM is no exception. However, care has been taken to make the namespace interface very simple. OMNamespace is the class that represents a namespace, and its setter methods have been intentionally removed. This makes OMNamespace immutable and allows the underlying implementation to share the objects without any difficulty. The following are the important methods available in OMElement to handle namespaces.
public OMNamespace declareNamespace(String uri, String prefix);
public OMNamespace declareNamespace(OMNamespace namespace);
public OMNamespace findNamespace(String uri, String prefix) throws OMException;
The declareNamespace methods are fairly straightforward. They add a namespace to the namespace declarations section of the element. Note that a namespace declaration that has already been added will not be added twice. findNamespace is a very handy method for locating a namespace object higher up the object tree. It searches for a matching namespace in the element's own declarations section and moves to the parent if it is not found there. The search progresses up the tree until a matching namespace is found or the root has been reached.
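The upward search can be sketched as follows. The Element class here is a hypothetical stand-in, and for brevity the lookup is keyed on the prefix alone, whereas the OM method shown above takes both a URI and a prefix.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the upward namespace search; not the OM implementation.
public class NamespaceLookupSketch {

    /** Simplified element that stores only its own namespace declarations. */
    static final class Element {
        final Element parent;
        final Map<String, String> declarations = new HashMap<>(); // prefix -> URI

        Element(Element parent) {
            this.parent = parent;
        }

        /** Declares a namespace on this element; repeating the same pair changes nothing. */
        void declareNamespace(String uri, String prefix) {
            declarations.put(prefix, uri);
        }

        /** Looks at this element first, then walks up until a match or the root. */
        String findNamespace(String prefix) {
            for (Element e = this; e != null; e = e.parent) {
                String uri = e.declarations.get(prefix);
                if (uri != null) {
                    return uri;
                }
            }
            return null; // the root was reached without a match
        }
    }

    public static void main(String[] args) {
        Element root = new Element(null);
        root.declareNamespace("bar", "x");
        Element child = new Element(root);
        child.declareNamespace("bar1", "y");
        Element grandchild = new Element(child);

        System.out.println(grandchild.findNamespace("x")); // bar  (found on the root)
        System.out.println(grandchild.findNamespace("y")); // bar1 (found on the parent)
        System.out.println(root.findNamespace("y"));       // null (never searches downwards)
    }
}
```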
During serialization, a namespace created directly from the factory is added to the declarations only when its prefix is encountered by the serializer. Serialization is discussed further in the serializer section.
The following simple code segment shows how namespaces are dealt with in OM.
OMFactory factory = OMAbstractFactory.getOMFactory();
OMNamespace ns1 = factory.createOMNamespace("bar","x");
OMElement root = factory.createOMElement("root",ns1);
OMNamespace ns2 = root.declareNamespace("bar1","y");
OMElement elt1 = factory.createOMElement("foo",ns1);
OMElement elt2 = factory.createOMElement("yuck",ns2);
OMText txt1 = factory.createText(elt2,"blah");
elt2.addChild(txt1);
elt1.addChild(elt2);
root.addChild(elt1);
Serialization of the root element produces the following XML:
<x:root xmlns:x="bar" xmlns:y="bar1">
  <x:foo>
    <y:yuck>blah</y:yuck>
  </x:foo>
</x:root>
Traversing the object structure can be done in the usual way by using the list of children. Note however that the child nodes are returned as an iterator. The Iterator supports the 'OM way' of accessing elements and is more convenient than a list for sequential access. The following code sample shows how the children can be accessed. The children are of the type OMNode that can either be OMText or OMElement.
Iterator children = root.getChildren();
while (children.hasNext()) {
    OMNode node = (OMNode) children.next();
}
Apart from this, every OMNode has links to its siblings. If more thorough navigation is needed, the nextSibling() and previousSibling() methods can be used. A more selective set can be chosen by using the getChildrenWithName(QName) method. The getChildWithName(QName) method returns the first child that matches the given QName, and getChildrenWithName(QName) returns a collection containing all the matching children. The advantage of these iterators is that they do not build the whole object structure at once; they build it only as it is required.
All iterator implementations internally stay one step ahead of their apparent location to provide the correct value for the hasNext() method. This hidden advancement can build elements that are not intended to be built at all. Hence these iterators are recommended only when caching is not a concern.
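The effect of staying one step ahead can be sketched with plain Java iterators. CountingSource and LookAheadIterator are hypothetical classes written for this illustration; the counter stands in for the number of nodes that would have been built.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical sketch of a look-ahead iterator; not the OM iterator classes.
public class LookAheadSketch {

    /** Source that counts how many items it has produced ("built") so far. */
    static final class CountingSource implements Iterator<String> {
        private final Iterator<String> items;
        int built;

        CountingSource(String... names) {
            this.items = Arrays.asList(names).iterator();
        }

        @Override public boolean hasNext() { return items.hasNext(); }

        @Override public String next() {
            built++;
            return items.next();
        }
    }

    /** Iterator that always holds the following item already, so hasNext() is trivial. */
    static final class LookAheadIterator implements Iterator<String> {
        private final Iterator<String> source;
        private String ahead;

        LookAheadIterator(Iterator<String> source) {
            this.source = source;
            advance(); // fetches the first item before anyone has asked for it
        }

        private void advance() {
            ahead = source.hasNext() ? source.next() : null;
        }

        @Override public boolean hasNext() { return ahead != null; }

        @Override public String next() {
            if (ahead == null) {
                throw new NoSuchElementException();
            }
            String result = ahead;
            advance(); // hidden advancement: the following item is produced now
            return result;
        }
    }

    public static void main(String[] args) {
        CountingSource source = new CountingSource("a", "b", "c");
        Iterator<String> it = new LookAheadIterator(source);
        String first = it.next();
        // prints "a returned, 2 built": one more item was produced than was consumed
        System.out.println(first + " returned, " + source.built + " built");
    }
}
```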
OM can be serialized either as the pure object model or as the pull event stream. The serialization uses an XMLStreamWriter object to write the output, and hence the same serialization mechanism can be used to write different types of output (such as text, binary, etc.).
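The idea that one serialization routine can feed different outputs can be sketched with the JDK's own XMLStreamWriter. The WriterSketch class and its method names are made up for this illustration, and the fragment written is hand-coded rather than produced by OM.

```java
import java.io.ByteArrayOutputStream;
import java.io.StringWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

// Hypothetical example class; it only uses the standard StAX writer API.
public class WriterSketch {

    /** Emits a small fragment; this method does not know where the output goes. */
    static void writeFoo(XMLStreamWriter writer) throws XMLStreamException {
        writer.writeStartElement("x", "foo", "bar");   // prefix, local name, namespace URI
        writer.writeNamespace("x", "bar");
        writer.writeStartElement("y", "yuck", "bar1");
        writer.writeNamespace("y", "bar1");
        writer.writeCharacters("blah");
        writer.writeEndElement();
        writer.writeEndElement();
        writer.flush();
    }

    /** Same routine, character output. */
    static String toText() throws XMLStreamException {
        StringWriter out = new StringWriter();
        writeFoo(XMLOutputFactory.newInstance().createXMLStreamWriter(out));
        return out.toString();
    }

    /** Same routine, byte output. */
    static byte[] toBytes() throws XMLStreamException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeFoo(XMLOutputFactory.newInstance().createXMLStreamWriter(out, "UTF-8"));
        return out.toByteArray();
    }

    public static void main(String[] args) throws XMLStreamException {
        System.out.println(toText());
        System.out.println(toBytes().length + " bytes");
    }
}
```

Only the construction of the writer differs between the two targets; the code that walks the content is shared.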
A caching flag is provided by OM to control the building of the in-memory OM. The OMNode has two methods, serializeWithCache and serialize. When serialize is called, the cache flag is reset and the serializer does not cache the stream. Hence the object model will not be built if the cache flag is not set.
The serializer serializes namespaces in the following way.
Because of this behavior, if a fragment of the XML is serialized, it will also be namespace qualified with the necessary namespace declarations.
Here is an example that shows how to write the output to the console, with reference to the earlier code sample (Code listing 2.1) that created a SOAP envelope.
XMLStreamWriter writer = XMLOutputFactory.newInstance().createXMLStreamWriter(System.out);
//dump the output to console with caching
envelope.serializeWithCache(writer);
writer.flush();
The features of the serializer mentioned above force a correct serialization even if only a part of the OM tree is serialized. The following serializations show how the serialization mechanism takes the trouble to work out the namespaces accurately. The example is from code listing 2.6, which creates a small OM programmatically. Serialization of the root element produces
<x:root xmlns:x="bar" xmlns:y="bar1">
  <x:foo>
    <y:yuck>blah</y:yuck>
  </x:foo>
</x:root>
However serialization of only the foo element produces
<x:foo xmlns:x="bar">
  <y:yuck xmlns:y="bar1">blah</y:yuck>
</x:foo>
Note how the serializer puts the relevant namespace declarations in place.
Complete code for the OM based document building and serialization
The following code segment shows how to use OM to build a document completely and then serialize it as text to the console. Only the important sections are shown here; the complete program listing can be found in the appendix.
//create the parser
XMLStreamReader parser = XMLInputFactory.newInstance().createXMLStreamReader(new FileReader(file));
//create the builder
OMXMLParserWrapper builder = OMXMLBuilderFactory.createStAXSOAPModelBuilder(OMAbstractFactory.getOMFactory(), parser);
//get the root element (in this case the envelope)
SOAPEnvelope envelope = (SOAPEnvelope) builder.getDocumentElement();
//get the writer
XMLStreamWriter writer = XMLOutputFactory.newInstance().createXMLStreamWriter(System.out);
//dump the output to the console
envelope.serialize(writer);
writer.flush();
OM provides a utility class to navigate the OM structure. The navigator provides an in-order traversal of the OM tree up to the last-built node. The navigator has two states, called the navigable state and the completion state. Since the navigator starts its navigation from an OMElement, it is deemed to have completed the navigation when the starting node is reached again. This state is known as the completion state. Once the navigator has reached the completion state, its navigation is done and it cannot proceed any further.
It is possible that the OM tree does not get built completely when it is navigated. The navigable status shows whether the tree structure is navigable. When the navigator is complete it is not navigable anymore. However it is possible for a navigator to become non-navigable without being complete. The following code sample shows how the navigator should be used and handled using its states.
//Create a navigator
OMNavigator navigator = new OMNavigator(envelope);
OMNode node = null;
while (navigator.isNavigable()) {
    node = navigator.next();
}
OM is tightly integrated with StAX, and the getXMLStreamReader() and getXMLStreamReaderWithoutCaching() methods of OMElement provide an XMLStreamReader object. This XMLStreamReader instance has the special capability of switching between the underlying stream and the OM object tree if the cache setting is off. However, this functionality is completely transparent to the user. This is further explained in the following paragraphs.
OM has the concept of caching, and OM is the actual cache of the events fired. However, the requester can choose to get the pull events from the underlying stream rather than from the OM tree. This can be achieved by getting the pull parser with the cache off. If the pull parser is obtained without switching off the cache, the new events fired will be cached and the tree updated. The returned pull parser switches between the object structure and the stream underneath, and users need not worry about any differences caused by the switching. The exact pull stream that the original document would have provided is produced even if the OM tree was fully or partially built. The getXMLStreamReaderWithoutCaching() method is very useful when the events need to be handled in a pull based manner without any intermediate models. This makes such operations faster and more efficient.
For consistency reasons, once the cache is switched off it cannot be switched on again.
Although the serializer acts correctly in every situation, the XML that it produces may not always be efficient. Take the following case, where a code listing similar to 1.6 is used but with two elements having the same namespace. The newly added items are the second yuck element and its text.
OMFactory factory = OMAbstractFactory.getOMFactory();
OMNamespace ns1 = factory.createOMNamespace("bar","x");
OMElement root = factory.createOMElement("root",ns1);
OMNamespace ns2 = root.declareNamespace("bar1","y");
OMElement elt1 = factory.createOMElement("foo",ns1);
OMElement elt2 = factory.createOMElement("yuck",ns2);
//second yuck element and its text (the names elt3 and txt2 are illustrative)
OMElement elt3 = factory.createOMElement("yuck",ns2);
OMText txt1 = factory.createText(elt2,"blah");
OMText txt2 = factory.createText(elt3,"blahblah");
elt3.addChild(txt2);
elt2.addChild(txt1);
elt1.addChild(elt2);
elt1.addChild(elt3);
root.addChild(elt1);
Serialization of the root element provides the following XML
<x:root xmlns:x="bar" xmlns:y="bar1">
  <x:foo>
    <y:yuck>blahblah</y:yuck>
    <y:yuck>blah</y:yuck>
  </x:foo>
</x:root>
However if the serialization is carried on the foo element then the following XML is produced
<x:foo xmlns:x="bar">
  <y:yuck xmlns:y="bar1">blahblah</y:yuck>
  <y:yuck xmlns:y="bar1">blah</y:yuck>
</x:foo>
Note that the same namespace is declared twice. This XML is semantically correct, but the same semantics could have been achieved by placing the y namespace declaration on the parent element. This behavior is due to the nature of the serialization, which tries to be accurate but not optimal. It is deliberately kept unchanged since such optimizations would slow down the common case.
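The claim that both forms carry the same semantics can be checked with a namespace-aware parser. The sketch below uses the JDK's StAX reader to compare the qualified element names of the two serializations; the class name and the two string constants are made up for the illustration, with the second string being the hand-written "optimal" form.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical example class; it only uses the standard StAX reader API.
public class NamespaceEquivalenceSketch {

    /** The y namespace is declared on each yuck element, as in the serializer output. */
    static final String REPEATED =
            "<x:foo xmlns:x=\"bar\">"
            + "<y:yuck xmlns:y=\"bar1\">blahblah</y:yuck>"
            + "<y:yuck xmlns:y=\"bar1\">blah</y:yuck>"
            + "</x:foo>";

    /** The y namespace is declared once, on the parent element. */
    static final String HOISTED =
            "<x:foo xmlns:x=\"bar\" xmlns:y=\"bar1\">"
            + "<y:yuck>blahblah</y:yuck>"
            + "<y:yuck>blah</y:yuck>"
            + "</x:foo>";

    /** Returns each element name in {namespaceURI}localName form, in document order. */
    static List<String> qualifiedNames(String xml) throws XMLStreamException {
        XMLStreamReader reader =
                XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
        List<String> names = new ArrayList<>();
        try {
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                    names.add(reader.getName().toString());
                }
            }
        } finally {
            reader.close();
        }
        return names;
    }

    public static void main(String[] args) throws XMLStreamException {
        System.out.println(qualifiedNames(REPEATED)); // [{bar}foo, {bar1}yuck, {bar1}yuck]
        System.out.println(qualifiedNames(REPEATED).equals(qualifiedNames(HOISTED))); // true
    }
}
```

The repeated declarations cost a few extra bytes on the wire, but a consumer resolves every element to the same qualified name either way.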
This is meant to be a small yet comprehensive introduction to AXIOM. AXIOM, however, is a lot more than what is described in this tutorial. Readers are welcome to explore AXIOM, especially its capabilities for handling binary content.
import org.apache.axis2.om.SOAPEnvelope;
import org.apache.axis2.om.OMAbstractFactory;
import org.apache.axis2.om.OMFactory;
import org.apache.axis2.om.OMXMLParserWrapper;
import org.apache.axis2.impl.llom.factory.OMXMLBuilderFactory;
import javax.xml.stream.*;
import java.io.FileReader;
import java.io.FileNotFoundException;

public class TestOMBuilder {
    /**
     * Pass the file name as an argument
     * @param args
     */
    public static void main(String[] args) {
        try {
            //create the parser
            XMLStreamReader parser = XMLInputFactory.newInstance().createXMLStreamReader(new FileReader(args[0]));
            //create the builder
            OMXMLParserWrapper builder = OMXMLBuilderFactory.createStAXSOAPModelBuilder(OMAbstractFactory.getOMFactory(), parser);
            //get the root element (in this case the envelope)
            SOAPEnvelope envelope = (SOAPEnvelope) builder.getDocumentElement();
            //get the writer
            XMLStreamWriter writer = XMLOutputFactory.newInstance().createXMLStreamWriter(System.out);
            //dump the output to the console
            envelope.serialize(writer);
            writer.flush();
        } catch (XMLStreamException e) {
            e.printStackTrace();
        } catch (FileNotFoundException e) {
            e.printStackTrace();
        }
    }
}
All rights reserved by Apache Software Foundation