Despite the flexibility, interoperability and global acceptance of XML,
there are times when serializing data into XML does not make sense. Web
service users may need to transmit binary attachments of various kinds, such
as images, drawings and XML documents, together with a SOAP message. Such data
is often in a particular binary format.
Traditionally, two techniques have been used to deal with opaque data
in XML:
"By value"
Sending binary data by value is achieved by embedding the opaque data (after
some form of encoding) as element or attribute content of the XML component
of the data. The main advantage of this technique is that it lets
applications process and describe the data by looking only at its XML
component.
XML supports opaque data as content through the use of either base64
or hexadecimal text encoding. Both techniques bloat the size of the
data. With UTF-8 as the underlying text encoding, base64 encoding increases
the size of the binary data to about 1.33 times its original size (4 characters
for every 3 bytes), while hexadecimal encoding doubles it (2 characters per
byte). These factors double again if UTF-16 text encoding is used. Also of
concern is the processing overhead (both real and perceived) of these formats,
especially when decoding back into raw binary.
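The size factors above are easy to check. The following is a minimal sketch using only the JDK (java.util.Base64 plus a small hand-written hex encoder); the class name EncodingOverhead and the 3000-byte sample payload are illustrative assumptions, not part of Axis2.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.Locale;

final class EncodingOverhead {

    // Base64 text: every 3 input bytes become 4 characters.
    static String base64(byte[] raw) {
        return Base64.getEncoder().encodeToString(raw);
    }

    // Hexadecimal text: every input byte becomes 2 characters.
    static String hex(byte[] raw) {
        StringBuilder sb = new StringBuilder(raw.length * 2);
        for (byte b : raw) {
            sb.append(Character.forDigit((b >> 4) & 0xF, 16));
            sb.append(Character.forDigit(b & 0xF, 16));
        }
        return sb.toString();
    }

    // Bytes on the wire when the text is serialized as UTF-8.
    static int sizeUtf8(String text) {
        return text.getBytes(StandardCharsets.UTF_8).length;
    }

    // Bytes on the wire as UTF-16 (big-endian, so no byte-order mark is added).
    static int sizeUtf16(String text) {
        return text.getBytes(StandardCharsets.UTF_16BE).length;
    }

    public static void main(String[] args) {
        byte[] raw = new byte[3000]; // stand-in for an opaque binary payload
        String b64 = base64(raw);
        String hx = hex(raw);
        double n = raw.length;
        System.out.printf(Locale.ROOT, "base64, UTF-8:  %d bytes (%.2fx)%n", sizeUtf8(b64), sizeUtf8(b64) / n);
        System.out.printf(Locale.ROOT, "base64, UTF-16: %d bytes (%.2fx)%n", sizeUtf16(b64), sizeUtf16(b64) / n);
        System.out.printf(Locale.ROOT, "hex, UTF-8:     %d bytes (%.2fx)%n", sizeUtf8(hx), sizeUtf8(hx) / n);
        System.out.printf(Locale.ROOT, "hex, UTF-16:    %d bytes (%.2fx)%n", sizeUtf16(hx), sizeUtf16(hx) / n);
    }
}
```

For the 3000-byte sample this reports 4000 bytes (1.33x) for base64 in UTF-8, 8000 bytes (2.67x) in UTF-16, 6000 bytes (2.00x) for hex in UTF-8 and 12000 bytes (4.00x) in UTF-16.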
"By reference"
Sending binary data by reference is achieved by attaching pure
binary data as external unparsed general entities outside of the XML
document and then embedding reference URIs to those entities as
element or attribute values. This avoids unnecessary bloating of the
data and wasted processing power. The primary obstacle to using
these unparsed entities is their heavy reliance on DTDs, which impedes
modularity as well as the use of XML namespaces.
Several specifications were introduced in the Web services
world to deal with this binary attachment problem using the "by
reference" technique. SOAP with Attachments (SwA)
is one such example. Since SOAP prohibits document type declarations
(DTDs) in messages, the attached data is not represented as part of the
message infoset, which creates two separate data models. This scenario
is like sending attachments with an e-mail message: even though the
attachments are related to the message content, they are not inside the
message. As a result, technologies that process and describe data by
looking only at the XML component of the message, such as WS-Security,
do not work correctly with the attachments.
Where Does MTOM Come In?
MTOM (SOAP
Message Transmission Optimization Mechanism) is another specification
that focuses on solving the "attachments" problem. MTOM tries to combine
the advantages of the two techniques above.
MTOM is actually a "by reference" method. The wire format of an MTOM-optimized
message is the same as that of a SOAP with Attachments message, which also makes it
backward compatible with SwA endpoints. The most notable feature of MTOM is
the use of the xop:Include element, defined in the XML-binary Optimized
Packaging (XOP) specification, to reference the binary attachments of the
message, which travel as separate MIME parts. With the use of this
element, the attached binary content logically becomes inline (by
value) with the SOAP document even though it is actually attached separately.
This merges the two realms by making it possible to work with only one data
model. Applications can process and describe the data by looking only
at the XML part, which makes reliance on DTDs unnecessary. On a lighter note, MTOM has
standardized the referencing mechanism of SwA. The following is an extract from
the XOP
specification.
At the conceptual level, this binary data can be thought of as being
base64-encoded in the XML Document. As this conceptual form might be needed
during some processing of the XML Document (e.g., for signing the XML
document), it is necessary to have a one to one correspondence between XML
Infosets and XOP Packages. Therefore, the conceptual representation of such
binary data is as if it were base64-encoded, using the canonical lexical form
of XML Schema base64Binary datatype (see [XML Schema
Part 2: Datatypes Second Edition] 3.2.16
base64Binary). In the reverse direction, XOP is capable of optimizing
only base64-encoded Infoset data that is in the canonical lexical
form.
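The canonical-form requirement matters in practice: base64 text that contains line breaks, for example, is not in canonical form. As a rough sketch (not the Axis2/Axiom implementation), one way to test a base64 string in Java is to decode it strictly and check that re-encoding reproduces exactly the same characters; the class and method names are hypothetical.

```java
import java.util.Base64;

// Sketch only: approximates "canonical lexical form" by a decode/re-encode round trip.
final class CanonicalBase64 {

    // True if the text decodes strictly and re-encodes to exactly the same
    // characters: no whitespace or line breaks, correct padding, zero pad bits.
    static boolean isCanonical(String lexical) {
        try {
            byte[] decoded = Base64.getDecoder().decode(lexical);
            return Base64.getEncoder().encodeToString(decoded).equals(lexical);
        } catch (IllegalArgumentException e) {
            return false; // whitespace or a character outside the base64 alphabet
        }
    }

    public static void main(String[] args) {
        byte[] payload = new byte[100];
        String plain = Base64.getEncoder().encodeToString(payload);       // one unbroken line
        String wrapped = Base64.getMimeEncoder().encodeToString(payload); // CRLF every 76 chars
        System.out.println(isCanonical(plain));   // true
        System.out.println(isCanonical(wrapped)); // false
    }
}
```

Note that java.util.Base64.getMimeEncoder() wraps its output every 76 characters, so its output would not qualify, while getEncoder() produces no line separators.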
AXIOM is an object model (possibly the first) with the ability to hold
binary data. It has been given this ability by allowing OMText to hold raw
binary content in the form of javax.activation.DataHandler. OMText was
chosen for this purpose for two reasons. The first is that XOP (MTOM) can
optimize only base64-encoded Infoset data that is in the canonical
lexical form of the XML Schema base64Binary datatype, so optimizable binary
content is conceptually the text content of an element. The second is to
preserve the infoset at both sender and receiver (storing the binary content
in the same kind of object regardless of whether it is optimized or not).
MTOM allows portions of the message to be selectively encoded, so a single
SOAP message can carry both base64-encoded data and externally attached raw
binary data referenced by an xop:Include element (optimized content).
Users can specify whether an OMText node containing raw binary data or
base64-encoded binary data is qualified to be optimized, either when the
node is constructed or later. To get the best efficiency out of MTOM, users
are advised to send smaller binary attachments as base64-encoded
(non-optimized) content and larger attachments as optimized content.
A user can also create an optimizable binary content node from a base64-encoded
string that contains the encoded binary content, together with the MIME
type of the actual binary representation.
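To make these ideas concrete, here is a hypothetical Java model of a binary-capable text node. It is not the Axiom OMText API: to stay runnable on a plain JDK it stores a byte array instead of a javax.activation.DataHandler, and the class and method names are invented for illustration. It shows raw binary content kept as-is, an optimize flag that can be set at construction time or later, creation from an existing base64 string plus MIME type, and a single base64 text view of the infoset regardless of the flag.

```java
import java.util.Arrays;
import java.util.Base64;

// Hypothetical model of a binary-capable text node; NOT the Axiom OMText API.
final class BinaryTextNode {
    private final byte[] raw;      // binary content kept as raw bytes
    private final String mimeType; // MIME type of the actual binary representation
    private boolean optimize;      // may be decided at construction time or later

    BinaryTextNode(byte[] raw, String mimeType, boolean optimize) {
        this.raw = raw.clone();
        this.mimeType = mimeType;
        this.optimize = optimize;
    }

    // Build an optimizable node from an already base64-encoded string.
    static BinaryTextNode fromBase64(String base64, String mimeType) {
        return new BinaryTextNode(Base64.getDecoder().decode(base64), mimeType, true);
    }

    void setOptimize(boolean optimize) { this.optimize = optimize; }
    boolean isOptimized() { return optimize; }
    String getMimeType() { return mimeType; }
    byte[] getBytes() { return raw.clone(); }

    // Infoset view: canonical base64 text, identical whether or not the node is optimized.
    String getText() { return Base64.getEncoder().encodeToString(raw); }

    public static void main(String[] args) {
        BinaryTextNode a = new BinaryTextNode(new byte[] {1, 2, 3}, "application/octet-stream", false);
        BinaryTextNode b = BinaryTextNode.fromBase64("AQID", "application/octet-stream");
        a.setOptimize(true); // decision changed after construction
        System.out.println(a.getText().equals(b.getText()));          // true
        System.out.println(Arrays.equals(a.getBytes(), b.getBytes())); // true
    }
}
```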
Axis2 uses javax.activation.DataHandler to handle the binary data. All
optimizable binary content nodes are serialized as base64 strings if MTOM
is not enabled. One can also create binary content nodes that will never be
optimized; they are always serialized and sent as base64 strings.
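The serialization rules above can be sketched as follows. This is a hypothetical illustration, not Axis2's serializer: the BinarySerializer class and the "partN@example.org" content-id scheme are assumptions. An optimizable node becomes an xop:Include reference plus a separately collected attachment only when MTOM is enabled; otherwise, or for a node that must never be optimized, the content is written inline as base64.

```java
import java.util.Base64;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of the serialization rule; not Axis2's serializer.
final class BinarySerializer {
    static final String XOP_NS = "http://www.w3.org/2004/08/xop/include";

    private final boolean mtomEnabled;
    // Raw parts to be sent as separate MIME parts, keyed by content-id.
    private final Map<String, byte[]> attachments = new LinkedHashMap<>();

    BinarySerializer(boolean mtomEnabled) {
        this.mtomEnabled = mtomEnabled;
    }

    // Returns what is written into the XML for one binary content node.
    String serialize(byte[] data, boolean optimizable) {
        if (mtomEnabled && optimizable) {
            String cid = "part" + (attachments.size() + 1) + "@example.org"; // made-up id scheme
            attachments.put(cid, data.clone());
            return "<xop:Include xmlns:xop=\"" + XOP_NS + "\" href=\"cid:" + cid + "\"/>";
        }
        // MTOM off, or a node that must never be optimized: inline base64 text.
        return Base64.getEncoder().encodeToString(data);
    }

    Map<String, byte[]> attachments() {
        return attachments;
    }

    public static void main(String[] args) {
        byte[] data = {1, 2, 3};
        System.out.println(new BinarySerializer(false).serialize(data, true)); // AQID
        BinarySerializer mtom = new BinarySerializer(true);
        System.out.println(mtom.serialize(data, false)); // AQID
        System.out.println(mtom.serialize(data, true));  // xop:Include with href="cid:part1@example.org"
    }
}
```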
Enabling MTOM Optimization at Client Side
Set the "enableMTOM" property in the Options to true when sending
messages.
When this property is set to true, any SOAP envelope, regardless of whether it contains optimizable content,
will be serialized as an MTOM-optimized MIME message.
Axis2 serializes all binary content nodes as base64-encoded strings,
regardless of whether they are qualified to be optimized, if the "enableMTOM" property is not set to true.
Users do not have to specify anything in order for Axis2 to receive MTOM-optimized messages.
Axis2 automatically identifies and deserializes MTOM messages as they arrive.
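Automatic identification works because an MTOM message announces itself in its Content-Type header: it is multipart/related with a type parameter of application/xop+xml, whereas an SwA message is multipart/related with the SOAP media type (for example text/xml) as its type. The sketch below shows that distinction using only the JDK; it is a simplified illustration, not Axis2's detection code, and its naive parsing does not handle semicolons inside quoted parameter values.

```java
import java.util.Locale;

// Simplified sketch of classifying an incoming message by its Content-Type.
final class MessageKind {
    enum Kind { MTOM, SWA, PLAIN_SOAP }

    static Kind classify(String contentType) {
        String ct = contentType.toLowerCase(Locale.ROOT);
        String mediaType = ct.split(";", 2)[0].trim();
        if (!mediaType.equals("multipart/related")) {
            return Kind.PLAIN_SOAP; // e.g. text/xml or application/soap+xml
        }
        // An MTOM package declares application/xop+xml as the root part's type.
        return "application/xop+xml".equals(parameter(ct, "type")) ? Kind.MTOM : Kind.SWA;
    }

    // Naive parameter lookup: splits on ';' and strips surrounding quotes.
    static String parameter(String contentType, String name) {
        for (String part : contentType.split(";")) {
            String[] kv = part.trim().split("=", 2);
            if (kv.length == 2 && kv[0].trim().equals(name)) {
                String value = kv[1].trim();
                if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                    value = value.substring(1, value.length() - 1);
                }
                return value;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(classify("text/xml; charset=UTF-8")); // PLAIN_SOAP
        System.out.println(classify("multipart/related; boundary=b1; type=\"application/xop+xml\"; "
                + "start=\"<root@example.org>\"; start-info=\"text/xml\"")); // MTOM
        System.out.println(classify("multipart/related; boundary=b2; type=\"text/xml\"; "
                + "start=\"<root@example.org>\"")); // SWA
    }
}
```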
Enabling MTOM Optimization at Server Side
The Axis2 server automatically identifies incoming MTOM-optimized messages
based on the content type and deserializes them accordingly. Users can enable MTOM
on the server side for outgoing messages.
To enable MTOM globally for all services, set the "enableMTOM" parameter to true in axis2.xml.
When it is set, all outgoing messages will be serialized and sent as MTOM-optimized MIME messages.
If it is not set, all the binary data in binary content nodes will be
serialized as base64-encoded strings. This configuration can be overridden in services.xml on a per-service and per-operation
basis.
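The override rule can be read as "the most specific setting wins". The following hypothetical sketch assumes that an operation-level value in services.xml beats a service-level value, which in turn beats the global axis2.xml value, and that a value set nowhere means MTOM is off; null stands for "not set". The class and method names are invented for illustration.

```java
// Hypothetical sketch: the most specific enableMTOM setting wins; null = not set.
final class MtomSetting {
    static boolean effective(Boolean global, Boolean service, Boolean operation) {
        if (operation != null) return operation; // per-operation override in services.xml
        if (service != null) return service;     // per-service override in services.xml
        if (global != null) return global;       // enableMTOM parameter in axis2.xml
        return false;                            // not set anywhere: base64 output
    }

    public static void main(String[] args) {
        System.out.println(effective(true, null, null));   // true
        System.out.println(effective(true, false, null));  // false
        System.out.println(effective(false, false, true)); // true
        System.out.println(effective(null, null, null));   // false
    }
}
```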
When using MTOM, you simply define the binary file as part of your SOAP message with
type="xsd:base64Binary" (the xmime schema also allows type="xsd:hexBinary", but XOP can
optimize only base64-encoded content). You indicate the type of content in the element
at runtime using the xmime:contentType attribute extension. Furthermore, you can identify
what types of data might be expected in the element using the xmime:expectedContentTypes
attribute. Putting it all together, our example element becomes:
Let's define a complete, validated document/literal style WSDL that imports the xmime schema and has a service that
receives a JPEG image and returns a pass/fail status to the client:
The important point here is that we import http://www.w3.org/2005/05/xmlmime and define an element, 'MyBinaryData', that uses MTOM.
The next step is to use the Axis2 tool 'WSDL2Java' to generate Java source files from this WSDL. See the 'Code Generator Tool' guide for more
information. Here, we define an Ant task that chooses XMLBeans as the data binding implementation. We also choose to generate an interface that our
skeleton will implement. The name we use for the WSDL above is mtomExample.wsdl, and we set the package name for our generated source files to
'org.apache.axis2.samples.mtomDatabinding.endpoint'. Our Ant task for this example is:
Now we are ready to code. Let's edit output/src/org/apache/axis2/samples/mtomDatabinding/endpoint/MyMTOMServiceSkeleton.java
and fill in the business logic. The end result is:
The code above receives a JPEG file and writes it to disk.
It returns zero on success and, in the case of an error, returns -1 along with a stack trace. Now let's define the client:
The last step is to create an AAR containing our skeleton, the generated interface and services.xml, and then deploy the service. See the user guide for more information.
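For readers who want to see the shape of the skeleton's business logic without the generated XMLBeans classes, the sketch below mirrors what it is described as doing: it copies the received image data to a file, returns 0 on success and -1 on failure, and prints the stack trace. The ImageSaver class and saveImage method are hypothetical; in the real skeleton the data comes from the generated XMLBeans types, and the stack trace is returned to the client rather than only printed.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper mirroring the described skeleton behaviour.
final class ImageSaver {
    // Writes the received image to target; returns 0 on success, -1 on failure.
    static int saveImage(InputStream image, Path target) {
        try (InputStream in = image) {
            Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
            return 0;
        } catch (IOException e) {
            e.printStackTrace(); // the described service also reports the stack trace
            return -1;
        }
    }

    public static void main(String[] args) throws IOException {
        Path target = Files.createTempFile("mtom-image", ".jpg");
        byte[] fakeJpeg = {(byte) 0xFF, (byte) 0xD8, (byte) 0xFF, (byte) 0xD9}; // JPEG start/end markers only
        System.out.println(saveImage(new ByteArrayInputStream(fakeJpeg), target)); // 0
    }
}
```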
SOAP with Attachments (SwA) with Axis2
Receiving SwA type attachments
Axis2 automatically identifies SwA messages based on the content type. Axis2 stores references
to the received attachment parts (MIME parts) in the MessageContext, preserving the order in which the attachments
were received. Users can access binary
attachments through the attachment API of the MessageContext, using the content-id of the MIME part as the key.
Care needs to be taken to strip the "cid:" prefix when the content-id
is taken from "href" attributes. Users can access the message context from within a service
implementation class using the "setOperationContext()" method, as shown in the following example.
Note: Axis2 supports content-id based referencing only. Axis2 does not support
Content-Location based referencing of MIME parts.
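Stripping the "cid:" prefix is simple string work. This hypothetical helper (not part of the Axis2 API) turns an href value such as cid:part1@example.org into the bare content-id, and also shows how to strip the angle brackets that surround a Content-ID MIME header value. Since cid: URLs are URL-encoded (RFC 2392), a complete version might also need to percent-decode the result; that step is omitted here.

```java
// Hypothetical helpers for turning references into bare content-ids.
final class ContentIds {
    // href="cid:part1@example.org" -> "part1@example.org" (prefix match is case-insensitive)
    static String fromHref(String href) {
        return href.regionMatches(true, 0, "cid:", 0, 4) ? href.substring(4) : href;
    }

    // Content-ID: <part1@example.org> -> "part1@example.org"
    static String fromHeader(String headerValue) {
        String v = headerValue.trim();
        return v.length() >= 2 && v.startsWith("<") && v.endsWith(">")
                ? v.substring(1, v.length() - 1)
                : v;
    }

    public static void main(String[] args) {
        System.out.println(fromHref("cid:part1@example.org")); // part1@example.org
        System.out.println(fromHeader("<part1@example.org>")); // part1@example.org
    }
}
```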
Sample service which accesses a received SwA type attachment
Sending SwA type attachments
Users need to set the "enableSwA" property to true in order to send SwA
messages. Users are not expected to enable MTOM and SwA together; if both are enabled,
MTOM takes priority over SwA.
"enableSwA" can be set in axis2.xml as follows.
"enableSwA" can also be set using the client-side Options as follows:
Users are expected to use the attachment API provided in the MessageContext to specify the
binary attachments to be attached to the outgoing message as SwA-type attachments. Client-side SwA capability
can be used only with the OperationClient API, since the user needs access to the MessageContext.
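The precedence rule described above (MTOM over SwA when both are enabled) amounts to the following decision. This is a hypothetical sketch of the rule as stated in this guide, not Axis2 code; the names are invented.

```java
// Hypothetical sketch of the outgoing-format precedence.
final class OutgoingFormat {
    enum Format { MTOM, SWA, PLAIN_SOAP }

    static Format choose(boolean enableMtom, boolean enableSwa) {
        if (enableMtom) {
            return Format.MTOM;   // MTOM takes priority, even if SwA is also enabled
        }
        if (enableSwa) {
            return Format.SWA;
        }
        return Format.PLAIN_SOAP; // binary content nodes go out as base64 text
    }

    public static void main(String[] args) {
        System.out.println(choose(true, true));   // MTOM
        System.out.println(choose(false, true));  // SWA
        System.out.println(choose(false, false)); // PLAIN_SOAP
    }
}
```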
Sample client which sends a message with SwA type attachments
MTOM Backward Compatibility with SwA
The MTOM specification is designed to be backward compatible with the SOAP
with Attachments specification. Even though the representation is different,
both technologies have the same wire format. We can safely assume that any
SOAP with Attachments endpoint can accept MTOM-optimized messages and treat
them as SOAP with Attachments messages: any MTOM-optimized message is a valid
SwA message.
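To see why an MTOM message is also a valid SwA message, it helps to look at the wire format. The following JDK-only sketch assembles a minimal MTOM (XOP) package for a SOAP 1.1 envelope: a multipart/related body whose root part is the envelope (labelled application/xop+xml) and whose second part is the raw binary, referenced from the envelope by an xop:Include element with a cid: URL. The boundary string, content-ids and the image element are made up for illustration, and real Axis2 output will differ in such details. An SwA endpoint simply sees ordinary MIME parts addressed by Content-ID.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// JDK-only sketch of a minimal MTOM/XOP package for a SOAP 1.1 envelope.
final class XopPackage {
    static final String BOUNDARY = "MIMEBoundary_sketch"; // made-up boundary

    // Value for the HTTP Content-Type header of the whole package.
    static String contentType(String rootCid) {
        return "multipart/related; boundary=\"" + BOUNDARY + "\"; type=\"application/xop+xml\"; "
                + "start=\"<" + rootCid + ">\"; start-info=\"text/xml\"";
    }

    static byte[] build(String rootCid, String binaryCid, String binaryMimeType, byte[] binary) {
        String envelope = "<soapenv:Envelope xmlns:soapenv=\"http://schemas.xmlsoap.org/soap/envelope/\">"
                + "<soapenv:Body><image>"
                + "<xop:Include xmlns:xop=\"http://www.w3.org/2004/08/xop/include\" href=\"cid:" + binaryCid + "\"/>"
                + "</image></soapenv:Body></soapenv:Envelope>";
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        // Root part: the SOAP envelope, labelled application/xop+xml.
        ascii(out, "--" + BOUNDARY + "\r\n"
                + "Content-Type: application/xop+xml; charset=UTF-8; type=\"text/xml\"\r\n"
                + "Content-Transfer-Encoding: binary\r\n"
                + "Content-ID: <" + rootCid + ">\r\n\r\n");
        byte[] env = envelope.getBytes(StandardCharsets.UTF_8);
        out.write(env, 0, env.length);
        // Attachment part: raw bytes, no base64 encoding.
        ascii(out, "\r\n--" + BOUNDARY + "\r\n"
                + "Content-Type: " + binaryMimeType + "\r\n"
                + "Content-Transfer-Encoding: binary\r\n"
                + "Content-ID: <" + binaryCid + ">\r\n\r\n");
        out.write(binary, 0, binary.length);
        ascii(out, "\r\n--" + BOUNDARY + "--\r\n");
        return out.toByteArray();
    }

    private static void ascii(ByteArrayOutputStream out, String s) {
        byte[] b = s.getBytes(StandardCharsets.US_ASCII);
        out.write(b, 0, b.length);
    }

    public static void main(String[] args) {
        System.out.println(contentType("root@example.org"));
        byte[] pkg = build("root@example.org", "img@example.org", "image/jpeg",
                new byte[] {(byte) 0xFF, (byte) 0xD8});
        // ISO-8859-1 maps each byte to one char, so the raw part can be printed.
        System.out.println(new String(pkg, StandardCharsets.ISO_8859_1));
    }
}
```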
Note: The above backward compatibility was successfully tested against Axis 1.x.
A sample SwA message from Axis 1.x
Corresponding MTOM message from Axis2
Advanced Topics
File Caching for Attachments
Axis2 comes with a file caching mechanism for incoming attachments,
which gives Axis2 the ability to handle very large attachments without
buffering them in memory at any time. Axis2 file caching streams the incoming
MIME parts directly into files after reading the MIME part headers.
A user can also specify a size threshold (in bytes) for file caching. When this
threshold is specified, only attachments larger than the threshold are cached
in files. Smaller attachments remain in memory.
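The threshold behaviour can be illustrated with a small JDK-only sketch: buffer a MIME part in memory until it grows past the threshold, then write what has been read so far to a temporary file and stream the rest of the part straight to disk. This is an illustration of the idea under those assumptions, not Axis2's caching code; AttachmentCache and its Cached result type are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical sketch of threshold-based attachment caching; not Axis2's code.
final class AttachmentCache {

    // Exactly one of the two fields is non-null.
    static final class Cached {
        final byte[] inMemory;
        final Path file;

        Cached(byte[] inMemory, Path file) {
            this.inMemory = inMemory;
            this.file = file;
        }
    }

    // Parts of at most `threshold` bytes stay in memory; larger parts go to a temp file in dir.
    // Memory use is bounded by roughly the threshold plus one read buffer.
    static Cached cache(InputStream part, long threshold, Path dir) throws IOException {
        ByteArrayOutputStream memory = new ByteArrayOutputStream();
        byte[] buf = new byte[8192];
        int n;
        while ((n = part.read(buf)) != -1) {
            memory.write(buf, 0, n);
            if (memory.size() > threshold) {
                // Threshold exceeded: flush what was read, then stream the rest to disk.
                Path file = Files.createTempFile(dir, "attachment", ".tmp");
                try (OutputStream out = Files.newOutputStream(file)) {
                    memory.writeTo(out);
                    part.transferTo(out);
                }
                return new Cached(null, file);
            }
        }
        return new Cached(memory.toByteArray(), null);
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("attachments");
        Cached small = cache(new ByteArrayInputStream(new byte[100]), 4000, dir);
        Cached large = cache(new ByteArrayInputStream(new byte[100_000]), 4000, dir);
        System.out.println(small.inMemory.length + " bytes kept in memory"); // 100 bytes kept in memory
        System.out.println(Files.size(large.file) + " bytes cached in " + large.file);
    }
}
```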
NOTE: A directory for temporarily storing the attachments must be
specified. Care should also be taken to clean that directory from time to
time.
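For the periodic clean-up, something as simple as the following could run from a scheduled job. It is a hypothetical helper, not an Axis2 feature: it deletes regular files in the cache directory whose last-modified time is older than a given age. Pick an age comfortably longer than the longest request, so that files still being read are not removed.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.time.Duration;
import java.time.Instant;

// Hypothetical clean-up helper for an attachment cache directory.
final class CacheCleaner {
    // Deletes regular files in dir last modified more than maxAge ago; returns how many were deleted.
    static int purgeOlderThan(Path dir, Duration maxAge) throws IOException {
        Instant cutoff = Instant.now().minus(maxAge);
        int deleted = 0;
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(dir)) {
            for (Path p : entries) {
                if (Files.isRegularFile(p)
                        && Files.getLastModifiedTime(p).toInstant().isBefore(cutoff)
                        && Files.deleteIfExists(p)) {
                    deleted++;
                }
            }
        }
        return deleted;
    }

    public static void main(String[] args) throws IOException {
        Path dir = args.length > 0 ? Paths.get(args[0]) : Files.createTempDirectory("attachments");
        System.out.println(purgeOlderThan(dir, Duration.ofHours(1)) + " stale file(s) removed");
    }
}
```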
The following parameters need to be set in axis2.xml in order to enable
file caching.
File caching for client-side receiving can be enabled by setting the Options as follows.