XMI and EMF Interoperability

In traditional object-oriented terms, a UIMA Type System is a class model and a UIMA CAS is an object graph. There are established standards in this area – specifically, UML® is an OMG™ standard for class models and XMI (XML Metadata Interchange) is an OMG standard for the XML representation of object graphs.

Furthermore, the Eclipse Modeling Framework (EMF) is an open-source framework for model-based application development, and it is based on UML and XMI. In EMF, you define class models using a metamodel called Ecore, which is similar to UML. EMF provides tools for converting a UML model to Ecore. EMF can then generate Java classes from your model, and supports persistence of those classes in the XMI format.

The UIMA SDK now provides tools for interoperability with XMI and EMF. These tools allow conversions of UIMA Type Systems to and from Ecore models, as well as conversions of UIMA CASes to and from XMI format. This provides a number of advantages, including:

You can define a model using a UML editor, such as Rational Rose or EclipseUML, convert it to Ecore with the EMF tools, and then automatically convert it to a UIMA Type System.

You can take an existing UIMA application, convert its type system to Ecore, and save the CASes it produces to XMI. This data is now in a form where it can easily be ingested by an EMF-based application.

More generally, we are adopting the well-documented, open standard XMI as the standard way to represent UIMA-compliant analysis results (replacing the UIMA-specific XCAS format). This use of an open standard enables other applications to more easily produce or consume these UIMA analysis results.

For more information on XMI, see Grose et al. Mastering XMI: Java Programming with XMI, XML, and UML. John Wiley & Sons, Inc. 2002.

For more information on EMF, see Budinsky et al. Eclipse Modeling Framework 2.0. Addison-Wesley. 2006.

For details of how the UIMA CAS is represented in XMI format, see the XMI CAS Serialization Reference.

For converting between UIMA type systems and Ecore models, the UIMA SDK provides the following two classes:

Ecore2UimaTypeSystem: converts from an .ecore model developed using EMF to a UIMA-compliant TypeSystem descriptor. This is a Java class that can be run as a standalone program or invoked from another Java application. To run as a standalone program, execute:

java com.ibm.uima.ecore.Ecore2UimaTypeSystem <ecore file> <output file>

The input .ecore file will be converted to a UIMA TypeSystem descriptor and written to the specified output file. You can then use the resulting TypeSystem descriptor in your UIMA application.

UimaTypeSystem2Ecore: converts from a UIMA TypeSystem descriptor to an .ecore model. This is a Java class that can be run as a standalone program or invoked from another Java application. To run as a standalone program, execute:

java com.ibm.uima.ecore.UimaTypeSystem2Ecore <TypeSystem descriptor> <output file>

The input UIMA TypeSystem descriptor will be converted to an Ecore model file and written to the specified output file. You can then use the resulting Ecore model in EMF applications. Any TypeSystems that the descriptor imports (through <import> elements) are merged into the converted model; the fact that they were imported is currently not preserved.
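Because imported TypeSystems are flattened into the output, it can be useful to see in advance which imports a descriptor contains. The following sketch uses only the JDK's DOM parser to list each <import> element, which refers to another descriptor either by location or by name. It matches elements by local name in any namespace, so it does not depend on the descriptor's namespace URI. ImportLister is a hypothetical helper, not part of the UIMA SDK.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

/** Hypothetical helper (not in the UIMA SDK): lists the imports of a TypeSystem descriptor. */
final class ImportLister {
    private ImportLister() {}

    /** Returns "location=..." or "name=..." for each <import> element, in document order. */
    static List<String> listImports(String descriptorXml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // lets us match <import> in any namespace
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(descriptorXml)));

            List<String> imports = new ArrayList<>();
            NodeList nodes = doc.getElementsByTagNameNS("*", "import");
            for (int i = 0; i < nodes.getLength(); i++) {
                Element imp = (Element) nodes.item(i);
                // An import points at another descriptor either by file location or by name.
                if (imp.hasAttribute("location")) {
                    imports.add("location=" + imp.getAttribute("location"));
                } else if (imp.hasAttribute("name")) {
                    imports.add("name=" + imp.getAttribute("name"));
                }
            }
            return imports;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse descriptor", e);
        }
    }
}
```

Every entry this returns will end up merged into the single Ecore model that UimaTypeSystem2Ecore writes.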

To run either of these converters, your classpath will need to include the UIMA jar files as well as the following jar files from the EMF distribution: common.jar, ecore.jar, and ecore.xmi.jar.
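Assembling that classpath, and the command line for either converter, is mechanical enough to script. The sketch below uses only the JDK. Only the three EMF jar names and the converter class names come from this section; the jar locations passed in are hypothetical placeholders, and ConverterCommand is not part of the UIMA SDK.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Hypothetical helper (not in the UIMA SDK): builds the command to run an Ecore converter. */
final class ConverterCommand {
    /** EMF jars the converters need, as listed in this section. */
    static final List<String> EMF_JARS =
            Collections.unmodifiableList(Arrays.asList("common.jar", "ecore.jar", "ecore.xmi.jar"));

    private ConverterCommand() {}

    /** Joins the given UIMA jars and the EMF jars found in emfDir into one classpath string. */
    static String classpath(List<String> uimaJars, String emfDir) {
        List<String> entries = new ArrayList<>(uimaJars);
        for (String jar : EMF_JARS) {
            entries.add(new File(emfDir, jar).getPath());
        }
        // File.pathSeparator is ":" on Unix-like systems and ";" on Windows.
        return String.join(File.pathSeparator, entries);
    }

    /** Builds: java -cp <classpath> <mainClass> <input> <output> */
    static List<String> command(String classpath, String mainClass, String input, String output) {
        return Arrays.asList("java", "-cp", classpath, mainClass, input, output);
    }
}
```

The resulting list can be passed to new ProcessBuilder(command).inheritIO().start() to run the converter as a separate process. Keeping classpath construction separate from the command makes it reusable for both com.ibm.uima.ecore.Ecore2UimaTypeSystem and com.ibm.uima.ecore.UimaTypeSystem2Ecore, since each takes an input file and an output file.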

Also, note that the uima_core.jar file contains the Ecore model file uima.ecore, which defines the built-in UIMA types. You may need to use this file from your EMF applications.
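If an EMF application needs uima.ecore as a standalone file, you first have to locate it inside uima_core.jar. The following JDK-only sketch searches the jar's entries by file name rather than assuming a particular directory inside the jar. UimaEcoreLocator is a hypothetical helper.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Enumeration;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

/** Hypothetical helper: finds the uima.ecore entry inside a jar such as uima_core.jar. */
final class UimaEcoreLocator {
    private UimaEcoreLocator() {}

    /** Returns the jar-internal path of the first entry named uima.ecore, or null if there is none. */
    static String findUimaEcore(String jarPath) {
        try (JarFile jar = new JarFile(jarPath)) {
            Enumeration<JarEntry> entries = jar.entries();
            while (entries.hasMoreElements()) {
                String name = entries.nextElement().getName();
                // The entry may sit in a subdirectory, so compare only the last path segment.
                if (name.equals("uima.ecore") || name.endsWith("/uima.ecore")) {
                    return name;
                }
            }
            return null;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Once the entry name is known, JarFile.getInputStream can read it, for example to copy uima.ecore to disk next to your own Ecore models.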

The UIMA SDK provides XMI support through the following two classes:

XmiCasSerializer: can be run from within a UIMA application to write out a CAS to the standard XMI format. The XMI that is generated will be compliant with the Ecore model generated by UimaTypeSystem2Ecore. An EMF application could use this Ecore model to ingest and process the XMI produced by the XmiCasSerializer.

XmiCasDeserializer: can be run from within a UIMA application to read in an XMI document and populate a CAS. The XMI must conform to the Ecore model generated by UimaTypeSystem2Ecore.
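Because the serializer's output is ordinary XMI, a program can inspect it even without EMF. The JDK-only sketch below counts the objects in an XMI document by type. It assumes, as is usual for EMF-style XMI, that each serialized object is a direct child of the root xmi:XMI element and that its type is given by the element's namespace URI and local name. Values serialized as nested elements are not counted separately. XmiTypeCounter is a hypothetical name, and the class is an inspection aid, not a substitute for XmiCasDeserializer.

```java
import java.io.StringReader;
import java.util.Map;
import java.util.TreeMap;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

/** Hypothetical helper: counts the top-level objects in an XMI document, grouped by element type. */
final class XmiTypeCounter {
    private XmiTypeCounter() {}

    /** Counts direct children of the root element, keyed by "{namespaceURI}localName". */
    static Map<String, Integer> countTopLevelObjects(String xmi) {
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xmi));
            Map<String, Integer> counts = new TreeMap<>();
            int depth = 0;
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamReader.START_ELEMENT) {
                    depth++;
                    // Depth 1 is the xmi:XMI root; depth 2 holds the serialized objects.
                    if (depth == 2) {
                        String ns = reader.getNamespaceURI();
                        String key = "{" + (ns == null ? "" : ns) + "}" + reader.getLocalName();
                        counts.merge(key, 1, Integer::sum);
                    }
                } else if (event == XMLStreamReader.END_ELEMENT) {
                    depth--;
                }
            }
            reader.close();
            return counts;
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("Not well-formed XML", e);
        }
    }
}
```

For a document produced by XmiCasSerializer, the keys should reflect the UIMA types present in the CAS, which makes this a quick sanity check before handing a file to an EMF application or to the deserializer.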

Also, the uima_examples Eclipse project contains some example code that shows how to use the serializer and deserializer:

com.ibm.uima.examples.xmi.XmiWriterCasConsumer: This is a CAS Consumer that writes each CAS to an output file in XMI format. It is analogous to the XCasWriter CAS Consumer that has existed in prior UIMA versions, except that it uses the XMI serialization format.

com.ibm.uima.examples.xmi.XmiCollectionReader: This is a Collection Reader that reads a directory of XMI files and deserializes each of them into a CAS. For example, this would allow you to build a Collection Processing Engine that reads XMI files, which could contain the results of previous analysis, and then performs further analysis on them.
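A directory-based reader like XmiCollectionReader begins by choosing which files to deserialize. The JDK-only sketch below shows that selection step under the assumption that XMI files are recognized by a .xmi extension; the example reader's actual filtering rules may differ, and XmiFileLister is a hypothetical name. Each selected file would then be opened and passed to XmiCasDeserializer to populate a CAS.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical helper: selects the XMI files a directory-based reader would process. */
final class XmiFileLister {
    private XmiFileLister() {}

    /** Returns regular files whose names end in ".xmi" (case-insensitive), sorted by path. */
    static List<Path> listXmiFiles(Path dir) {
        try (Stream<Path> files = Files.list(dir)) {
            return files
                    .filter(Files::isRegularFile) // skip subdirectories, even ones named *.xmi
                    .filter(p -> p.getFileName().toString().toLowerCase(Locale.ROOT).endsWith(".xmi"))
                    .sorted() // deterministic processing order
                    .collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Sorting the list gives a reproducible processing order across runs, which helps when comparing the output of successive analyses.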

Finally, the folder uima_examples/ecore_src contains the class com.ibm.uima.examples.xmi.XmiEcoreCasConsumer, which writes each CAS to XMI format and also saves the Type System as an Ecore file. Because this class uses the UimaTypeSystem2Ecore converter, compiling it requires the EMF jars common.jar, ecore.jar, and ecore.xmi.jar on your classpath; see ecore_src/readme.txt for instructions.