Apache UIMA (Unstructured Information Management Architecture) for C++ v2.4.0 Release Notes

Contents

1. What is UIMA?

2. Major Changes in this Release

3. List of Issues Fixed in this Release

4. Release Compatibility

5. Known Issues

6. How to Get Involved

7. How to Report Issues

8. More Documentation on Apache UIMA C++

 

1. What is UIMA?

Unstructured Information Management applications are software systems that analyze large volumes of unstructured information in order to discover knowledge that is relevant to an end user. UIMA is a framework and SDK for developing such applications. An example UIM application might ingest plain text and identify entities, such as persons, places, organizations; or relations, such as works-for or located-at. UIMA enables such an application to be decomposed into components, for example "language identification" -> "language specific segmentation" -> "sentence boundary detection" -> "entity detection (person/place names etc.)". Each component must implement interfaces defined by the framework and must provide self-describing metadata via XML descriptor files. The framework manages these components and the data flow between them. Components are written in Java or C++; the data that flows between components is designed for efficient mapping between these languages. UIMA additionally provides capabilities to wrap components as network services, and can scale to very large volumes by replicating processing pipelines over a cluster of networked nodes.

Apache UIMA is an Apache-licensed open source implementation of the UIMA specification (that specification is, in turn, being developed concurrently by a technical committee within OASIS, a standards organization). We invite and encourage you to participate in both the implementation and specification efforts.

UIMA is a component framework for analysing unstructured content such as text, audio and video. It comprises an SDK and tooling for composing and running analytic components written in Java and C++, with some support for Perl, Python and TCL.

2. Major Changes in this Release

This section describes what has changed between version 2.3.0 and version 2.4.0 of UIMA C++.

2.1. Enhancements to C++ service wrapper for UIMA AS

  • Enabled support for the ActiveMQ CPP failover protocol.
  • Migrated to ActiveMQ CPP version 3.4.1.
  • Fixed a bug where the C++ service did not shut down correctly when terminated by the UIMA AS Java controller.
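
As a rough illustration of the failover support listed above: ActiveMQ failover broker URIs take the composite form failover:(uri1,uri2,...). The sketch below only assembles such a string. The helper name buildFailoverUri and the broker host names in the usage note are hypothetical and are not part of the UIMA C++ or ActiveMQ CPP APIs; how the resulting URI is passed to the service wrapper is not covered here.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper (not a UIMA C++ API): joins broker URIs into a
// composite failover URI of the form failover:(uri1,uri2,...).
std::string buildFailoverUri(const std::vector<std::string>& brokerUris) {
    if (brokerUris.empty()) {
        throw std::invalid_argument("at least one broker URI is required");
    }
    std::string uri = "failover:(";
    for (std::size_t i = 0; i < brokerUris.size(); ++i) {
        if (i > 0) {
            uri += ',';  // brokers are separated by commas inside the parentheses
        }
        uri += brokerUris[i];
    }
    uri += ')';
    return uri;
}
```

For example, passing the two assumed brokers tcp://broker1:61616 and tcp://broker2:61616 yields failover:(tcp://broker1:61616,tcp://broker2:61616).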

2.2. Enhancements to Linux build

2.3. Updates to base UIMA C++ SDK

  • Migrated to APR version 1.3.8 and added APR-Util, which is required by ActiveMQ CPP version 3.4.1, to the binary distribution.
  • The SDK is built with ICU version 3.6. Existing UIMA C++ components must be rebuilt, because all UIMA C++ components must be built with the same version of ICU.
  • Implemented copyToArray and copyFromArray methods for Array type feature structures.
  • Fixed XMI serialization of empty string value.
  • Fixed missing TypeSystemDescription element when serializing AE metadata.
  • Fixed incorrect handling of sofa mapping in a UIMA C++ aggregate AE.

3. List of Issues Fixed in this Release

Issue Key    Summary
UIMA-2333    Build one source distribution which includes Windows and Linux files
UIMA-2328    Cleanup the Linux source distribution
UIMA-2312    Migrate UIMA C++ service wrapper to ACTIVEMQ CPP 3.4.1
UIMA-2307    BasicArrayFS has two unimplemented functions: copyToArray, copyFromArray
UIMA-2053    Changes to standardize UIMA C++ build and packaging on Linux
UIMA-1964    UIMA C++ service wrapper is not correctly shutting down when Java controller terminates
UIMA-1941    UIMA CPP aggregate AE incorrect handling of sofa mapping
UIMA-1940    TypeSystemDescription element missing when Aggregate AE metadata is serialized to XML
UIMA-1925    Enable failover protocol support in UIMA C++ service wrapper
UIMA-1913    Replace usage of ActiveMQ CPP utility APIs with APR functions
UIMA-1912    Changes for GCC 4.3+ compatibility and header file conventions
UIMA-1886    XMI serialization incorrectly handling string feature set to an empty string
UIMA-2348    Augment UIMACpp binary lic/notice with appropriate items from other embedded binaries
UIMA-2352    Build script for uimacpp sdk on Windows does not correctly copy scriptators docs and xerces libs
UIMA-2356    Fix warnings generated when creating the UIMA C++ doxygen docs
UIMA-2361    UIMA C++ fails to build with gcc 4.5.2
UIMA-2362    Add APR 1.4.x to list of acceptable versions to build UIMA C++
UIMA-2363    Add APR-Util libraries to the UIMA C++ binary package
UIMA-2365    UIMACPP build fails on Mac OS X: stricmp unavailable, replace with strcasecmp in deployCppService.hpp
UIMA-2425    Xerces exception messages not being properly converted to native code-page
UIMA-2433    Scriptator makefiles modified to work with different newer versions of Python and SWIG
UIMA-2466    APR-iconv missing from UIMA C++ Windows binary build
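
The portability problem behind UIMA-2365 in the issue list above can be sketched as follows: stricmp is a Windows C runtime function, while POSIX systems such as Mac OS X and Linux provide strcasecmp in <strings.h>. The wrapper name compareIgnoreCase is hypothetical, and this is not the actual change made in deployCppService.hpp, only one common way to hide the difference.

```cpp
#include <cstring>
#if !defined(_WIN32)
#include <strings.h>  // POSIX home of strcasecmp
#endif

// Hypothetical wrapper: case-insensitive comparison of two C strings.
// Returns 0 when the strings are equal ignoring case.
inline int compareIgnoreCase(const char* a, const char* b) {
#if defined(_WIN32)
    return _stricmp(a, b);    // Windows C runtime
#else
    return strcasecmp(a, b);  // POSIX (Linux, Mac OS X)
#endif
}
```

Calling code then uses compareIgnoreCase everywhere and needs no platform checks of its own.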

 

4. Release Compatibility

There are several aspects of UIMA C++ to consider when dealing with release compatibility:

 

  • The framework dynamically loads annotators, which are user code. The annotators make calls to UIMA C++ APIs and are built with some version of the SDK. A possible scenario is for an application to run annotators that were built with different releases of the UIMA C++ SDK.
  • The SDK depends on ICU, XERCES, APR and ACTIVEMQ-CPP, and a UIMA C++ SDK release is built with a particular version of each. Binary compatibility therefore depends on the compatibility of these underlying libraries. In particular, ICU and XERCES encode the major and minor release numbers in their APIs, which restricts binary compatibility across releases of these libraries.
  • An application running UIMA C++ is restricted to running one version of the ICU library in a process, and all annotators and underlying libraries must use the same ICU version.
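
To illustrate why encoding release numbers in API names restricts binary compatibility, here is a minimal sketch of the general technique. The library name "demo", the function demo_strlen and the _3_6 suffix are hypothetical; the real ICU and XERCES mechanisms differ in detail. The point is that source code calls an unversioned name while the compiled object refers to a versioned symbol, so a binary built against one release cannot resolve its symbols against another.

```cpp
#include <cstddef>
#include <string>

// Hypothetical library "demo" at version 3.6; nothing here is UIMA or ICU code.
#define DEMO_SUFFIX _3_6
#define DEMO_PASTE2(name, suffix) name##suffix
#define DEMO_PASTE(name, suffix) DEMO_PASTE2(name, suffix)
#define DEMO_VERSIONED(name) DEMO_PASTE(name, DEMO_SUFFIX)
#define DEMO_STRINGIFY2(x) #x
#define DEMO_STRINGIFY(x) DEMO_STRINGIFY2(x)

// Source code keeps calling demo_strlen(), but the compiler emits demo_strlen_3_6.
#define demo_strlen DEMO_VERSIONED(demo_strlen)

// Defined under the versioned name because of the macro above.
extern "C" std::size_t demo_strlen(const char* s) {
    std::size_t n = 0;
    while (s[n] != '\0') {
        ++n;
    }
    return n;
}

// Returns the symbol name that code compiled against these headers links to.
std::string linkedSymbolName() {
    return DEMO_STRINGIFY(demo_strlen);
}
```

An annotator compiled with these headers would reference demo_strlen_3_6. If the process loaded a later release of the library exporting a differently suffixed symbol, the reference would not resolve, which is the situation a rebuild avoids.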

  

We do not enforce binary compatibility when doing a release. Migrating to a new version of UIMA C++ may require rebuilding the annotators.

Installing the UIMACPP SDK as a system-wide shared library is discouraged, since parallel versions are not supported: the include directory does not carry a version number, and multiple versions of the executables runAECpp and deployCppService cannot coexist.


5. Known Issues

The following are known open issues:

  • Parameter overrides are not supported for C++ aggregates.
  • Sofa Mapping is not supported for C++ Aggregate AE called from Java.

6. How to Get Involved

The Apache UIMA project really needs and appreciates any contributions, including documentation help, source code and feedback. If you are interested in contributing, please visit http://uima.apache.org/get-involved.html.

7. How to Report Issues

The Apache UIMA project uses JIRA for issue tracking. Please report any issues you find at http://issues.apache.org/jira/browse/uima

8. More Documentation on Apache UIMA C++

Please see Overview and Setup for a high level overview of UIMA C++, and Doxygen docs for details on the UIMA C++ APIs.