Apache UIMA (Unstructured Information Management Architecture) v2.8.0 Release Notes

What is UIMA?
Major Changes in this Release
How to Get Involved
How to Report Issues
List of JIRA Issues Fixed in this Release

1. What is UIMA?

Unstructured Information Management applications are software systems that analyze large volumes of unstructured information in order to discover knowledge that is relevant to an end user. UIMA is a framework and SDK for developing such applications. An example UIM application might ingest plain text and identify entities, such as persons, places, organizations; or relations, such as works-for or located-at. UIMA enables such an application to be decomposed into components, for example "language identification" -> "language specific segmentation" -> "sentence boundary detection" -> "entity detection (person/place names etc.)". Each component must implement interfaces defined by the framework and must provide self-describing metadata via XML descriptor files. The framework manages these components and the data flow between them. Components are written in Java or C++; the data that flows between components is designed for efficient mapping between these languages. UIMA additionally provides capabilities to wrap components as network services, and can scale to very large volumes by replicating processing pipelines over a cluster of networked nodes.

Apache UIMA is an Apache-licensed open source implementation of the UIMA specification (that specification is, in turn, being developed concurrently by a technical committee within OASIS, a standards organization). We invite and encourage you to participate in both the implementation and specification efforts.

UIMA is a component framework for analysing unstructured content such as text, audio and video. It comprises an SDK and tooling for composing and running analytic components written in Java and C++, with some support for Perl, Python and TCL.

2. Major Changes in this Release

Iterating over sorted indexes performance improvement

The framework monitors how iterators use each sorted index. When it determines that an index is no longer being updated, and that iteration is slow because the subindexes for the subtypes being iterated over must be merged, it performs a one-time conversion of that index to a "flattened" version. The flattened form pays the cost of merging the subindexes once, so subsequent iterations run faster. Any update that affects a flattened index causes it to be discarded and not used for subsequent iterations. The improvement should be visible in the perhaps common use case where one annotator creates and indexes instances of a type and its subtypes, and subsequent annotators read and perhaps update these instances but do not add or remove entries of this type or its subtypes from the index. Jira issue: UIMA-4357.
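The flatten-and-discard behaviour described above can be illustrated with a small toy model. This is only a sketch under stated assumptions, not the UIMA implementation: the class FlatteningSortedIndex, its methods, and the fixed FLATTEN_THRESHOLD are all hypothetical, and the real framework's monitoring heuristics are not described in these notes.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Toy model only; every name here is hypothetical and not part of UIMA.
final class FlatteningSortedIndex {
    // Assumption: flatten after this many iterations with no intervening update.
    private static final int FLATTEN_THRESHOLD = 2;

    // One sorted sub-index per type/subtype.
    private final List<List<Integer>> subIndexes = new ArrayList<>();
    private List<Integer> flattened;      // cached merged view, or null
    private int iterationsSinceUpdate;    // crude stand-in for "usage monitoring"

    FlatteningSortedIndex(int subtypeCount) {
        for (int i = 0; i < subtypeCount; i++) {
            subIndexes.add(new ArrayList<>());
        }
    }

    // Any update discards the flattened form and restarts the monitoring.
    void add(int subtype, int key) {
        List<Integer> sub = subIndexes.get(subtype);
        int pos = Collections.binarySearch(sub, key);
        sub.add(pos < 0 ? -pos - 1 : pos, key);
        flattened = null;
        iterationsSinceUpdate = 0;
    }

    // Returns all keys in sorted order, using the flattened form when present.
    List<Integer> snapshot() {
        if (flattened != null) {
            return flattened;             // fast path: no merging needed
        }
        List<Integer> merged = Collections.unmodifiableList(merge());
        if (++iterationsSinceUpdate >= FLATTEN_THRESHOLD) {
            flattened = merged;           // one-time conversion
        }
        return merged;
    }

    boolean isFlattened() {
        return flattened != null;
    }

    // Simple k-way merge of the sorted sub-indexes.
    private List<Integer> merge() {
        int[] cursor = new int[subIndexes.size()];
        List<Integer> out = new ArrayList<>();
        while (true) {
            int best = -1;
            for (int i = 0; i < cursor.length; i++) {
                List<Integer> sub = subIndexes.get(i);
                if (cursor[i] < sub.size()
                        && (best < 0
                            || sub.get(cursor[i]) < subIndexes.get(best).get(cursor[best]))) {
                    best = i;
                }
            }
            if (best < 0) {
                return out;
            }
            out.add(subIndexes.get(best).get(cursor[best]++));
        }
    }
}
```

In this model, repeated calls to snapshot() with no add() in between eventually return the cached list, while a single add() forces the next call to merge again.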

Other major changes

The Document Analyzer's Java-based Cas Viewer was extended so that you can select which annotations are highlighted by specifying a type, a feature of that type, and one or more values of that feature. Jira issue: UIMA-3374.
Performance for iterators over UIMA Set Indexes was improved. Jira issue: UIMA-4345.

API changes

Note that we have moved to a more formal semantic versioning scheme, following the ideas in http://semver.org/.

There were no API changes that affect backward compatibility. Some new APIs were added or extended.

Add back the alreadyCopied API, which was accidentally removed from the CasCopier class in version 2.7.0. Jira issue: UIMA-4428.
Add new variants of getAllIndexedFS, getIndex, and getAnnotationIndex that take an additional argument specifying the JCas class, which lets them return correctly typed generic indexes. Jira issue: UIMA-4299.
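The idea behind the class-argument variants can be sketched with the general Java "class token" pattern. The sketch below is an assumption-laden toy and does not use the UIMA API: ToyAnnotation, ToyToken, and ToyIndexRepository are hypothetical stand-ins, and the actual signatures are those documented in the UIMA Javadocs.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for illustration; these are not UIMA classes.
class ToyAnnotation {
    final int begin;
    final int end;

    ToyAnnotation(int begin, int end) {
        this.begin = begin;
        this.end = end;
    }
}

final class ToyToken extends ToyAnnotation {
    final String text;

    ToyToken(int begin, int end, String text) {
        super(begin, end);
        this.text = text;
    }
}

final class ToyIndexRepository {
    private final List<ToyAnnotation> all = new ArrayList<>();

    void add(ToyAnnotation a) {
        all.add(a);
    }

    // Untyped variant: callers must cast each element themselves.
    List<ToyAnnotation> getIndex() {
        return new ArrayList<>(all);
    }

    // Class-argument variant: the extra Class<T> argument lets the
    // compiler infer the element type of the returned collection.
    <T extends ToyAnnotation> List<T> getIndex(Class<T> type) {
        List<T> out = new ArrayList<>();
        for (ToyAnnotation a : all) {
            if (type.isInstance(a)) {
                out.add(type.cast(a));
            }
        }
        return out;
    }
}
```

With the class-argument variant, a call such as repo.getIndex(ToyToken.class) yields a List<ToyToken>, so fields like text can be read without a cast at the call site.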
