Apache UIMA (Unstructured Information Management Architecture) v2.8.0 Release Notes

Contents

What is UIMA?
Major Changes in this Release
Full list of JIRA Issues Fixed in this Release
How to Get Involved
How to Report Issues

What is UIMA?

Unstructured Information Management applications are software systems that analyze large volumes of unstructured information in order to discover knowledge that is relevant to an end user. UIMA is a framework and SDK for developing such applications. An example UIM application might ingest plain text and identify entities, such as persons, places, and organizations; or relations, such as works-for or located-at. UIMA enables such an application to be decomposed into components, for example "language identification" -> "language specific segmentation" -> "sentence boundary detection" -> "entity detection (person/place names etc.)". Each component must implement interfaces defined by the framework and must provide self-describing metadata via XML descriptor files. The framework manages these components and the data flow between them. Components are written in Java or C++; the data that flows between components is designed for efficient mapping between these languages. UIMA additionally provides capabilities to wrap components as network services, and can scale to very large volumes by replicating processing pipelines over a cluster of networked nodes.

Apache UIMA is an Apache-licensed open source implementation of the UIMA specification. That specification is, in turn, being developed concurrently by a technical committee within OASIS, a standards organization. We invite and encourage you to participate in both the implementation and specification efforts.

UIMA is a component framework for analysing unstructured content such as text, audio and video. It comprises an SDK and tooling for composing and running analytic components written in Java and C++, with some support for Perl, Python and TCL.

Major Changes in this Release

Performance improvement when iterating over sorted indexes

Iterator use of sorted indexes is monitored. When the framework determines that a particular index is no longer being updated and is performing poorly because the subindexes for the subtypes being iterated over must be merged, it converts that index once to a "flattened" version. The flattened form incurs the cost of merging the subindexes a single time, and subsequent iterations are faster. Any update that affects the flattened index causes it to be discarded, so it is not used for later iterations. The improvement should be visible in the perhaps common use case where one annotator creates and indexes instances of a type and its subtypes, and subsequent annotators read and perhaps update these instances but don't add or remove entries of this type or its subtypes from the index. See Jira issue UIMA-4357.
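
The flatten-then-discard behaviour described above can be pictured with a small, self-contained Java sketch. It is not the UIMA implementation and uses no UIMA classes: the class FlattenedIndexSketch, its methods, and the FLATTEN_AFTER threshold are all hypothetical names chosen for illustration, and plain integers stand in for indexed feature structures.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical illustration only; none of these names are UIMA APIs.
final class FlattenedIndexSketch {
    // Assumed threshold: flatten after this many iterations with no update.
    private static final int FLATTEN_AFTER = 2;

    // One sorted subindex per type/subtype.
    private final List<List<Integer>> subIndexes = new ArrayList<>();
    // Cached merged view; null means "not flattened" or "discarded".
    private List<Integer> flattened;
    private int iterationsSinceUpdate;

    // Registers a new (empty) subindex and returns its position.
    int addSubIndex() {
        subIndexes.add(new ArrayList<>());
        invalidate();
        return subIndexes.size() - 1;
    }

    // Inserts a value in sorted position; any update discards the flattened form.
    void add(int subIndex, int value) {
        List<Integer> sub = subIndexes.get(subIndex);
        int pos = Collections.binarySearch(sub, value);
        sub.add(pos < 0 ? -pos - 1 : pos, value);
        invalidate();
    }

    // Returns all entries in sorted order, using the flattened copy when present.
    List<Integer> iterateAll() {
        if (flattened != null) {
            return flattened;
        }
        List<Integer> merged = merge();
        if (++iterationsSinceUpdate >= FLATTEN_AFTER) {
            // Pay the merge cost once and keep the result for later iterations.
            flattened = Collections.unmodifiableList(merged);
            return flattened;
        }
        return merged;
    }

    boolean isFlattened() {
        return flattened != null;
    }

    private void invalidate() {
        flattened = null;
        iterationsSinceUpdate = 0;
    }

    // Simple k-way merge of the sorted subindexes.
    private List<Integer> merge() {
        int[] pos = new int[subIndexes.size()];
        List<Integer> out = new ArrayList<>();
        while (true) {
            int best = -1;
            for (int i = 0; i < pos.length; i++) {
                List<Integer> sub = subIndexes.get(i);
                if (pos[i] < sub.size()
                        && (best < 0 || sub.get(pos[i]) < subIndexes.get(best).get(pos[best]))) {
                    best = i;
                }
            }
            if (best < 0) {
                return out;
            }
            out.add(subIndexes.get(best).get(pos[best]++));
        }
    }
}
```

In this sketch every add discards the flattened copy. The release note is more permissive: feature structures may still be read and updated as long as entries are not added to or removed from the index.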

Other major changes

API changes

Note that we have moved to a more formal semantic versioning, following the ideas in http://semver.org/.

There were no API changes that affect backward compatibility. Some new APIs were added or extended.
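
As a rough illustration of what semantic versioning implies for users of the API, the following Java sketch checks whether code built against one version can be expected to run on another. The class SemverCheck and its methods are hypothetical helpers, not part of UIMA, and the sketch assumes plain MAJOR.MINOR.PATCH strings with no pre-release or build suffixes.

```java
// Hypothetical helper for illustration; not a UIMA API.
final class SemverCheck {
    // Parses "MAJOR.MINOR.PATCH" into three integers.
    static int[] parse(String version) {
        String[] parts = version.split("\\.");
        if (parts.length != 3) {
            throw new IllegalArgumentException("Expected MAJOR.MINOR.PATCH: " + version);
        }
        return new int[] {
            Integer.parseInt(parts[0]),
            Integer.parseInt(parts[1]),
            Integer.parseInt(parts[2])
        };
    }

    // Under semantic versioning, the API stays backward compatible within a
    // major version, and a minor release may only add to it. Patch levels
    // are ignored here because they should not change the API.
    static boolean isApiCompatible(String builtAgainst, String runtime) {
        int[] built = parse(builtAgainst);
        int[] run = parse(runtime);
        return built[0] == run[0] && run[1] >= built[1];
    }
}
```

For example, SemverCheck.isApiCompatible("2.7.0", "2.8.0") returns true, while swapping the arguments returns false, because code built against 2.8.0 may rely on APIs that were added in that release.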

Full list of JIRA Issues Fixed in this Release

See issuesFixed/jira-report.html for the list of issues fixed in this release.

How to Get Involved

The Apache UIMA project really needs and appreciates any contributions, including documentation help, source code and feedback. If you are interested in contributing, please visit http://uima.apache.org/get-involved.html.

How to Report Issues

The Apache UIMA project uses JIRA for issue tracking. Please report any issues you find at http://issues.apache.org/jira/browse/uima