Apache UIMA (Unstructured Information Management Architecture) v3.0.0-alpha Release Notes

Contents

0. Alpha release status
1. What is UIMA?
Major Changes in this Release
API changes
How to Get Involved
How to Report Issues
List of JIRA Issues Fixed in this Release

0. Alpha release status

This is an alpha release, made to widen the community of people doing initial testing of this new version. As such, bugs or deficiencies may be found that require changes to APIs.

After a period of alpha testing, and depending on the results, we will proceed to more formal releases.

1. What is UIMA?

Unstructured Information Management applications are software systems that analyze large volumes of unstructured information in order to discover knowledge that is relevant to an end user. UIMA is a framework and SDK for developing such applications. An example UIM application might ingest plain text and identify entities, such as persons, places, and organizations, or relations, such as works-for or located-at. UIMA enables such an application to be decomposed into components, for example "language identification" -> "language specific segmentation" -> "sentence boundary detection" -> "entity detection (person/place names etc.)". Each component must implement interfaces defined by the framework and must provide self-describing metadata via XML descriptor files. The framework manages these components and the data flow between them. Components are written in Java or C++; the data that flows between components is designed for efficient mapping between these languages. UIMA additionally provides capabilities to wrap components as network services, and can scale to very large volumes by replicating processing pipelines over a cluster of networked nodes.
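To make the decomposition concrete, here is a minimal plain-Java sketch of the idea. It does not use the UIMA API: `Document`, `Component`, and `Pipeline` are hypothetical stand-ins for the CAS, an analysis component, and the framework, and the three stages are deliberately naive (a fixed language tag, whitespace tokenization, capitalized words treated as "entities"). It shows only the shape of the data flow: independent components reading and enriching one shared structure that the framework passes along in order.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the CAS: the shared data that flows between components.
class Document {
    final String text;
    String language;                                  // filled in by "language identification"
    final List<String> tokens = new ArrayList<>();    // filled in by "segmentation"
    final List<String> entities = new ArrayList<>();  // filled in by "entity detection"

    Document(String text) { this.text = text; }
}

// Hypothetical stand-in for an analysis component (not a UIMA interface).
interface Component {
    void process(Document doc);
}

class Pipeline {
    private final List<Component> components = new ArrayList<>();

    Pipeline add(Component c) {
        components.add(c);
        return this;
    }

    // The "framework" role: run every component, in order, on the same document.
    Document run(String text) {
        Document doc = new Document(text);
        for (Component c : components) {
            c.process(doc);
        }
        return doc;
    }

    // A toy three-stage pipeline; each stage is intentionally simplistic.
    static Pipeline demo() {
        return new Pipeline()
            // Toy language identification: always reports English.
            .add(doc -> doc.language = "en")
            // Toy segmentation: split on whitespace and strip trailing punctuation.
            .add(doc -> {
                for (String t : doc.text.split("\\s+")) {
                    String w = t.replaceAll("[.,;:!?]+$", "");
                    if (!w.isEmpty()) doc.tokens.add(w);
                }
            })
            // Toy entity detection: every capitalized token counts as an entity.
            .add(doc -> {
                for (String w : doc.tokens) {
                    if (Character.isUpperCase(w.charAt(0))) doc.entities.add(w);
                }
            });
    }

    public static void main(String[] args) {
        Document d = demo().run("Alice works for Acme in Paris.");
        System.out.println(d.language + " " + d.tokens + " " + d.entities);
    }
}
```

In UIMA itself the components are analysis engines described by XML descriptors and the shared structure is the CAS; the sketch mirrors only the pipeline shape, not those mechanisms.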

Apache UIMA is an Apache-licensed open source implementation of the UIMA specification (that specification is, in turn, being developed concurrently by a technical committee within OASIS, a standards organization). We invite and encourage you to participate in both the implementation and specification efforts.

UIMA is a component framework for analysing unstructured content such as text, audio and video. It comprises an SDK and tooling for composing and running analytic components written in Java and C++, with some support for Perl, Python and TCL.

Major Changes in this Release

Version 3 is a major reimplementation of the internals of UIMA. A one-time migration step is required if you use JCas cover classes. Please read the Version 3 users guide in the docs directory for information on these changes and on the migration tool.

A very brief summary of Version 3:

Feature Structures are now Java objects, and can be garbage collected.
Many performance improvements
Iterating over indexes no longer throws ConcurrentModificationException
Support for arbitrary Java objects in the CAS
Building on the above, new built-in types for ArrayList-style lists of Feature Structures and of ints, and an FSHashSet
New "select" framework for flexible access to Feature Structures
Integration with Java 8 facilities such as Streams
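The ConcurrentModificationException item in the list above refers to a familiar Java failure mode, sketched here with JDK collections only. A fail-fast `ArrayList` iterator throws if the list is modified during iteration, while a copy-on-write list lets the loop finish over a snapshot of the elements. This is an analogy for the behavior, not UIMA's index code; `IterationDemo` and `visitWhileAdding` are hypothetical names, and the Version 3 users guide is the authority on how UIMA indexes behave.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.ConcurrentModificationException;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical demo class: contrasts fail-fast and copy-on-write iteration.
class IterationDemo {
    // Iterates over `list` and appends "new" when it sees "b".
    // Returns the elements visited, or null if the iterator failed fast.
    static List<String> visitWhileAdding(List<String> list) {
        List<String> visited = new ArrayList<>();
        try {
            for (String s : list) {
                visited.add(s);
                if (s.equals("b")) {
                    list.add("new"); // structural modification mid-iteration
                }
            }
        } catch (ConcurrentModificationException e) {
            return null;
        }
        return visited;
    }

    public static void main(String[] args) {
        // ArrayList: the iterator detects the modification and throws.
        List<String> plain = new ArrayList<>(Arrays.asList("a", "b", "c"));
        System.out.println("ArrayList: " + visitWhileAdding(plain));

        // CopyOnWriteArrayList: the iterator keeps reading its original snapshot.
        List<String> cow = new CopyOnWriteArrayList<>(Arrays.asList("a", "b", "c"));
        System.out.println("CopyOnWriteArrayList: " + visitWhileAdding(cow) + ", now " + cow);
    }
}
```

Copy-on-write trades extra copying on modification for iterators that never fail, which suits workloads where iteration is far more common than updates.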

Please read the overview section of the Version 3 users guide for more details.
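The "select" framework and the Java 8 Streams integration listed above are about querying Feature Structures declaratively rather than by hand-written iterator loops. The sketch below imitates that style with plain Java 8 streams over a hypothetical `Span` class (a type name plus begin/end offsets). It is not the UIMA `select` API; it only illustrates a "covered by" style query: all spans of one type lying inside another span, in offset order.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical annotation-like object: a typed span of character offsets.
class Span {
    final String type;
    final int begin;
    final int end;

    Span(String type, int begin, int end) {
        this.type = type;
        this.begin = begin;
        this.end = end;
    }

    @Override
    public String toString() {
        return type + "[" + begin + "," + end + ")";
    }
}

// Hypothetical helper; not part of UIMA.
class SelectSketch {
    // Returns the spans of `type` lying entirely within `outer`, sorted by begin offset.
    static List<Span> selectCoveredBy(List<Span> index, String type, Span outer) {
        return index.stream()
            .filter(s -> s.type.equals(type))
            .filter(s -> s.begin >= outer.begin && s.end <= outer.end)
            .sorted(Comparator.comparingInt((Span s) -> s.begin))
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Span sentence1 = new Span("Sentence", 0, 20);
        Span sentence2 = new Span("Sentence", 21, 40);
        // Deliberately unsorted, mixing tokens from both sentences.
        List<Span> index = Arrays.asList(
            sentence1, sentence2,
            new Span("Token", 12, 20), new Span("Token", 21, 25),
            new Span("Token", 0, 5), new Span("Token", 26, 40),
            new Span("Token", 6, 11));
        System.out.println(selectCoveredBy(index, "Token", sentence1));
    }
}
```

Expressing the query as a stream pipeline keeps filtering, ordering, and collection in one readable chain, which is the kind of convenience the new select framework aims to provide for Feature Structures.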

API changes

Many APIs were changed, typically by adding new capabilities. However, we attempted to preserve existing APIs for backward compatibility.
