Apache UIMA (Unstructured Information Management Architecture) v3.0.0-beta Release Notes

Beta release status notice
What is UIMA?
Major Changes in this Release
How to Get Involved
How to Report Issues
List of JIRA Issues Fixed in this Release

0. Beta release status

This is the beta release of 3.0.0; the next release is planned to be the official 3.0.0 release. The beta is intended to widen the community of people testing the new version. The APIs are expected to be stable at this point.

1. What is UIMA?

Unstructured Information Management applications are software systems that analyze large volumes of unstructured information in order to discover knowledge that is relevant to an end user. UIMA is a framework and SDK for developing such applications. An example UIM application might ingest plain text and identify entities, such as persons, places, organizations; or relations, such as works-for or located-at. UIMA enables such an application to be decomposed into components, for example "language identification" -> "language specific segmentation" -> "sentence boundary detection" -> "entity detection (person/place names etc.)". Each component must implement interfaces defined by the framework and must provide self-describing metadata via XML descriptor files. The framework manages these components and the data flow between them. Components are written in Java or C++; the data that flows between components is designed for efficient mapping between these languages. UIMA additionally provides capabilities to wrap components as network services, and can scale to very large volumes by replicating processing pipelines over a cluster of networked nodes.
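
The decomposition described above can be pictured with a small plain-Java sketch. Every name in it (PipelineSketch, Document, Step and the three toy steps) is hypothetical and is not part of the UIMA API; real components implement the framework's own interfaces and are configured through XML descriptors. The sketch only shows the general shape: independent steps that share one interface and enrich a common data container in sequence.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical names throughout: this is NOT the UIMA API, only an illustration
// of chaining analysis components over one shared document container.
class PipelineSketch {

    /** Shared container handed from step to step. */
    static final class Document {
        final String text;
        String language = "unknown";
        final List<String> sentences = new ArrayList<>();
        final List<String> entities = new ArrayList<>();
        Document(String text) { this.text = text; }
    }

    /** The single interface every step implements. */
    interface Step {
        void process(Document doc);
    }

    /** Toy language identification: looks for one English function word. */
    static final Step LANGUAGE_ID = doc ->
        doc.language = doc.text.contains(" the ") ? "en" : "unknown";

    /** Toy sentence boundary detection: splits after '.', '!' or '?'. */
    static final Step SENTENCE_SPLITTER = doc -> {
        for (String s : doc.text.split("(?<=[.!?])\\s+")) {
            if (!s.isEmpty()) doc.sentences.add(s);
        }
    };

    /** Toy entity detection: capitalized words that do not start a sentence. */
    static final Step ENTITY_DETECTOR = doc -> {
        for (String sentence : doc.sentences) {
            String[] words = sentence.split("\\s+");
            for (int i = 1; i < words.length; i++) {
                String w = words[i].replaceAll("\\W", "");
                if (!w.isEmpty() && Character.isUpperCase(w.charAt(0))) doc.entities.add(w);
            }
        }
    };

    /** Runs the steps in order over one document. */
    static Document run(String text, List<Step> steps) {
        Document doc = new Document(text);
        for (Step step : steps) step.process(doc);
        return doc;
    }

    public static void main(String[] args) {
        Document doc = run("Yesterday the manager met Alice in Paris. She works for Acme.",
                Arrays.asList(LANGUAGE_ID, SENTENCE_SPLITTER, ENTITY_DETECTOR));
        System.out.println(doc.language + " " + doc.sentences.size() + " " + doc.entities);
    }
}
```

Because each step depends only on the shared container, steps can be reordered, replaced or reused without touching the others, which is the property the component model is after.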

Apache UIMA is an Apache-licensed open source implementation of the UIMA specification (that specification is, in turn, being developed concurrently by a technical committee within OASIS, a standards organization). We invite and encourage you to participate in both the implementation and specification efforts.

UIMA is a component framework for analysing unstructured content such as text, audio and video. It comprises an SDK and tooling for composing and running analytic components written in Java and C++, with some support for Perl, Python and TCL.

2. Major Changes in this Release

Version 3 is a major reimplementation of the internals of the Java version of UIMA to support better integration with Java 8, alignment with modern memory hierarchies for better performance, and multiple functional enhancements and improvements.

If you use JCas cover classes other than the built-in ones, they need a one-time regeneration or migration step. Please read the Version 3 users guide, located in the docs directory, for information on these changes and on the migration tool, which can help migrate existing JCas class definitions.

A very brief summary of Version 3:

Feature Structures become Java Objects, subject to Garbage Collection, just like any other objects, when no longer reachable.
Iterating over indexes no longer throws ConcurrentModificationException.
Support for arbitrary Java Objects in the CAS, using special custom implementations of JCas cover classes for those.
Building on that capability, three new semi-built-in types: ArrayList-style lists of Feature Structures and of ints, and an FSHashSet.
New "select" framework for flexible access to Feature Structures, which can ignore typeOrdering keys in indexes.
Integration with Java 8 facilities such as Streams.
Upgraded logging framework, which supports embedding UIMA in other frameworks that use popular logging APIs.
Many performance improvements.

Please read the Version 3 users guide for more details.
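
These notes do not say how iteration without ConcurrentModificationException is achieved. One common technique with that effect is snapshot (copy-on-write) iteration, sketched below with JDK collections only. It is an analogy, not UIMA's index implementation; the class and method names are made up for the illustration.

```java
import java.util.ConcurrentModificationException;
import java.util.List;

// Illustration only: plain JDK collections stand in for an index.
class SnapshotIterationSketch {

    /**
     * Removes every even element from the list while iterating over it.
     * Returns true if the loop completed, false if the iterator failed fast.
     */
    static boolean removeEvensWhileIterating(List<Integer> index) {
        try {
            for (Integer value : index) {
                if (value % 2 == 0) {
                    index.remove(value); // remove(Object): modifies the list mid-iteration
                }
            }
            return true;
        } catch (ConcurrentModificationException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        List<Integer> failFast = new java.util.ArrayList<>(java.util.Arrays.asList(1, 2, 3, 4));
        List<Integer> snapshot =
                new java.util.concurrent.CopyOnWriteArrayList<>(java.util.Arrays.asList(1, 2, 3, 4));
        System.out.println("ArrayList completed: " + removeEvensWhileIterating(failFast));
        System.out.println("CopyOnWriteArrayList completed: " + removeEvensWhileIterating(snapshot)
                + ", remaining " + snapshot);
    }
}
```

With a snapshot iterator the loop keeps walking the elements that existed when iteration started, while the modification takes effect on the underlying collection.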

Specific changes in this release versus the previous alpha02 release include:

The use of generic typing was improved.
Fixed the Eclipse launcher in the examples project (part of the binary distribution) so that it properly launches the v3 migration tool.
The UIMA built-in arrays and lists implement Iterable and support streams, with appropriate generic type parameters.
Eclipse plugins are now Jar-signed.
Renamed some methods that create UIMA arrays and lists from createFromArray to create.
Added missing methods to the logger interface to support Supplier arguments.
The launchers for the normal UIMA tooling now default to the logging framework built into Java.
The launchers were modified to support Java-style wildcard notation in the classpath.
Fixed the permissions on Linux for the API Change Report.
Restored a v2 method needed for backwards compatibility.
Improved the migration tool's use of the classpath and its error reporting.
Incorporated changes in the support for the External Override Settings location.
Fixes incorporated into UIMA 2.10.1 are included.
Made JCas loading more permissive: the type system no longer needs to be committed before loading.
JCasGen: added missing imports.
Renamed the methods for getting empty UIMA arrays and lists to the emptyXxxx form.
Added methods supporting the extended for loop over all Feature Structures in views.
Several bugs were fixed; see the list of issues fixed for all issues after 18 Sept 2017.
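
The point of a Supplier argument on a logger is that the message is only built when it will actually be logged. The exact signatures on the UIMA logger interface are not listed in these notes, so the sketch below uses the JDK's own java.util.logging.Logger, which has had Supplier overloads since Java 8. The class name, logger name and message text are invented for the example.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;
import java.util.logging.Level;
import java.util.logging.Logger;

// Sketch using java.util.logging, not the UIMA Logger interface.
class SupplierLoggingSketch {

    /** Counts how often the (pretend-expensive) message is actually built. */
    static final AtomicInteger BUILDS = new AtomicInteger();

    static String expensiveMessage() {
        BUILDS.incrementAndGet();
        return "processed document with many annotations";
    }

    /** Logs one FINE message through a Supplier; returns how many times the message was built. */
    static int logAt(Level loggerLevel) {
        Logger logger = Logger.getLogger("sketch.supplier");
        logger.setUseParentHandlers(false); // keep the demo quiet
        logger.setLevel(loggerLevel);
        int before = BUILDS.get();
        Supplier<String> message = SupplierLoggingSketch::expensiveMessage;
        logger.fine(message); // the Supplier is invoked only if FINE is loggable
        return BUILDS.get() - before;
    }

    public static void main(String[] args) {
        System.out.println("built at INFO: " + logAt(Level.INFO));
        System.out.println("built at FINE: " + logAt(Level.FINE));
    }
}
```

Passing a plain String instead would build the message on every call, whether or not the level is enabled.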

API changes

There is an API compatibility report covering the non-internal APIs; see the change report for the list of API changes since version 2.10.1.
