Apache UIMA (Unstructured Information Management Architecture) v3.2.0 Release Notes

What is UIMA?
Notable changes in this release
API changes
How to Get Involved
How to Report Issues
List of JIRA Issues Fixed in this Release

What is UIMA?

Unstructured Information Management applications are software systems that analyze large volumes of unstructured information in order to discover knowledge that is relevant to an end user. UIMA is a framework and SDK for developing such applications. An example UIM application might ingest plain text and identify entities, such as persons, places, organizations; or relations, such as works-for or located-at. UIMA enables such an application to be decomposed into components, for example "language identification" -> "language specific segmentation" -> "sentence boundary detection" -> "entity detection (person/place names etc.)". Each component must implement interfaces defined by the framework and must provide self-describing metadata via XML descriptor files. The framework manages these components and the data flow between them. Components are written in Java or C++; the data that flows between components is designed for efficient mapping between these languages. UIMA additionally provides capabilities to wrap components as network services, and can scale to very large volumes by replicating processing pipelines over a cluster of networked nodes.

Apache UIMA is an Apache-licensed open source implementation of the UIMA specification (that specification is, in turn, being developed concurrently by a technical committee within OASIS, a standards organization). We invite and encourage you to participate in both the implementation and specification efforts.

UIMA is a component framework for analysing unstructured content such as text, audio and video. It comprises an SDK and tooling for composing and running analytic components written in Java and C++, with some support for Perl, Python and TCL.

Notable changes in this release

Added AnnotationPredicates utility class providing various predicates testing how annotations relate to each other (e.g. covering, being covered by, following, preceding, etc.)
Added a version of select.startAt() that takes a single int argument
Added trim method to AnnotationFS
Added ability to serialize CASes as XMI using XML 1.1
Added ability to serialize CASes as pretty-printed XMI using CasIOUtils
Added typed parameter support to PEARs
Improved performance of setting up JCas classes by reducing synchronization lock contention
Improved speed of constructing aggregate engines
Fixed de-serialization of array subtypes in form 6 binary CASes
Fixed parameter-fetching methods in PearSpecifier_impl so that they no longer return null, as their contract promises
Fixed logs being spammed with "Import by location/name..." messages
Fixed numerous bugs and inconsistencies in the SelectFS implementation
Fixed CAS-transportable Java objects not being properly deserialized
Fixed FSArray.spliterator() to work in PEAR scenarios
Fixed memory leak in FSClassRegistry in scenarios with large numbers of dynamically created classloaders
Fixed oddball "race" condition when initializing JCas classes
Fixed problem when reading mixed sets of binary CASes from UIMAv2 and UIMAv3
Fixed IndexOutOfBoundsException in CVD (CAS Visual Debugger)
Fixed bug causing Annotation to be returned when asking JCas for a specific type
Fixed ability to install PEARs into directories containing XML special characters in the name
Fixed not-indexed document annotation being wrongly added back to the index during de-serialization
Fixed index protection for cases in which no FSes were indexed
Fixed concurrent binary serialization producing corrupt output
Fixed deep cloning of AnalysisEngineDescription
Fixed race condition in type system consolidation
Fixed re-initialization of multi-view CAS with a different type system
Fixed logger silently discarding a parameter in some cases (placeholder filler or throwable)
No longer ship Pack200-compressed versions of the Eclipse plugins
Converted UIMAv3 User's Guide from DocBook to Asciidoc
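
To illustrate the kind of relations the new AnnotationPredicates class tests, here is a minimal, self-contained sketch over plain begin/end character offsets (end exclusive, as in UIMA annotations). The class SpanRelations and its method names are hypothetical and are not the UIMA API; they use simple textbook definitions and deliberately ignore the zero-width edge cases that the real predicates define rules for (see the UIMAv3 User's Guide).

```java
// Hypothetical sketch: NOT the org.apache.uima AnnotationPredicates API.
// Relations between two spans X and Y given as [begin, end) offsets.
public class SpanRelations {

    // X covers Y if Y lies entirely within X.
    static boolean covering(int xBegin, int xEnd, int yBegin, int yEnd) {
        return xBegin <= yBegin && yEnd <= xEnd;
    }

    // X is covered by Y if X lies entirely within Y.
    static boolean coveredBy(int xBegin, int xEnd, int yBegin, int yEnd) {
        return covering(yBegin, yEnd, xBegin, xEnd);
    }

    // X precedes Y if X ends at or before the offset where Y begins.
    static boolean preceding(int xBegin, int xEnd, int yBegin, int yEnd) {
        return xEnd <= yBegin;
    }

    // X follows Y if X begins at or after the offset where Y ends.
    static boolean following(int xBegin, int xEnd, int yBegin, int yEnd) {
        return yEnd <= xBegin;
    }

    public static void main(String[] args) {
        // A sentence [0, 9) and tokens [0, 1) and [4, 5) inside it.
        System.out.println(coveredBy(0, 1, 0, 9)); // true: token inside sentence
        System.out.println(covering(0, 9, 4, 5));  // true: sentence covers token
        System.out.println(preceding(0, 1, 4, 5)); // true: [0,1) before [4,5)
        System.out.println(following(4, 5, 0, 1)); // true: [4,5) after [0,1)
        System.out.println(following(0, 1, 4, 5)); // false
    }
}
```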

API changes

SelectFS API with zero-width annotations

The behavior of the SelectFS API changes in this release, in particular with respect to the handling of zero-width annotations (those whose start and end positions are the same). The behavior has been aligned with the new annotation predicates (see AnnotationPredicates), which are described in detail in the UIMAv3 User's Guide.
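
Zero-width annotations are where naive offset tests become ambiguous. The following sketch, which uses hypothetical helper names rather than the UIMA API, shows that under the simple definitions a zero-width span at offset 5 counts as both covered by the span [0, 5) and following it. Cases like this are why explicit rules are needed; consult the User's Guide for the behavior UIMA actually implements.

```java
// Hypothetical sketch: naive span predicates, NOT UIMA's AnnotationPredicates.
public class ZeroWidthDemo {

    // X is covered by Y if X lies within Y (naive definition).
    static boolean coveredBy(int xBegin, int xEnd, int yBegin, int yEnd) {
        return yBegin <= xBegin && xEnd <= yEnd;
    }

    // X follows Y if X begins at or after Y's end (naive definition).
    static boolean following(int xBegin, int xEnd, int yBegin, int yEnd) {
        return yEnd <= xBegin;
    }

    public static void main(String[] args) {
        int zBegin = 5, zEnd = 5; // zero-width annotation: begin == end
        int sBegin = 0, sEnd = 5; // a span ending exactly where z sits

        // Both naive tests succeed for the same pair of annotations.
        System.out.println(coveredBy(zBegin, zEnd, sBegin, sEnd)); // true
        System.out.println(following(zBegin, zEnd, sBegin, sEnd)); // true
    }
}
```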

SelectFS API with negative shift on bounded selections

The shifted operation can no longer be used to expand a selection beyond its boundaries. Consider the following tokens:


t1 = new Token(0,1)
t2 = new Token(2,3)
t3 = new Token(4,5)
t4 = new Token(6,7)
t5 = new Token(8,9)

In previous versions, it was also possible to use a negative shift with a bounding operator such as following or coveredBy. The shift would call moveToPrevious on the internal iterator of the selection, causing the selection to return annotations occurring before the bounds, e.g.:


select().shifted(-1).following(t3) => {t3, t4, t5}

This was found to be inconsistent behavior. The iterator used for the selection (which can also be obtained by calling fsIterator()) should respect the bounds.

As of this UIMA version, using shifted with a negative argument in conjunction with a bounding operator will trigger a warning in the logs and return an empty result.


select().shifted(-1).following(t3) => {}
select().following(t3) => {t4, t5}
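
The new rule can be modelled with plain Java collections. In this hypothetical sketch, `following` keeps the tokens that begin at or after the reference token's end, and the shift is applied only inside those bounds: a positive shift is assumed to skip that many results, while a negative shift yields an empty result, mirroring the 3.2.0 behavior described above (the real SelectFS additionally logs a warning). The class and method names are illustrative and not part of UIMA; in UIMA itself the query would be written along the lines of `jcas.select(Token.class).following(t3)`, with `Token` being a user-defined annotation type.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of a bounded "following" selection with a shift.
// Names are illustrative; this is not the UIMA SelectFS implementation.
public class ShiftedFollowing {

    static final class Token {
        final String name;
        final int begin;
        final int end;

        Token(String name, int begin, int end) {
            this.name = name;
            this.begin = begin;
            this.end = end;
        }
    }

    // Tokens (sorted by begin) that follow ref, after applying the shift.
    static List<Token> following(List<Token> sorted, Token ref, int shift) {
        List<Token> result = new ArrayList<>();
        if (shift < 0) {
            // As of 3.2.0 a negative shift may not escape the bounds:
            // the selection is empty (UIMA also logs a warning).
            return result;
        }
        int skipped = 0;
        for (Token t : sorted) {
            if (t.begin < ref.end) {
                continue; // outside the bound: does not follow ref
            }
            if (skipped < shift) {
                skipped++; // assumed: a positive shift skips results within the bound
                continue;
            }
            result.add(t);
        }
        return result;
    }

    static List<String> names(List<Token> tokens) {
        List<String> out = new ArrayList<>();
        for (Token t : tokens) {
            out.add(t.name);
        }
        return out;
    }

    public static void main(String[] args) {
        List<Token> tokens = List.of(
                new Token("t1", 0, 1), new Token("t2", 2, 3), new Token("t3", 4, 5),
                new Token("t4", 6, 7), new Token("t5", 8, 9));
        Token t3 = tokens.get(2);

        System.out.println(names(following(tokens, t3, 0)));  // [t4, t5]
        System.out.println(names(following(tokens, t3, -1))); // []
        System.out.println(names(following(tokens, t3, 1)));  // [t5]
    }
}
```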

SelectFS API with Backwards selection with startAt

In previous versions, the moveTo operation on backwards iterators obtained through SelectFS never ignored type priorities, even though SelectFS should ignore them by default.
