Apache UIMA-DUCC (Unstructured Information Management Architecture - Distributed UIMA Cluster Computing) v2.1.0 Release Notes
Contents
1. What is UIMA-DUCC?
2. Major Changes in this Release
3. Migration from a Prior Release
1. What is UIMA-DUCC?
DUCC stands for Distributed UIMA Cluster Computing. DUCC is a cluster management system providing tooling,
management, and scheduling facilities to automate the scale-out of applications written to the UIMA framework.
Core UIMA provides a generalized framework for applications that process unstructured information such as human
language, but does not provide a scale-out mechanism. UIMA-AS provides a scale-out mechanism to distribute UIMA
pipelines over a cluster of computing resources, but does not provide job or cluster management of the resources.
DUCC defines a formal job model that closely maps to a standard UIMA pipeline. Around this job model DUCC
provides cluster management services to automate the scale-out of UIMA pipelines over computing clusters.
2. Major Changes in this Release
Apache UIMA-DUCC 2.1.0 is a major release containing new features and bug fixes. What's new:
- The DUCC framework now runs on Java 7 or Java 8 and supports applications using either JRE.
  Note that some job and other history data are saved as Java serialized objects and are not
  portable between Java 7 and Java 8.
- Ubuntu and RHEL 7 support
- cgroup enhancements
  - uses the standard cgroups organization
  - supports the cgroup swappiness setting, restricting any swapping if desired
- DUCC state and history storage moved from flat files to a Cassandra database, reducing storage size by a factor of 5
- Ships with the latest UIMA-AS v2.8.1
- Ships with recent ActiveMQ v5.13.2
- DUCC's UIMA-AS services support failover and SSL connectors
- Many DUCC webpage improvements
- Clear user display of DUCC classes and relation to machines
- Robust handling of dynamic changes to DUCC class and nodepool definitions
- Full support for nodepools with different quantum sizes
- DUCC broker access restricted to user ducc
- Eliminate need for user home directories located on a shared filesystem
- Built-in Job error handler programmable per job
- Migration utility for DUCC updates
- Change to vary-off behavior to facilitate cluster management
- Horizontal stacking of services instance allocations
- java-viaducc improvements, including separation of stdout from stderr responses
- An alert banner is displayed on ducc-mon pages if daemons are down
- Promoted DUCC from the sandbox to the regular Apache project area in SVN
For a complete list of issues fixed and up-to-date information on UIMA-DUCC issues, see our issue tracker:
https://issues.apache.org/jira/issues/?jql=project%20%3D%20UIMA%20AND%20fixVersion%20%3D%20%222.1.0-Ducc%22%20
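The tracker link above carries its filter as a percent-encoded JQL query. As a minimal sketch, the Python snippet below decodes that query so the filter can be read or pasted into the Jira search box. The names TRACKER_URL and extract_jql are illustrative only and are not part of DUCC.

```python
from urllib.parse import urlsplit, parse_qs

# Issue-tracker link exactly as given in these release notes.
TRACKER_URL = (
    "https://issues.apache.org/jira/issues/"
    "?jql=project%20%3D%20UIMA%20AND%20fixVersion%20%3D%20%222.1.0-Ducc%22%20"
)


def extract_jql(url: str) -> str:
    """Return the decoded JQL filter carried in the URL's `jql` parameter.

    Hypothetical helper for illustration; not part of DUCC.
    """
    query = urlsplit(url).query
    # parse_qs percent-decodes the value; strip() drops the trailing encoded space.
    return parse_qs(query)["jql"][0].strip()


if __name__ == "__main__":
    print(extract_jql(TRACKER_URL))
```

The decoded filter selects issues in the UIMA project whose fix version is "2.1.0-Ducc".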
3. Migration from a Prior Release
An existing DUCC installation can be updated in place by using the ducc_update script,
which can be copied from the UIMA Downloads page or extracted from the binary distribution.
Additional steps are required to convert existing history and state files for database access.
Details are in the INSTALL document and the DuccBook.