Apache UIMA-DUCC (Unstructured Information Management Architecture - Distributed UIMA Cluster Computing) v.2.2.1 Release Notes
Contents
1. What is UIMA-DUCC?
2. Major Changes in this Release
3. Migration from a Prior Release
4. Limitations
1. What is UIMA-DUCC?
DUCC stands for Distributed UIMA Cluster Computing. DUCC is a cluster management system providing tooling,
management, and scheduling facilities to automate the scale-out of applications written to the UIMA framework.
Core UIMA provides a generalized framework for applications that process unstructured information such as human
language, but does not provide a scale-out mechanism. UIMA-AS provides a scale-out mechanism to distribute UIMA
pipelines over a cluster of computing resources, but does not provide job or cluster management of the resources.
DUCC defines a formal job model that closely maps to a standard UIMA pipeline. Around this job model DUCC
provides cluster management services to automate the scale-out of UIMA pipelines over computing clusters.
2. Major Changes in this Release
Apache UIMA DUCC 2.2.1 is a maintenance release containing new features and bug fixes. What's new:
- The userid of a privileged DUCC installation does not have to be "ducc"
- ducc-mon login can be used on systems where users do not have password login
- The DUCC head-node daemons may be moved to another host without breaking working applications
- The deployment descriptor for a UIMA-AS service can be loaded from the classpath
- Interactive applications run correctly with viaducc (fixed lost inputs)
- Files created by DUCC jobs get their permissions from the launching shell's umask
- DUCC performance breakdown for scaled synchronous pipelines is now correct
For a complete list of issues fixed and up-to-date information on UIMA-DUCC issues, see our issue tracker:
https://issues.apache.org/jira/issues/?jql=project%20%3D%20UIMA%20AND%20fixVersion%20%3D%20%222.2.1-Ducc%22%20
This version of DUCC includes UIMA-SDK v.2.9.0, UIMA-AS v.2.9.0, and ActiveMQ v.5.14.0.
3. Migration from a Prior Release
When upgrading from an existing installation, the ducc_update script may be used to replace the system files while leaving the site-specific configuration files in place. For more information, see ducc_update in the Administrative Commands section of the DuccBook.
4. Limitations
On some systems cgroups swap accounting is not enabled, and ducc-mon shows N/A for swap. To
confirm, check the memory.stat file in the /ducc/ folder. If swap accounting is
enabled, a "swap" property is defined there. If it is missing, add the kernel parameter
swapaccount=1. Details of how to do this can be found here.
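The memory.stat check described above can also be scripted. The following is a minimal sketch, not part of DUCC: the class name SwapAccountingCheck and the method hasSwapProperty are hypothetical, and the location of memory.stat is passed as an argument because it depends on where cgroups are mounted on your system. It assumes the file lists one "key value" pair per line and looks for a key that is exactly "swap".

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;

// Hypothetical helper, not shipped with DUCC.
final class SwapAccountingCheck {

    // Returns true if any line of a memory.stat listing has the key "swap".
    // Assumption: each line has the form "<key> <value>".
    static boolean hasSwapProperty(List<String> memoryStatLines) {
        for (String line : memoryStatLines) {
            String[] fields = line.trim().split("\\s+");
            // Exact match, so keys such as "total_swap" are not counted.
            if (fields[0].equals("swap")) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java SwapAccountingCheck <path-to-memory.stat>");
            System.exit(2);
        }
        List<String> lines = Files.readAllLines(Paths.get(args[0]));
        if (hasSwapProperty(lines)) {
            System.out.println("\"swap\" property found: swap accounting appears to be enabled");
        } else {
            System.out.println("\"swap\" property missing: consider booting with swapaccount=1");
        }
    }
}
```

Run it with the full path of the memory.stat file in your ducc cgroup folder as the only argument.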
Due to a bug in the UIMA SDK, the UIMA AnalysisEngineProcessException cannot be serialized as a Java object. If your
analysis engine throws an exception in process(), the DUCC framework will stringify it and wrap it in
a Java RuntimeException. If you have a custom error handler plugged into a job driver, you will not be
able to test for AnalysisEngineProcessException in a stack trace with code like this:
if ( error instanceof AnalysisEngineProcessException ) ...
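Because the original exception arrives only as text inside a RuntimeException, one possible workaround is to inspect the message rather than the type. The sketch below is an illustration under assumptions, not a DUCC API: the class StringifiedErrorCheck and its method are hypothetical, and it assumes that the stringified form contains the exception's class name, which you should verify against the messages your own jobs produce.

```java
// Hypothetical helper for a custom error handler; not part of DUCC or UIMA.
final class StringifiedErrorCheck {

    // Simple class name searched for in the stringified error text.
    static final String AE_EXCEPTION_NAME = "AnalysisEngineProcessException";

    // Returns true if the error, or any of its causes, carries a message
    // that mentions AnalysisEngineProcessException.
    // Assumption: the stringified exception includes its class name.
    static boolean mentionsAnalysisEngineFailure(Throwable error) {
        int depth = 0;
        // The depth limit guards against a cyclic cause chain.
        for (Throwable t = error; t != null && depth < 32; t = t.getCause()) {
            String message = t.getMessage();
            if (message != null && message.contains(AE_EXCEPTION_NAME)) {
                return true;
            }
            depth++;
        }
        return false;
    }

    public static void main(String[] args) {
        // Simulated wrapper; the message text here is made up for illustration.
        Throwable wrapped = new RuntimeException(
            "org.apache.uima.analysis_engine.AnalysisEngineProcessException: processing failed");
        System.out.println(mentionsAnalysisEngineFailure(wrapped));
        System.out.println(mentionsAnalysisEngineFailure(new RuntimeException("disk full")));
    }
}
```

A string match is weaker than a type test, since any unrelated message that happens to contain the same name would also match, so treat the result as a hint rather than a guarantee.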