1. What is UIMA-DUCC?
2. Major Changes in this Release
3. Limitations in this Release
DUCC stands for Distributed UIMA Cluster Computing. DUCC is a cluster management system providing tooling, management, and scheduling facilities to automate the scale-out of applications written to the UIMA framework. Core UIMA provides a generalized framework for applications that process unstructured information such as human language, but does not provide a scale-out mechanism. UIMA-AS provides a scale-out mechanism to distribute UIMA pipelines over a cluster of computing resources, but does not provide job or cluster management of the resources. DUCC defines a formal job model that closely maps to a standard UIMA pipeline. Around this job model DUCC provides cluster management services to automate the scale-out of UIMA pipelines over computing clusters.
Apache UIMA DUCC 1.1.0 is a maintenance release containing bug fixes and a few
new features. What's new:
All registered groups are set for processes. A user may set DUCC_UMASK to establish the umask for a process.
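As a rough illustration of what a umask value does to the permissions of files a process creates, the sketch below applies an octal umask to a requested mode. The helper name `effective_mode` and the sample value "027" are assumptions made for this example; they are not part of DUCC, and the way DUCC itself reads DUCC_UMASK is not shown here.

```python
import stat


def effective_mode(requested_mode: int, umask_text: str) -> int:
    """Return the permission bits that remain after applying an octal umask string."""
    umask = int(umask_text, 8)  # parse e.g. "027" as octal 0o027
    return requested_mode & ~umask & 0o777


# A file requested as rw-rw-rw- (0o666) under umask 027 keeps only rw-r-----.
mode = effective_mode(0o666, "027")
print(stat.filemode(stat.S_IFREG | mode))
```

A stricter umask such as 077 would remove all group and other permissions, which may be what a user wants when job output must stay private.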
Administrative CLI interface
  - Vary-off a node to temporarily exclude it from scheduling
  - Vary-on a node to return it to the scheduling pool
  - Query occupancy - for each node, shows what is scheduled there
  - Query load - summary of scheduling tables to allow external entities such as LSF to collaborate with the DUCC scheduler
Misc enhancements
  - Better handling of failed nodes, purges all work other than reservations
  - Improved de-fragmentation logic
  - Improved handling of small clusters
  - Improved eviction, takes into account the amount of work that would be lost before scheduling a process for eviction
Added Node visualization
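The eviction improvement listed above, weighing the work that would be lost, can be pictured with a small sketch. This is only an illustration of the general idea, not the DUCC scheduler's actual logic: the `Candidate` class, the `seconds_of_unsaved_work` field, and `pick_eviction_candidate` are hypothetical names invented for this example.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    process_id: str
    seconds_of_unsaved_work: float  # hypothetical measure of work lost on eviction


def pick_eviction_candidate(candidates):
    """Return the candidate whose eviction would discard the least work, or None."""
    if not candidates:
        return None
    return min(candidates, key=lambda c: c.seconds_of_unsaved_work)


candidates = [
    Candidate("p1", 340.0),
    Candidate("p2", 12.5),
    Candidate("p3", 95.0),
]
print(pick_eviction_candidate(candidates).process_id)
```

A real scheduler would likely combine such a cost with other factors, for example fairness between users, before choosing a process to evict.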
For a complete list of issues fixed and up-to-date information on UIMA-DUCC issues, see our issue tracker: https://issues.apache.org/jira/issues/?jql=project%20%3D%20UIMA%20AND%20fixVersion%20%3D%20%221.1.0-Ducc%22%20ORDER%20BY%20key%20ASC

DUCC's Web Server includes JavaScript that provides the ability to monitor various aspects of the DUCC system via a browser. It has occasionally been observed that if several browser tabs are active simultaneously, each containing an "Automatic" monitor of one aspect of the DUCC system, then over a relatively long period (on the order of days) the browser process may consume a large amount of memory (on the order of several GB). At the time of this writing, the problem cannot be reliably reproduced. It has not been observed in "Manual" monitoring mode, and the memory bloat has been observed only in the Firefox browser.