1. What is UIMA-DUCC?
2. Major Changes in this Release
3. Migration from a Prior Release
4. Limitations
DUCC stands for Distributed UIMA Cluster Computing. DUCC is a cluster management system providing tooling, management, and scheduling facilities to automate the scale-out of applications written to the UIMA framework. Core UIMA provides a generalized framework for applications that process unstructured information such as human language, but does not provide a scale-out mechanism. UIMA-AS provides a scale-out mechanism to distribute UIMA pipelines over a cluster of computing resources, but does not provide job or cluster management of the resources. DUCC defines a formal job model that closely maps to a standard UIMA pipeline. Around this job model DUCC provides cluster management services to automate the scale-out of UIMA pipelines over computing clusters.
Apache UIMA DUCC 3.0.0 is a maintenance release containing new features and bug fixes. What's new:
Due to a bug in the UIMA SDK, the UIMA AnalysisEngineProcessException cannot be serialized as a Java object. If your analysis engine throws an exception in process(), the DUCC framework will stringify it and wrap it in a Java RuntimeException. If you have a custom error handler plugged into a job driver, you will not be able to test for AnalysisEngineProcessException in a stack trace with code like this: if ( error instanceof AnalysisEngineProcessException ) ...
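The following is a minimal sketch of why the instanceof test fails after wrapping, together with one possible fallback: searching the wrapped message text for the exception class name. It is not DUCC code. FakeAnalysisEngineProcessException, ErrorCheckSketch, wrapAsString, and mentions are hypothetical names introduced for illustration, and the sketch assumes that the stringified form preserves the original class name, as Throwable.toString() does. Check the message your error handler actually receives before relying on this.

```java
// Hypothetical stand-in for the UIMA AnalysisEngineProcessException,
// so the sketch compiles without UIMA on the classpath.
class FakeAnalysisEngineProcessException extends Exception {
    FakeAnalysisEngineProcessException(String message) {
        super(message);
    }
}

final class ErrorCheckSketch {

    // Simulates the wrapping described above. Assumption: the stringified
    // form keeps the original class name, as Throwable.toString() does.
    static RuntimeException wrapAsString(Throwable original) {
        return new RuntimeException(original.toString());
    }

    // Walks the cause chain and reports whether any message mentions className.
    static boolean mentions(Throwable error, String className) {
        for (Throwable t = error; t != null; t = t.getCause()) {
            String message = t.getMessage();
            if (message != null && message.contains(className)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Throwable error = wrapAsString(new FakeAnalysisEngineProcessException("boom"));

        // The type test fails: the wrapper is a plain RuntimeException.
        System.out.println(error instanceof FakeAnalysisEngineProcessException);

        // The text test still finds the original class name in the message.
        System.out.println(mentions(error, "AnalysisEngineProcessException"));
    }
}
```

A string match is weaker than a type test, since any message that happens to contain the class name will also match, so treat it as a heuristic.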
To use OS-based login with the WebServer while running DUCC with IBM Java, the minimum JDK version is Java 8 SR4 FP5 (8.0.4.5).