The goal of this document is to define a new Oozie abstraction, the bundle system, specialized in submitting and maintaining a set of coordinator applications.
A bundle is a higher-level Oozie abstraction that batches a set of coordinator applications. The user can start, stop, suspend, resume, or rerun at the bundle level, which gives better and easier operational control.
More specifically, the Oozie bundle system allows the user to define and execute a set of coordinator applications, often called a data pipeline. There is no explicit dependency among the coordinator applications in a bundle. However, a user can use the data dependencies of the coordinator applications to create an implicit data application pipeline.
Kick-off-time: The time when a bundle should start and submit coordinator applications.
Bundle Application: A bundle application defines a set of coordinator applications and when to start those. Normally, bundle applications are parameterized. A bundle application is written in XML.
Bundle Job: A bundle job is an executable instance of a bundle application. A job submission is done by submitting a job configuration that resolves all parameters in the application definition.
Bundle Definition Language: The language used to describe bundle applications.
Bundle application definitions can be parameterized with variables.
At job submission time all the parameters are resolved into concrete values.
The parameterization of bundle definitions is done using JSP Expression Language syntax from the JSP 2.0 Specification (JSP.2.3), which supports not only variables as parameters but also complex expressions.
EL expressions can be used in XML attribute values and XML text element values. They cannot be used in XML element and XML attribute names.
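As a rough illustration of what resolving parameters at submission time means, the following sketch substitutes simple `${name}` references from a dictionary of job properties. It is not Oozie's EL evaluator: it handles plain variables only, and the function name `resolve_parameters` and the sample kick-off value are assumptions made for the example.

```python
import re

# Matches simple ${name} references only. Real EL also supports functions
# and complex expressions, which this sketch does not try to evaluate.
_VAR = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_.]*)\}")


def resolve_parameters(text, job_properties):
    """Replace each ${name} in text with its value from job_properties.

    Hypothetical helper for illustration; raises KeyError when a
    parameter has no value in the job configuration.
    """
    def substitute(match):
        name = match.group(1)
        if name not in job_properties:
            raise KeyError(f"unresolved parameter: {name}")
        return job_properties[name]

    return _VAR.sub(substitute, text)


definition = "<kick-off-time>${kickOffTime}</kick-off-time>"
resolved = resolve_parameters(definition, {"kickOffTime": "2009-01-01T01:00Z"})
print(resolved)
```

The substitution is applied to element text here, but the same call works on attribute values, the two places where EL expressions are allowed.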
At any time, a bundle job is in one of the following statuses: PREP, RUNNING, SUSPENDED, PREPSUSPENDED, PAUSED, PREPPAUSED, SUCCEEDED, DONEWITHERROR, KILLED, FAILED.
Valid bundle job status transitions are:
When a bundle job is submitted, Oozie parses the bundle job XML. Oozie then creates a record for the bundle with status PREP and returns a unique ID.
When a user requests to suspend a bundle job that is in PREP state, Oozie puts the job in status PREPSUSPENDED. Similarly, when the pause time is reached for a bundle job with PREP status, Oozie puts the job in status PREPPAUSED.
Conversely, when a user requests to resume a PREPSUSPENDED bundle job, Oozie puts the job in status PREP. And when the pause time is reset for a bundle job that is in PREPPAUSED state, Oozie puts the job in status PREP.
There are two ways a bundle job can be started:
* When the kick-off-time (defined in the bundle XML) is reached. The default value is null, which means the coordinators are started NOW.
* When the user sends a start request to START the bundle.
When a bundle job starts, Oozie puts the job in status RUNNING and submits all the coordinator jobs.
When a user requests to kill a bundle job, Oozie puts the job in status KILLED and sends kill to all submitted coordinator jobs.
When a user requests to suspend a bundle job that is not in PREP status, Oozie puts the job in status SUSPENDED and suspends all submitted coordinator jobs.
When the pause time is reached for a bundle job that is not in PREP status, Oozie puts the job in status PAUSED. When the pause time is reset, Oozie puts the job back in status RUNNING.
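The transitions described in the preceding paragraphs can be collected into a small lookup table. The sketch below covers only the transitions stated in the prose. The event names (`suspend`, `pause`, `unpause`, and so on) are hypothetical labels, status names follow the status list, and two simplifications are assumptions: suspend and pause outside PREP are modeled from RUNNING only, and kill is rejected for jobs already in a terminal status.

```python
# (current status, event) -> next status, taken from the prose only.
# Event names are hypothetical labels chosen for this sketch.
TRANSITIONS = {
    ("PREP", "suspend"): "PREPSUSPENDED",
    ("PREP", "pause"): "PREPPAUSED",
    ("PREPSUSPENDED", "resume"): "PREP",
    ("PREPPAUSED", "unpause"): "PREP",
    ("PREP", "start"): "RUNNING",
    ("RUNNING", "suspend"): "SUSPENDED",  # assumption: modeled from RUNNING only
    ("RUNNING", "pause"): "PAUSED",       # assumption: modeled from RUNNING only
    ("PAUSED", "unpause"): "RUNNING",
}

TERMINAL = {"SUCCEEDED", "DONEWITHERROR", "KILLED", "FAILED"}


def next_status(current, event):
    """Return the status a bundle job moves to, or raise ValueError."""
    if event == "kill":
        # Assumption: a job that already finished cannot be killed.
        if current in TERMINAL:
            raise ValueError(f"cannot kill a job in status {current}")
        return "KILLED"
    try:
        return TRANSITIONS[(current, event)]
    except KeyError:
        raise ValueError(f"no transition for {event!r} from {current}") from None


status = "PREP"
for event in ("suspend", "resume", "start", "pause", "unpause"):
    status = next_status(status, event)
print(status)
```

Keeping the rules in a dictionary rather than in branching code makes it easy to compare the table against the specification line by line.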
When all the coordinator jobs finish, Oozie updates the bundle status accordingly. If all coordinators reach the same terminal state, the bundle job moves to that status as well. For example, if all coordinators are SUCCEEDED, Oozie puts the bundle job into SUCCEEDED status. However, if the coordinator jobs do not all finish with the same status, Oozie puts the bundle job into DONEWITHERROR.
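The rule for the final bundle status can be written as a short function. This is a sketch of the rule as worded in this document, not Oozie's implementation. It assumes every status passed in is already terminal, and the function name is made up for the example.

```python
def bundle_terminal_status(coordinator_statuses):
    """Derive the bundle's final status from its coordinators' final statuses.

    Assumes all coordinators have finished. If they all ended in the same
    status, the bundle takes that status; otherwise it is DONEWITHERROR.
    """
    statuses = set(coordinator_statuses)
    if not statuses:
        # A bundle defines at least one coordinator.
        raise ValueError("at least one coordinator status is required")
    if len(statuses) == 1:
        return next(iter(statuses))
    return "DONEWITHERROR"


print(bundle_terminal_status(["SUCCEEDED", "SUCCEEDED"]))
print(bundle_terminal_status(["SUCCEEDED", "KILLED"]))
```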
A bundle definition is defined in XML by a name, controls and one or more coordinator application specifications:
Syntax:
<bundle-app name=[NAME] xmlns='uri:oozie:bundle:0.1'>
  <controls>
    <kick-off-time>[DATETIME]</kick-off-time>
  </controls>
  <coordinator name=[NAME] >
    <app-path>[COORD-APPLICATION-PATH]</app-path>
    <configuration>
      <property>
        <name>[PROPERTY-NAME]</name>
        <value>[PROPERTY-VALUE]</value>
      </property>
      ...
    </configuration>
  </coordinator>
  ...
</bundle-app>
Examples:
A Bundle Job that maintains two coordinator applications:
<bundle-app name='APPNAME' xmlns:xsi='http://www.w3.org/2001/XMLSchema-instance' xmlns='uri:oozie:bundle:0.1'>
  <controls>
    <kick-off-time>${kickOffTime}</kick-off-time>
  </controls>
  <coordinator name='coordJobFromBundle1' >
    <app-path>${appPath}</app-path>
    <configuration>
      <property>
        <name>startTime1</name>
        <value>${START_TIME}</value>
      </property>
      <property>
        <name>endTime1</name>
        <value>${END_TIME}</value>
      </property>
    </configuration>
  </coordinator>
  <coordinator name='coordJobFromBundle2' >
    <app-path>${appPath2}</app-path>
    <configuration>
      <property>
        <name>startTime2</name>
        <value>${START_TIME2}</value>
      </property>
      <property>
        <name>endTime2</name>
        <value>${END_TIME2}</value>
      </property>
    </configuration>
  </coordinator>
</bundle-app>
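To see how a definition like this breaks down, the following sketch reads an abbreviated copy of the example with Python's standard `xml.etree.ElementTree` and lists each coordinator with its application path and properties. The helper name `list_coordinators` is an assumption for illustration. Parameters such as `${appPath}` are left unresolved, as they would be before job submission.

```python
import xml.etree.ElementTree as ET

# Namespace of the bundle definition language, as declared in the example.
NS = {"b": "uri:oozie:bundle:0.1"}

# Abbreviated copy of the two-coordinator example (one property each).
BUNDLE_XML = """
<bundle-app name='APPNAME' xmlns='uri:oozie:bundle:0.1'>
  <controls>
    <kick-off-time>${kickOffTime}</kick-off-time>
  </controls>
  <coordinator name='coordJobFromBundle1'>
    <app-path>${appPath}</app-path>
    <configuration>
      <property><name>startTime1</name><value>${START_TIME}</value></property>
    </configuration>
  </coordinator>
  <coordinator name='coordJobFromBundle2'>
    <app-path>${appPath2}</app-path>
    <configuration>
      <property><name>startTime2</name><value>${START_TIME2}</value></property>
    </configuration>
  </coordinator>
</bundle-app>
"""


def list_coordinators(bundle_xml):
    """Return (name, app_path, properties) for each coordinator element."""
    root = ET.fromstring(bundle_xml)
    coordinators = []
    for coord in root.findall("b:coordinator", NS):
        properties = {
            prop.findtext("b:name", namespaces=NS): prop.findtext("b:value", namespaces=NS)
            for prop in coord.findall("b:configuration/b:property", NS)
        }
        app_path = coord.findtext("b:app-path", namespaces=NS)
        coordinators.append((coord.get("name"), app_path, properties))
    return coordinators


for name, app_path, properties in list_coordinators(BUNDLE_XML):
    print(name, app_path, properties)
```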
When submitting a bundle job, the configuration must contain a user.name property. If security is enabled, Oozie must ensure that the value of the user.name property in the configuration matches the user credentials present in the protocol (web services) request.
When submitting a bundle job, the configuration may contain the oozie.job.acl property (the group.name property has been deprecated). If authorization is enabled, this property is treated as the ACL for the job; it can contain user and group IDs separated by commas.
The specified user and ACL are assigned to the created bundle job.
Oozie must propagate the specified user and ACL to the system executing its children jobs (coordinator jobs).
A bundle application consists exclusively of a bundle application definition and the associated coordinator application specifications. They must be installed in an HDFS directory. To submit a job for a bundle application, the full HDFS path to the bundle application definition must be specified.
When a bundle job is submitted to Oozie, the submitter must specify all the required job properties plus the HDFS path to the bundle application definition for the job.
The bundle application definition HDFS path must be specified in the 'oozie.bundle.application.path' job property.
All the bundle job properties, the HDFS path for the bundle application, the 'user.name' and 'oozie.job.acl' must be submitted to Oozie using an XML configuration file (Hadoop XML configuration file).
Example:
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
  <property>
    <name>user.name</name>
    <value>joe</value>
  </property>
  <property>
    <name>oozie.bundle.application.path</name>
    <value>hdfs://foo:9000/user/joe/mybundles/hello-bundle1.xml</value>
  </property>
  ...
</configuration>
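A job configuration in this Hadoop XML format can also be generated from a plain dictionary. The sketch below uses only Python's standard library. The helper name `build_job_configuration` is an assumption, and the property values are the ones from the example.

```python
import xml.etree.ElementTree as ET


def build_job_configuration(properties):
    """Serialize a dict of job properties as a Hadoop-style XML configuration."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    body = ET.tostring(root, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


job_xml = build_job_configuration({
    "user.name": "joe",
    "oozie.bundle.application.path":
        "hdfs://foo:9000/user/joe/mybundles/hello-bundle1.xml",
})
print(job_xml)
```

Any parameters used by the bundle definition (for example a kick-off time) would be added to the same dictionary, so that everything is resolved at submission.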
Oozie provides a way of rerunning a bundle job. The user can request to rerun a subset of the coordinators within a bundle by giving a list of coordinator names. In addition, the user can give a list of dates or date ranges (in UTC format) to rerun those time windows. The user can also choose whether to clean up all output directories before the rerun; by default, Oozie removes all output directories. Finally, an option lets the user re-calculate the dynamic input directories defined by the latest function in coordinators.
$ oozie job -rerun <bundle_Job_id> [-coordinator <list of coordinator names separated by commas>] [-date 2009-01-01T01:00Z::2009-05-31T23:59Z, 2009-11-10T01:00Z, 2009-12-31T22:00Z] [-nocleanup] [-refresh]
After the command is executed the rerun bundle job will be in RUNNING status.
Refer to Rerunning Coordinator Actions for details on rerunning a coordinator job.
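When rerun requests are issued from a script, it can help to assemble the argument list programmatically. The sketch below builds the arguments for the rerun command shown above and nothing more: it does not run the command. The helper name and the job ID are hypothetical. Joining dates with a bare comma is an assumption, as is the reading that `-nocleanup` skips output cleanup and `-refresh` re-calculates the latest-based input directories.

```python
def build_rerun_command(bundle_job_id, coordinators=(), dates=(),
                        cleanup=True, refresh=False):
    """Build the argument list for 'oozie job -rerun' on a bundle job.

    Only the flags shown in the command synopsis are used.
    """
    args = ["oozie", "job", "-rerun", bundle_job_id]
    if coordinators:
        # Coordinator names are passed as one comma-separated value.
        args += ["-coordinator", ",".join(coordinators)]
    if dates:
        # A range is written start::end; entries are comma-separated.
        args += ["-date", ",".join(dates)]
    if not cleanup:
        args.append("-nocleanup")
    if refresh:
        args.append("-refresh")
    return args


command = build_rerun_command(
    "my-bundle-job-id",  # hypothetical placeholder, not a real job ID
    coordinators=["coordJobFromBundle1"],
    dates=["2009-01-01T01:00Z::2009-05-31T23:59Z", "2009-11-10T01:00Z"],
    cleanup=False,
)
print(" ".join(command))
```

Returning a list rather than a single string keeps each argument intact if the result is later handed to something like `subprocess.run`.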
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:bundle="uri:oozie:bundle:0.1"
           elementFormDefault="qualified" targetNamespace="uri:oozie:bundle:0.1">
  <xs:element name="bundle-app" type="bundle:BUNDLE-APP"/>
  <xs:simpleType name="IDENTIFIER">
    <xs:restriction base="xs:string">
      <xs:pattern value="([a-zA-Z]([\-_a-zA-Z0-9])*){1,39}"/>
    </xs:restriction>
  </xs:simpleType>
  <xs:complexType name="BUNDLE-APP">
    <xs:sequence>
      <xs:element name="controls" type="bundle:CONTROLS" minOccurs="0" maxOccurs="1"/>
      <xs:element name="coordinator" type="bundle:COORDINATOR" minOccurs="1" maxOccurs="unbounded"/>
    </xs:sequence>
    <xs:attribute name="name" type="bundle:IDENTIFIER" use="required"/>
  </xs:complexType>
  <xs:complexType name="CONTROLS">
    <xs:sequence minOccurs="0" maxOccurs="1">
      <xs:element name="kick-off-time" type="xs:string" minOccurs="0" maxOccurs="1"/>
    </xs:sequence>
  </xs:complexType>
  <xs:complexType name="COORDINATOR">
    <xs:sequence minOccurs="1" maxOccurs="1">
      <xs:element name="app-path" type="xs:string" minOccurs="1" maxOccurs="1"/>
      <xs:element name="configuration" type="bundle:CONFIGURATION" minOccurs="0" maxOccurs="1"/>
    </xs:sequence>
    <xs:attribute name="name" type="bundle:IDENTIFIER" use="required"/>
    <xs:attribute name="critical" type="xs:string" use="optional"/>
  </xs:complexType>
  <xs:complexType name="CONFIGURATION">
    <xs:sequence>
      <xs:element name="property" minOccurs="1" maxOccurs="unbounded">
        <xs:complexType>
          <xs:sequence>
            <xs:element name="name" minOccurs="1" maxOccurs="1" type="xs:string"/>
            <xs:element name="value" minOccurs="1" maxOccurs="1" type="xs:string"/>
            <xs:element name="description" minOccurs="0" maxOccurs="1" type="xs:string"/>
          </xs:sequence>
        </xs:complexType>
      </xs:element>
    </xs:sequence>
  </xs:complexType>
</xs:schema>
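The schema requires bundle and coordinator names to be of type IDENTIFIER. As a quick pre-submission check, a name can be tested with an ordinary regular expression. The sketch below reads the IDENTIFIER rule as "a letter followed by any number of letters, digits, hyphens, or underscores". That is an interpretation: because the inner group of the schema pattern is unbounded, no length limit is enforced here. XSD patterns match the whole value, so `fullmatch` is used. The function name is made up for the example.

```python
import re

# Interpretation of the IDENTIFIER type: a leading letter, then letters,
# digits, '-' or '_'. No length limit is applied in this sketch.
_IDENTIFIER = re.compile(r"[a-zA-Z][\-_a-zA-Z0-9]*")


def is_valid_identifier(name):
    """Return True when name matches the IDENTIFIER rule as read above."""
    # XSD patterns are implicitly anchored, so the whole string must match.
    return _IDENTIFIER.fullmatch(name) is not None


for candidate in ("coordJobFromBundle1", "my-bundle_2", "1stBundle", "bad name"):
    print(candidate, is_valid_identifier(candidate))
```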