::Go back to Oozie Documentation Index::


Oozie Bundle Specification

The goal of this document is to define a new oozie abstraction called bundle system specialized in submitting and maintaining a set of coordinator applications.

Changelog

1. Bundle Overview

Bundle is a higher-level oozie abstraction that will batch a set of coordinator applications. The user will be able to start/stop/suspend/resume/rerun in the bundle level resulting a better and easy operational control.

More specififcally, the oozie Bundle system allows the user to define and execute a bunch of coordinator applications often called a data pipeline. There is no explicit dependency among the coordinator applications in a bundle. However, a user could use the data dependency of coordinator applications to create an implicit data application pipeline.

2. Definitions

Kick-off-time: The time when a bundle should start and submit coordinator applications.

Bundle Application: A bundle application defines a set of coordinator applications and when to start those. Normally, bundle applications are parameterized. A bundle application is written in XML.

Bundle Job: A bundle job is an executable instance of a bundle application. A job submission is done by submitting a job configuration that resolves all parameters in the application definition.

Bundle Definition Language: The language used to describe bundle applications.

3. Expression Language for Parameterization

Bundle application definitions can be parameterized with variables.

At job submission time all the parameters are resolved into concrete values.

The parameterization of bundle definitions is done using JSP Expression Language syntax from the JSP 2.0 Specification (JSP.2.3) , allowing not only to support variables as parameters but also complex expressions.

EL expressions can be used in XML attribute values and XML text element values. They cannot be used in XML element and XML attribute names.

4. Bundle Job

4.1. Bundle Job Status

At any time, a bundle job is in one of the following status: PREP, RUNNING, SUSPENDED, PREPSUSPENDED, PAUSED, PREPPAUSED, SUCCEEDED, DONEWITHERROR, KILLED, FAILED .

4.2. Transitions of Bundle Job Status

At any time, a bundle job is in one of the following status: PREP, RUNNING, PREPSUSPENDED, SUSPENDED, PREPPAUSED, PAUSED, SUCCEEDED, DONWITHERROR, KILLED, FAILED.

Valid bundle job status transitions are:

  • PREP --> PREPSUSPENDED | PREPPAUSED | RUNNING | KILLED
  • RUNNING --> SUSPENDED | PAUSED | SUCCEEDED | DONWITHERROR | KILLED | FAILED
  • PREPSUSPENDED --> PREP | KILLED
  • SUSPENDED --> RUNNING | KILLED
  • PREPPAUSED --> PREP | KILLED
  • PAUSED --> SUSPENDED | RUNNING | KILLED

4.3. Details of Status Transitions

When a bundle job is submitted, oozie parses the bundle job XML. Oozie then creates a record for the bundle with status PREP and returns a unique ID.

When a user requests to suspend a bundle job that is in PREP state, oozie puts the job in status PREPSUSPEND . Similarly, when pause time reaches for a bundle job with PREP status, oozie puts the job in status PREPPAUSED .

Conversely, when a user requests to resume a PREPSUSPENDED bundle job, oozie puts the job in status PREP . And when pause time is reset for a bundle job that is in PREPPAUSED state, oozie puts the job in status PREP .

There are two ways a bundle job could be started. * If kick-off-time (defined in the bundle xml) reaches. The default value is null which means starts coordinators NOW. * If user sends a start request to START the bundle.

When a bundle job starts, oozie puts the job in status RUNNING and it submits the all coordinator jobs.

When a user requests to kill a bundle job, oozie puts the job in status KILLED and it sends kill to all submitted coordinator jobs.

When a user requests to suspend a bundle job that is not in PREP status, oozie puts the job in status SUSPEND and it suspends all submitted coordinator jobs.

When pause time reaches for a bundle job that is not in PREP status, oozie puts the job in status PAUSED . When the paused time is reset, Oozie puts back the job in status RUNNING .

When all the coordinator jobs finish, oozie updates the bundle status accordingly. If all coordinators reaches to the same terminal state, bundle job status also move to the same status. For example, if all coordinators are SUCCEEDED , oozie puts the bundle job into SUCCEEDED status. However, if all coordinator jobs don't finish with the same status, oozie puts the bundle job into DONEWITHERROR .

4.3. Bundle Application Definition

A bundle definition is defined in XML by a name, controls and one or more coordinator application specifications:

  • name: The name for the bundle job.
* controls: The control specification for the bundle.
    • kick-off-time: It defines when the bundle job should start and submit the coordinator applications. This field is optional and the default is NOW that means the job should start right-a-way.
  • coordinator: Coordinator application specification. There should be at least one coordinator application in any bundle.
    • name: Name of the coordinator application. It can be used for referring this application through bundle to control such as kill, suspend, rerun.
    • app-path: Path of the coordinator application definition in hdfs. This is a mandatory element.
    • configuration: A hadoop like configuration to parameterize corresponding coordinator application. This is optional.

Syntax:

  
       <bundle-app name=[NAME]  xmlns='uri:oozie:bundle:0.1'> 
  <controls>
       <kick-off-time>[DATETIME]</kick-off-time>
  </controls>
   <coordinator name=[NAME] >
       <app-path>[COORD-APPLICATION-PATH]</app-path>
          <configuration>
            <property>
              <name>[PROPERTY-NAME]</name>
              <value>[PROPERTY-VALUE]</value>
            </property>
            ...
         </configuration>
   </coordinator>
   ...
</bundle-app>  

Examples:

A Bundle Job that maintains two coordinator applications:

   
<bundle-app name='APPNAME' xmlns:xsi='http://www.w3.org/2001/XMLSchema-instance' xmlns='uri:oozie:bundle:0.1'> 
  <controls>
       <kick-off-time>${kickOffTime}</kick-off-time>
  </controls>
   <coordinator name='coordJobFromBundle1' >
       <app-path>${appPath}</app-path>
       <configuration>
         <property>
              <name>startTime1</name>
              <value>${START_TIME}</value>
          </property>
         <property>
              <name>endTime1</name>
              <value>${END_TIME}</value>
          </property>
      </configuration>
   </coordinator>
   <coordinator name='coordJobFromBundle2' >
       <app-path>${appPath2}</app-path>
       <configuration>
         <property>
              <name>startTime2</name>
              <value>${START_TIME2}</value>
          </property>
         <property>
              <name>endTime2</name>
              <value>${END_TIME2}</value>
          </property>
      </configuration>
   </coordinator>
</bundle-app>

5. User Propagation

When submitting a bundle job, the configuration must contain a user.name property. If security is enabled, Oozie must ensure that the value of the user.name property in the configuration match the user credentials present in the protocol (web services) request.

When submitting a bundle job, the configuration may contain a group.name property. If security is enabled, Oozie must ensure that the user of the request belongs to the specified group.

The specified user and group names are assigned to the created bundle job.

Oozie must propagate the specified user and group to the system executing its children (coordinator jobs).

6. Bundle Application Deployment

A bundle application consist exclusively of bundle application definition and associated coordinator application specifications. They must be installed in an HDFS directory. To submit a job for a bundle application, the full HDFS path to bundle application definition must be specified.

6.1. Organizing Bundle Applications

TBD.

7. Bundle Job Submission

When a bundle job is submitted to Oozie, the submitter must specified all the required job properties plus the HDFS path to the bundle application definition for the job.

The bundle application definition HDFS path must be specified in the 'oozie.bundle.application.path' job property.

All the bundle job properties, the HDFS path for the bundle application, the 'user.name' and 'group.name' must be submitted to the Oozie using an XML configuration file (Hadoop XML configuration file).

Example: :

<?xml version="1.0" encoding="UTF-8"?>
<configuration>
    <property>
        <name>user.name</name>
        <value>joe</value>
    </property>
    <property>
        <name>oozie.bundle.application.path</name>
        <value>hdfs://foo:9000/user/joe/mybundles/hello-bundle1.xml</value>
    </property>
    ...
</configuration>

8. Bundle Rerun

Rerunning a Bundle Job

Oozie provides a way of rerunning a bundle job. The user could request to rerun a subset of coordinators within a bundle by defining a list of coordinator's names. In addition, a user could define a list of dates or ranges of dates (in UTC format) to rerun for those time windows. There is a way of asking whether to cleanup all output directories before rerun. By default, oozie will remove all output directories. Moreover, there is an option by which a user could ask to re-calculate the dynamic input directories defined by latest function in coordinators.

Rerun Arguments

$oozie job -rerun <bundle_Job_id> [-coordinator <list of coordinator name separate by comma>
[-date 2009-01-01T01:00Z::2009-05-31T23:59Z, 2009-11-10T01:00Z, 2009-12-31T22:00Z]
  [-nocleanup] [-refresh]

  • The rerun option reruns a bundle job that is not in (=KILLED=, FAILED , PREP , PREPPAUSED , PREPSUSPENDED ).
  • Rerun a bundle job that is in PAUSED state will reset the paused time.
  • The option -coordinator determines the name of coordinator that will be rerun. By default all coordinators are rerun.
  • Multiple ranges can be used in -date. See the above examples.
  • The dates specified in -date must be UTC.
  • If -nocleanup is given, corresponding coordinator directories will not be removed; otherwise the 'output-event' will be deleted.
  • If -refresh is set, new dataset is re-evaluated for latest() and future() for the corresponding coordinators.
  • If -refresh is set, all dependencies will be re-checked; otherwise only missed dependencies will be checked for the corresponding coordinators.

After the command is executed the rerun bundle job will be in RUNNING status.

Refer to the Rerunning Coordinator Actions for details on rerun of coordinator job.

Appendixes

Appendix A, Oozie Bundle XML-Schema

Oozie Bundle Schema

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:bundle="uri:oozie:bundle:0.1"
           elementFormDefault="qualified" targetNamespace="uri:oozie:bundle:0.1">    <xs:element name="bundle-app" type="bundle:BUNDLE-APP"/>
    <xs:simpleType name="IDENTIFIER">
        <xs:restriction base="xs:string">
            <xs:pattern value="([a-zA-Z]([\-_a-zA-Z0-9])*){1,39})"/>
        </xs:restriction>
    </xs:simpleType>
    <xs:complexType name="BUNDLE-APP">
        <xs:sequence>
            <xs:element name="controls" type="bundle:CONTROLS" minOccurs="0" maxOccurs="1"/>
            <xs:element name="coordinator" type="bundle:COORDINATOR" minOccurs="1" maxOccurs="unbounded"/>
        </xs:sequence>
        <xs:attribute name="name" type="bundle:IDENTIFIER" use="required"/>
    </xs:complexType>
    <xs:complexType name="CONTROLS">
        <xs:sequence minOccurs="0" maxOccurs="1">
            <xs:element name="kick-off-time" type="xs:string" minOccurs="0" maxOccurs="1"/>
        </xs:sequence>
    </xs:complexType>
    <xs:complexType name="COORDINATOR">
        <xs:sequence  minOccurs="1" maxOccurs="1">
            <xs:element name="app-path" type="xs:string" minOccurs="1" maxOccurs="1"/>
            <xs:element name="configuration" type="bundle:CONFIGURATION" minOccurs="0" maxOccurs="1"/>
        </xs:sequence>
        <xs:attribute name="name" type="bundle:IDENTIFIER" use="required"/>
        <xs:attribute name="critical" type="xs:string" use="optional"/>
    </xs:complexType>
    <xs:complexType name="CONFIGURATION">
        <xs:sequence>
            <xs:element name="property" minOccurs="1" maxOccurs="unbounded">
                <xs:complexType>
                    <xs:sequence>
                        <xs:element name="name" minOccurs="1" maxOccurs="1" type="xs:string"/>
                        <xs:element name="value" minOccurs="1" maxOccurs="1" type="xs:string"/>
                        <xs:element name="description" minOccurs="0" maxOccurs="1" type="xs:string"/>
                    </xs:sequence>
                </xs:complexType>
            </xs:element>
        </xs:sequence>
    </xs:complexType>
</xs:schema>

::Go back to Oozie Documentation Index::