Apache Slider App Packages Rolling Upgrade/Reconfiguration

Introduction

From version 0.80.0-incubating onwards, Slider supports rolling upgrade of application packages. Application package owners can now upgrade to a newer version of their application binary and/or configuration without any downtime.

No downtime here means that it is primarily the responsibility of the application owner to orchestrate the upgrade. It needs to be done in a planned fashion, such that the application appears to be running as far as end users are concerned. For example, if Apache HBase is being upgraded from version 0.98.4 to 1.1.0, enough regionservers to handle the load must be running at all times during the upgrade. Application owners also need to ensure that their application works in mixed-version mode: while the regionservers are being upgraded, the master is still on the older version, so a 1.1.0 regionserver must be backward compatible with a 0.98.4 master.

Currently, Ambari Slider View supports automated deployment and management of application packages. Automated application package upgrade will be supported soon.

Phases of Upgrade/Downgrade

YARN core (libraries and configurations) upgrade/downgrade

Running Slider apps will continue to run, with no downtime

Slider client upgrade/downgrade

Does not affect running Slider apps at all. New version of client can co-exist with older versions of client.

Slider Application Master upgrade/downgrade of running applications

Applications started before the YARN core upgrade/downgrade began will continue to run with the older versions of the Apache Slider core and Apache Hadoop libraries. There is no support for rolling upgrade of the Slider AM. To upgrade a running Slider AM, the application needs to be stopped and restarted with the new version of the client.

Applications deployed by Slider (binaries and configurations) upgrade/downgrade

This is what this document is primarily about.

Rolling Upgrade of Applications Deployed by Slider

This section is a run-book style list of the atomic steps exposed by Slider. These steps will be automated by Apache Ambari in a future release. They can easily be orchestrated by a shell script or executed manually in the correct order.

Note: the following YARN property should be set to a reasonably high value (say 5 or 10) before performing an application upgrade. Its default value in a YARN cluster is 2, which is not recommended for long-running applications (specifically when performing an upgrade).

<property>
  <name>yarn.resourcemanager.am.max-attempts</name>
  <value>10</value>
</property>

An application owner brings a new package and creates a new instance

Upload a new application package slider-hbase-app-package_v1.0.zip

slider package --install --name MyHBase_Facebook --version 1.0 --package ~/slider-hbase-app-package_v1.0.zip

Create an application MyHBase_Facebook_Finance

slider create MyHBase_Facebook_Finance --template ~/myHBase_appConfig_v1.0.json --resources ~/myHBase_resources_v1.0.json

Note that myHBase_appConfig_v1.0.json should have an app_version property, which is newly exposed for upgrade support:

"site.global.app_version": "1.0",

Please refer to appConfig-default.json as an example.
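Since a missing app_version is easy to overlook, a small pre-flight check before running create or upgrade can help. The following is a minimal sketch and not part of Slider: the function name app_version_of is hypothetical, and it assumes the property is written on a single line in the form shown above.

```shell
# app_version_of CONFIG_FILE
# Prints the value of "site.global.app_version" found in an appConfig JSON file.
# Returns non-zero if the property is missing.
# Assumption: the property and its value sit on one line, as in the example above.
app_version_of() {
  config_file=$1
  version=$(sed -n 's/.*"site\.global\.app_version"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' "$config_file" | head -n 1)
  [ -n "$version" ] || return 1
  printf '%s\n' "$version"
}
```

For instance, app_version_of ~/myHBase_appConfig_v1.0.json can be called before slider create, and the script can abort if it returns a non-zero status.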

Application MyHBase_Facebook_Finance goes through its usual lifecycle, say flex regionservers up

slider flex MyHBase_Facebook_Finance --component HBASE_REGIONSERVER 5

The app owner can list all containers of an app instance with filtering options

slider list MyHBase_Facebook_Finance --containers

slider list MyHBase_Facebook_Finance --containers --components HBASE_MASTER

slider list MyHBase_Facebook_Finance --containers --components HBASE_MASTER HBASE_REGIONSERVER --version 1.0

slider list MyHBase_Facebook_Finance --containers --version 1.0

At some point, the app owner has a newer version of the package and needs to upgrade

Upload the newer version 2.0 of the app package

slider package --install --name MyHBase_Facebook --version 2.0 --package ~/slider-hbase-app-package_v2.0.zip

The following command updates the internal state of the Slider App Master with the new config and resource specifications. After this, the upgrade command needs to be issued against all containers to upgrade them to the new app version (v2.0 in this case). Note that it is recommended not to issue the flex command until all containers are upgraded, although Slider does not explicitly block it.

slider upgrade MyHBase_Facebook_Finance --template ~/myHBase_appConfig_v2.0.json --resources ~/myHBase_resources_v2.0.json

Note that myHBase_appConfig_v2.0.json should have an app_version property, similar to the one in the v1.0 app config:

"site.global.app_version": "2.0",

At this point the upgrade orchestrator (Ambari or a script) relies on the slider list command with the --containers option (and the additional filter options --components and/or --version) to find all the containers that are running for the application. The orchestrator needs to decide how to order and schedule the upgrade command for each container. This primarily depends on the type of application. For example, HBase might want to call upgrade on the Master container(s) before calling upgrade on any of the RegionServer containers. Storm might want a 5-minute delay between the upgrade of the Nimbus and Supervisor container(s).

Upgrade all containers with ids id1, id2, .. idn. All n containers may be down at the same time while they are being upgraded. If this is not desired, issue n upgrade commands with a single container id each.

slider upgrade MyHBase_Facebook_Finance --containers id1 id2 .. idn

e.g. slider upgrade MyHBase_Facebook_Finance --containers container_e03_1427412101162_0001_01_000002 container_e03_1427412101162_0001_01_000003

Upgrade all containers of all roles role1, role2, .. rolen. Again, all containers of all the n roles can be down at the same time. If this is not desired, use the --containers option with one container at a time.

slider upgrade MyHBase_Facebook_Finance --components role1 role2 .. rolen

e.g. slider upgrade MyHBase_Facebook_Finance --components HBASE_MASTER HBASE_REGIONSERVER
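When having several containers down at once is not acceptable, the one-container-at-a-time approach described above can be scripted. The following is a minimal sketch under stated assumptions: upgrade_serially and the SLIDER_CMD variable are hypothetical names, and the fixed delay is only a stand-in for whatever application-specific health check the orchestrator should really perform between containers.

```shell
# Command used to invoke the Slider client; override it (e.g. with "echo") for a dry run.
SLIDER_CMD=${SLIDER_CMD:-slider}

# upgrade_serially APP DELAY_SECONDS CONTAINER_ID...
# Issues one "slider upgrade --containers" call per container id, in the order given,
# and stops at the first call that fails.
upgrade_serially() {
  app=$1
  delay=$2
  shift 2
  for id in "$@"; do
    echo "Upgrading container $id of $app"
    $SLIDER_CMD upgrade "$app" --containers "$id" || return 1
    # Placeholder pause; replace with a real readiness check for the application.
    sleep "$delay"
  done
}
```

An ordering such as "masters first, then regionservers" can be expressed by calling upgrade_serially once with the master container ids and then again with the regionserver container ids.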

When do we know that upgrade is complete?

The upgrade command needs to be executed against all containers. Upgrade can be considered complete only if the following command returns no containers.

slider list MyHBase_Facebook_Finance --containers --version 1.0

The app owner can also call slider list MyHBase_Facebook_Finance --containers to explicitly check that all expected containers are running and with the desired new version (e.g. 2.0 in this case).
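The completion check can be polled from a script. The sketch below is illustrative only: count_containers, wait_until_gone and SLIDER_CMD are hypothetical names, and the exact output format of slider list is not specified in this document, so the code merely assumes that container ids appear in the output as tokens beginning with container_ (as in the example ids shown earlier).

```shell
# Command used to invoke the Slider client; overridable for testing.
SLIDER_CMD=${SLIDER_CMD:-slider}

# count_containers APP VERSION
# Prints the number of distinct container ids reported for the given version.
# Assumption: ids appear in the list output as tokens starting with "container_".
count_containers() {
  $SLIDER_CMD list "$1" --containers --version "$2" 2>/dev/null |
    grep -o 'container_[A-Za-z0-9_]*' | sort -u | wc -l | tr -d ' '
}

# wait_until_gone APP OLD_VERSION MAX_POLLS INTERVAL_SECONDS
# Returns 0 once no container of OLD_VERSION is listed, 1 if polls run out.
wait_until_gone() {
  app=$1
  version=$2
  polls=$3
  interval=$4
  while [ "$polls" -gt 0 ]; do
    [ "$(count_containers "$app" "$version")" -eq 0 ] && return 0
    polls=$((polls - 1))
    sleep "$interval"
  done
  return 1
}
```

For example, wait_until_gone MyHBase_Facebook_Finance 1.0 60 10 would poll for up to roughly ten minutes. Note that in this sketch a failed list call is indistinguishable from an empty result, so a real script should also verify that the list call itself succeeded.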

If any issues are encountered in Slider and/or the application during upgrade, the application can be downgraded to its original version.

Downgrade Applications Deployed and Upgraded by Slider

A Slider-deployed application might need to be downgraded to its original version for any reason, whether the upgrade was completed in its entirety or was still in flight. No new commands are exposed to perform a downgrade. It basically involves calling the upgrade command again with the older version (e.g. version 1.0 in this case) of the application config and resource specifications, followed by upgrade commands issued against all containers that were upgraded to the newer version.

This command, although called upgrade, actually downgrades the internal state of the Slider App Master to the old v1.0 config and resource specifications:

slider upgrade MyHBase_Facebook_Finance --template ~/myHBase_appConfig_v1.0.json --resources ~/myHBase_resources_v1.0.json

Gather the list of all containers returned by the list command:

slider list MyHBase_Facebook_Finance --containers --version 2.0

Then issue the upgrade command on all of them (in the desired orchestrated fashion):

slider upgrade MyHBase_Facebook_Finance --containers id1 id2 .. idn

Downgrade is done when the following command returns no containers.

slider list MyHBase_Facebook_Finance --containers --version 2.0
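The gather-and-revert steps of a downgrade can be chained in a script. This is a rough sketch rather than a supported tool: containers_at_version, revert_containers and SLIDER_CMD are hypothetical names, and the parsing again assumes that container ids show up in the list output as tokens beginning with container_. It also assumes the App Master has already been given the old specification with the slider upgrade --template ... --resources ... command.

```shell
# Command used to invoke the Slider client; overridable for testing.
SLIDER_CMD=${SLIDER_CMD:-slider}

# containers_at_version APP VERSION
# Prints one container id per line for containers reported at VERSION.
# Assumption: ids appear in the list output as tokens starting with "container_".
containers_at_version() {
  $SLIDER_CMD list "$1" --containers --version "$2" 2>/dev/null |
    grep -o 'container_[A-Za-z0-9_]*' | sort -u
}

# revert_containers APP NEWER_VERSION
# Issues the upgrade command, one container at a time, against every container
# still running NEWER_VERSION. Stops at the first failing call.
revert_containers() {
  app=$1
  for id in $(containers_at_version "$app" "$2"); do
    $SLIDER_CMD upgrade "$app" --containers "$id" || return 1
  done
}
```

In the scenario of this document, revert_containers MyHBase_Facebook_Finance 2.0 would be run after the v1.0 specification has been re-applied. The containers are processed in sorted id order here, which is arbitrary; an application with ordering requirements would need its own scheduling.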

Pre and post upgrade hooks

Pre-upgrade hook (optional)

The pre-upgrade steps, if provided, allow applications to execute simple housekeeping tasks before Slider actually calls the stop operation in an upgrade scenario (specifically if they need to be performed in every single container and 1000s of them are running). An example could be to send a message to a queue that the current instance of memcached is going down so that the load balancer rules can be dynamically updated. Long-running tasks, or tasks that need to be executed by only 1 instance of a component (n instances of which are running), should be performed manually before starting the upgrade process. The pre-upgrade hook is not a good candidate for such operations. Additional parameters might have to be exposed, like a timeout, which can be used to wait at most n seconds (say), after which Slider will call the application stop hook even if the pre-upgrade operation has not completed.

Use `app-packages/hbase/package/scripts/hbase_master.py` as a sample for defining the pre-upgrade hook. Note that the pre-upgrade hook is triggered only if the currently running application was created using Slider version 0.80.0-incubating or later and the scripts in the package have the pre-upgrade hook defined.

Post-upgrade hook (optional) - not yet supported

This allows applications to perform simple housekeeping tasks prior to calling start on the new version of the application component. This is helpful only if such tasks are required to be performed in every single container and 1000s of them are running. This hook would be triggered only in the upgrade scenario. It would not be called on new containers created using flex up in non-upgrade scenarios. This makes triggering the post-upgrade hook a little tricky, and hence it is not supported today. It will be looked into in future releases.

Upgrade support of simple apps with no packages

  • Upgrade of such simple apps does not involve binaries
  • Upgrade of such simple apps basically means re-configuration only
  • The following can be done to perform the reconfiguration (or upgrade if you prefer to call so)
    • Download the app package that was internally created by Slider from HDFS. This is the path defined in the application.def property in the appConfig.json file. Slider client command-line support is not (yet) available to perform this download, so refer to the documentation or get Slider/YARN admin support if required.
    • Unzip the application package
    • Make necessary changes to metainfo.json
    • Make necessary updates to appConfig.json and resources.json
    • Re-package the application package (this is the newer version of your package, say v2.0)
    • Follow the upgrade steps provided in this document to upgrade the simple app to v2.0
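The unzip and re-package steps in the list above can be sketched as two small helpers. This is only an illustration: unpack_package and repack_package are hypothetical names, the standard zip and unzip utilities are assumed to be installed, and the package is assumed to already be on the local filesystem (for example fetched with hdfs dfs -get from the application.def path).

```shell
# unpack_package ZIP DIR
# Extracts an application package into DIR so its files can be edited.
unpack_package() {
  mkdir -p "$2" && unzip -q -o "$1" -d "$2"
}

# repack_package DIR ZIP
# Zips the (edited) contents of DIR into a new package file, replacing any
# existing file of that name. Entries are stored relative to DIR.
repack_package() {
  out=$2
  # Make the output path absolute, because the zip runs from inside DIR.
  case $out in
    /*) ;;
    *) out=$PWD/$out ;;
  esac
  rm -f "$out"
  (cd "$1" && zip -q -r "$out" .)
}
```

A typical sequence would be to call unpack_package on the downloaded package, edit metainfo.json in the extracted directory, and then call repack_package to produce the v2.0 package file used in the upgrade steps.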

Assumptions/Recommendations

  • Let’s take the scenario when the upgrade command has been called (with new app config and resources) and the orchestrator is in the middle of calling upgrade command on all containers in a certain order/schedule. At this point, it is important to understand that if a container (on which the upgrade command was not called yet) fails, then its replacement container will come up with the newer version of the application.
  • It is recommended not to call flex (to increase or decrease the number of containers assigned to a specific role) after the upgrade command with the new app config and resource specifications has been issued and until the upgrade command has been issued on all currently running containers (as per the current resource specification). Basically, avoid calling flex until slider list --containers --version <old_ver> returns no more containers. Note that the flex command is not explicitly blocked by Slider, so the application might behave unexpectedly if it is called.