Apache Hadoop 2.4.1 - HDFS Rolling Upgrade

Introduction

HDFS rolling upgrade allows upgrading individual HDFS daemons. For example, the datanodes can be upgraded independently of the namenodes, a namenode can be upgraded independently of the other namenodes, and the namenodes can be upgraded independently of the datanodes and journal nodes.

Upgrade

In Hadoop v2, HDFS supports highly-available (HA) namenode services and wire compatibility. These two capabilities make it feasible to upgrade HDFS without incurring HDFS downtime. To upgrade an HDFS cluster without downtime, the cluster must be set up with HA.

Upgrade without Downtime

In an HA cluster, there are two or more NameNodes (NNs), many DataNodes (DNs), a few JournalNodes (JNs) and a few ZooKeeperNodes (ZKNs). JNs are relatively stable and, in most cases, do not need to be upgraded when HDFS is upgraded. The rolling upgrade procedure described here covers only NNs and DNs, not JNs and ZKNs. Upgrading JNs and ZKNs may incur cluster downtime.

Upgrading Non-Federated Clusters

Suppose there are two namenodes, NN1 and NN2, in the active and standby states respectively. The steps for upgrading an HA cluster are as follows:

Prepare Rolling Upgrade
1. Run "hdfs dfsadmin -rollingUpgrade prepare" to create an fsimage for rollback.
2. Run "hdfs dfsadmin -rollingUpgrade query" to check the status of the rollback image. Wait and re-run the command until the "Proceed with rolling upgrade" message is shown.
Upgrade Active and Standby NNs
1. Shut down and upgrade NN2.
2. Start NN2 as standby with the "-rollingUpgrade started" option.
3. Fail over from NN1 to NN2 so that NN2 becomes active and NN1 becomes standby.
4. Shut down and upgrade NN1.
5. Start NN1 as standby with the "-rollingUpgrade started" option.
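Upgrade DNs
1. Choose a small subset of datanodes (e.g. all datanodes under a particular rack).
   a. Run "hdfs dfsadmin -shutdownDatanode <DATANODE_HOST:IPC_PORT> upgrade" to shut down one of the chosen datanodes.
   b. Run "hdfs dfsadmin -getDatanodeInfo <DATANODE_HOST:IPC_PORT>" to check and wait for the datanode to shut down.
   c. Upgrade and restart the datanode.
   d. Perform the above steps for all the chosen datanodes in the subset in parallel.
2. Repeat the above steps until all datanodes in the cluster are upgraded.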
Finalize Rolling Upgrade
- Run "hdfs dfsadmin -rollingUpgrade finalize" to finalize the rolling upgrade.
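The "Prepare Rolling Upgrade" step above lends itself to a small script. The following is a minimal sketch, not an official tool: the function names, the poll interval and the attempt limit are assumptions made for illustration, while the two dfsadmin commands and the "Proceed with rolling upgrade" message are taken from the procedure above.

```shell
# Poll "hdfs dfsadmin -rollingUpgrade query" until the rollback image is ready.
# $1: poll interval in seconds (illustrative default: 10)
# $2: maximum number of attempts (illustrative default: 60)
wait_for_rollback_image() {
    interval="${1:-10}"
    max_attempts="${2:-60}"
    attempt=1
    while [ "$attempt" -le "$max_attempts" ]; do
        # The message text is the one quoted in the procedure above.
        if hdfs dfsadmin -rollingUpgrade query | grep -q "Proceed with rolling upgrade"; then
            return 0
        fi
        sleep "$interval"
        attempt=$((attempt + 1))
    done
    echo "rollback image not ready after $max_attempts attempts" >&2
    return 1
}

# Create the rollback fsimage, then wait until it is reported as ready.
# Any arguments are passed on to wait_for_rollback_image.
prepare_rolling_upgrade() {
    hdfs dfsadmin -rollingUpgrade prepare || return 1
    wait_for_rollback_image "$@"
}
```

Calling "prepare_rolling_upgrade 30 20", for instance, would re-run the query every 30 seconds and give up after 20 attempts. Once all NNs and DNs are upgraded, the upgrade is finalized with "hdfs dfsadmin -rollingUpgrade finalize" as described above.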

Upgrading Federated Clusters

In a federated cluster, there are multiple namespaces and a pair of active and standby NNs for each namespace. The procedure for upgrading a federated cluster is similar to upgrading a non-federated cluster, except that Step 1 and Step 4 are performed on each namespace and Step 2 is performed on each pair of active and standby NNs. That is:

Prepare Rolling Upgrade for Each Namespace
Upgrade Active and Standby NN pairs for Each Namespace
Upgrade DNs
Finalize Rolling Upgrade for Each Namespace
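The per-namespace steps can be driven by a loop. The sketch below assumes that each namespace is addressed through the generic Hadoop "-fs" option of dfsadmin; the function name and the namespace URIs in the usage note are hypothetical.

```shell
# Run one rollingUpgrade action against every namespace of a federated cluster.
# $1: action (prepare, query or finalize); remaining arguments: namespace URIs.
for_each_namespace() {
    action="$1"
    shift
    case "$action" in
        prepare|query|finalize) ;;
        *)
            echo "unknown action: $action" >&2
            return 1
            ;;
    esac
    for ns in "$@"; do
        echo "namespace $ns: -rollingUpgrade $action"
        # -fs is the generic option that selects the target file system.
        hdfs dfsadmin -fs "$ns" -rollingUpgrade "$action" || return 1
    done
}
```

For example, "for_each_namespace prepare hdfs://ns1 hdfs://ns2" would prepare two namespaces named ns1 and ns2, stopping at the first failure.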

Upgrade with Downtime

For non-HA clusters, it is impossible to upgrade HDFS without downtime since it requires restarting the namenodes. However, datanodes can still be upgraded in a rolling manner.

Upgrading Non-HA Clusters

In a non-HA cluster, there is one NameNode (NN), one SecondaryNameNode (SNN) and many DataNodes (DNs). The procedure for upgrading a non-HA cluster is similar to upgrading an HA cluster, except that Step 2 "Upgrade Active and Standby NNs" is replaced by the following:

Upgrade NN and SNN
1. Shut down SNN.
2. Shut down and upgrade NN.
3. Start NN with the "-rollingUpgrade started" option.
4. Upgrade and restart SNN.
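The ordering of these four steps can be captured in a short function. This is only a sketch under several assumptions: the daemons are controlled with Hadoop 2's sbin/hadoop-daemon.sh, both daemons run on the host where the function is executed (in practice each command would run on that daemon's own host), and the new release is installed by a site-specific command whose name is passed in. The function and variable names are hypothetical.

```shell
# Daemon control script; assumed to be Hadoop 2's sbin/hadoop-daemon.sh on PATH.
DAEMON_SCRIPT="${DAEMON_SCRIPT:-hadoop-daemon.sh}"

# Upgrade the NN and SNN of a non-HA cluster in the order listed above.
# $1: site-specific command that installs the new release (hypothetical);
#     it is called once per daemon with the daemon name as its argument.
upgrade_nn_and_snn() {
    install_cmd="$1"
    "$DAEMON_SCRIPT" stop secondarynamenode || return 1
    "$DAEMON_SCRIPT" stop namenode || return 1
    "$install_cmd" namenode || return 1
    # Extra arguments after the daemon name are handed to "hdfs namenode".
    "$DAEMON_SCRIPT" start namenode -rollingUpgrade started || return 1
    "$install_cmd" secondarynamenode || return 1
    "$DAEMON_SCRIPT" start secondarynamenode
}
```

The function stops at the first failing step, so a failed NN start leaves the SNN down rather than restarting it against a namenode in an unknown state.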

Downgrade and Rollback

When the upgraded release is undesirable or, in some unlikely case, the upgrade fails (due to bugs in the newer release), administrators may choose to downgrade HDFS back to the pre-upgrade release, or to roll back HDFS to the pre-upgrade release and the pre-upgrade state. Both downgrade and rollback require cluster downtime and are not done in a rolling fashion.

Note that downgrade and rollback are possible only after a rolling upgrade is started and before the upgrade is terminated. An upgrade can be terminated by finalize, downgrade or rollback. Therefore, it may not be possible to perform rollback after finalize or downgrade, or to perform downgrade after finalize.

Downgrade

Downgrade restores the software back to the pre-upgrade release and preserves the user data. Suppose time T is the rolling upgrade start time and the upgrade is terminated by downgrade. Then, the files created before or after T remain available in HDFS. The files deleted before or after T remain deleted in HDFS.

A newer release is downgradable to the pre-upgrade release only if both the namenode layout version and the datanode layout version are unchanged between the two releases. Below are the steps for downgrade:

Downgrade HDFS
1. Shut down all NNs and DNs.
2. Restore the pre-upgrade release on all machines.
3. Start NNs with the "-rollingUpgrade downgrade" option.
4. Start DNs normally.

Rollback

Rollback restores the software back to the pre-upgrade release and also reverts the user data back to the pre-upgrade state. Suppose time T is the rolling upgrade start time and the upgrade is terminated by rollback. The files created before T remain available in HDFS but the files created after T become unavailable. The files deleted before T remain deleted in HDFS but the files deleted after T are restored.
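The difference between the two behaviours can be written down as a small predicate. The function below is purely illustrative and does not talk to HDFS: its name is hypothetical, times are modelled as plain integers, and an event at exactly time T is treated as "after T", which is an assumption the text does not settle.

```shell
# Decide whether a file is available once the upgrade has been terminated.
# $1: downgrade or rollback
# $2: rolling upgrade start time T
# $3: file creation time
# $4: file deletion time, or "" if the file was never deleted
file_visible_after() {
    mode="$1"; t="$2"; created="$3"; deleted="$4"
    case "$mode" in
        downgrade)
            # User data is preserved: a file remains unless it was deleted.
            [ -z "$deleted" ]
            ;;
        rollback)
            # State reverts to time T: the file must have existed at T.
            [ "$created" -lt "$t" ] && { [ -z "$deleted" ] || [ "$deleted" -ge "$t" ]; }
            ;;
        *)
            echo "unknown mode: $mode" >&2
            return 2
            ;;
    esac
}
```

With T = 100, a file created at 50 and deleted at 150 is restored by rollback, so "file_visible_after rollback 100 50 150" succeeds, while "file_visible_after downgrade 100 50 150" fails because the deletion is kept.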

Rollback from a newer release to the pre-upgrade release is always supported. Below are the steps for rollback:

Rollback HDFS
1. Shut down all NNs and DNs.
2. Restore the pre-upgrade release on all machines.
3. Start NNs with the "-rollingUpgrade rollback" option.
4. Start DNs normally.

Commands and Startup Options for Rolling Upgrade

DFSAdmin Commands

`dfsadmin -rollingUpgrade`

hdfs dfsadmin -rollingUpgrade <query|prepare|finalize>

Execute a rolling upgrade action.

Options:

`query`	Query the current rolling upgrade status.
`prepare`	Prepare a new rolling upgrade.
`finalize`	Finalize the current rolling upgrade.

`dfsadmin -getDatanodeInfo`

hdfs dfsadmin -getDatanodeInfo <DATANODE_HOST:IPC_PORT>

Get information about the given datanode. Like the Unix ping command, this command can be used to check whether a datanode is alive.

`dfsadmin -shutdownDatanode`

hdfs dfsadmin -shutdownDatanode <DATANODE_HOST:IPC_PORT> [upgrade]

Submit a shutdown request for the given datanode. If the optional upgrade argument is specified, clients accessing the datanode will be advised to wait for it to restart, and the fast start-up mode will be enabled. If the restart does not happen in time, clients will time out and ignore the datanode. In that case, the fast start-up mode will also be disabled.

Note that the command does not wait for the datanode shutdown to complete. The "dfsadmin -getDatanodeInfo" command can be used for checking if the datanode shutdown is complete.
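Because the shutdown request returns immediately, a script typically has to poll until the datanode is really down. The following sketch combines the two commands described above. It assumes that "-getDatanodeInfo" exits with a non-zero status once the datanode no longer answers; the function name, poll interval and poll limit are hypothetical.

```shell
# Ask a datanode to shut down for upgrade, then wait until it stops answering.
# $1: DATANODE_HOST:IPC_PORT
# $2: poll interval in seconds (illustrative default: 5)
# $3: maximum number of polls (illustrative default: 60)
shutdown_datanode_for_upgrade() {
    dn="$1"
    interval="${2:-5}"
    max_polls="${3:-60}"
    hdfs dfsadmin -shutdownDatanode "$dn" upgrade || return 1
    polls=0
    # Assumption: -getDatanodeInfo fails once the datanode is down.
    while hdfs dfsadmin -getDatanodeInfo "$dn" >/dev/null 2>&1; do
        polls=$((polls + 1))
        if [ "$polls" -ge "$max_polls" ]; then
            echo "$dn still running after $max_polls polls" >&2
            return 1
        fi
        sleep "$interval"
    done
    echo "$dn is down; upgrade and restart it now"
}
```

A call such as "shutdown_datanode_for_upgrade dn1.example.com:50020" (a made-up host with an example IPC port) would return once the datanode has stopped, at which point it can be upgraded and restarted.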

NameNode Startup Options

`namenode -rollingUpgrade`

hdfs namenode -rollingUpgrade <downgrade|rollback|started>

When a rolling upgrade is in progress, the -rollingUpgrade namenode startup option is used to specify various rolling upgrade options.

Options:

`downgrade`	Restores the namenode back to the pre-upgrade release and preserves the user data.
`rollback`	Restores the namenode back to the pre-upgrade release and also reverts the user data back to the pre-upgrade state.
`started`	Specifies that a rolling upgrade has already started, so the namenode should allow image directories with different layout versions during startup.
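Since the three startup options have very different effects, a wrapper script may want to reject anything else before starting the namenode. The helper below is a hypothetical sketch that only prints the command line to run; it does not start a daemon itself.

```shell
# Print the namenode start command for a rolling-upgrade startup option.
# $1: started, downgrade or rollback
namenode_start_command() {
    case "$1" in
        started|downgrade|rollback)
            echo "hdfs namenode -rollingUpgrade $1"
            ;;
        *)
            echo "unknown option: $1 (expected started, downgrade or rollback)" >&2
            return 1
            ;;
    esac
}
```

For example, "namenode_start_command rollback" prints "hdfs namenode -rollingUpgrade rollback", while a misspelled option produces an error instead of a command.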
