Hive MetaStore Upgrade HowTo
============================

This document describes how to upgrade the schema of a Derby-backed
Hive MetaStore instance from one release version of Hive to another
release version of Hive. For example, by following the steps listed
below it is possible to upgrade a Hive 0.5.0 MetaStore schema to a
Hive 0.7.0 MetaStore schema. Before attempting the upgrade we strongly
recommend that you read through all of the steps in this document and
familiarize yourself with the required tools.

MetaStore Upgrade Steps
=======================

1) Shut down your MetaStore instance and restrict access to the
   MetaStore's Derby database. It is very important that no one else
   accesses or modifies the contents of the database while you are
   performing the schema upgrade.

2) Create a backup of your Derby metastore database. This will allow
   you to revert any changes made during the upgrade process if
   something goes wrong. The easiest way of accomplishing this task
   is by creating a copy of the directory containing your Derby
   database.

3) Dump your MetaStore database schema to a file using Derby's dblook
   utility:

   % dblook -d <connection-url> -z "APP" > my-schema-x.y.z.derby.sql

   Replace <connection-url> with the JDBC URL of your MetaStore
   database; dblook's -d option requires one. Note that "APP" is
   Derby's default schema for user-created catalog objects.

4) The schema upgrade scripts assume that the schema you are upgrading
   closely matches the official schema for your particular version of
   Hive. The files in this directory with names like
   "hive-schema-x.y.z.derby.sql" contain dumps of the official schemas
   corresponding to each of the released versions of Hive. You can
   determine differences between your schema and the official schema
   by comparing the contents of the official dump with the schema dump
   you created in the previous step. Note that due to a bug in Derby
   the order in which the DDL statements appear is non-deterministic,
   so simply diffing the two dumps is unlikely to produce usable
   results.
   A simple workaround for this problem is to compare sorted versions
   of the two schema dumps, e.g.:

   % sort hive-schema-x.y.z.derby.sql > hive-schema-x.y.z.derby.sql.sorted
   % sort my-schema-x.y.z.derby.sql > my-schema-x.y.z.derby.sql.sorted
   % diff hive-schema-x.y.z.derby.sql.sorted my-schema-x.y.z.derby.sql.sorted

   Some differences are acceptable and will not interfere with the
   upgrade process, but others need to be resolved manually or the
   upgrade scripts will fail to complete.

   * Missing Tables: Hive's default configuration causes the
     MetaStore to create schema elements only when they are needed.
     Some tables may be missing from your MetaStore schema if you
     have not created the corresponding Hive catalog objects, e.g.
     the PARTITIONS table will probably not exist if you have not
     created any table partitions in your MetaStore. You MUST create
     these missing tables before running the upgrade scripts. The
     easiest way to do this is by executing the official schema DDL
     script against your schema. You should expect most of the DDL
     statements to fail since the corresponding tables, constraints,
     and indexes already exist.

   * Reversed Column Constraint Names in the Same Table: Tables with
     multiple constraints may have the names of the constraints
     reversed. For example, the PARTITIONS table contains two foreign
     key constraints named PARTITIONS_FK1 and PARTITIONS_FK2 which
     reference SDS.SD_ID and TBLS.TBL_ID respectively. However, in
     your schema you may find that PARTITIONS_FK1 references
     TBLS.TBL_ID and PARTITIONS_FK2 references SDS.SD_ID. Either
     version is acceptable -- the only requirement is that these
     constraints actually exist.

   * Differences in Column/Constraint Names: Your schema may contain
     tables with columns named "IDX" or unique keys named "UNIQUE".
     If you find either of these in your schema you will need to
     change the names to "INTEGER_IDX" and "UNIQUE_" before running
     the upgrade scripts. For more background on this issue please
     refer to HIVE-1435.
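
The sorted comparison and the name check from step 4 can be wrapped in
two small shell helpers. This is only a sketch: the function names
compare_schema_dumps and find_legacy_names are hypothetical and not
part of Hive or Derby, and the name check assumes that identifiers
appear double-quoted in the dump and are named exactly "IDX" or
"UNIQUE". If your names only begin with those strings, adjust the
patterns accordingly.

```shell
#!/bin/sh
# Sketch: compare two schema dumps independent of statement order.
# compare_schema_dumps is a hypothetical helper name.
compare_schema_dumps() {
    official="$1"
    mine="$2"
    tmpdir=$(mktemp -d) || return 2
    sort "$official" > "$tmpdir/official.sorted"
    sort "$mine" > "$tmpdir/mine.sorted"
    # diff exits 0 when the sorted dumps match and 1 when they differ.
    diff "$tmpdir/official.sorted" "$tmpdir/mine.sorted"
    status=$?
    rm -rf "$tmpdir"
    return $status
}

# Report lines that still use the names "IDX" or "UNIQUE".
# Assumption: identifiers are double-quoted in the dump, so the
# patterns below do not match "INTEGER_IDX" or "UNIQUE_".
find_legacy_names() {
    grep -n -e '"IDX"' -e '"UNIQUE"' "$1"
}
```

For example, "compare_schema_dumps hive-schema-x.y.z.derby.sql
my-schema-x.y.z.derby.sql" prints the remaining differences, and
"find_legacy_names my-schema-x.y.z.derby.sql" lists any lines that
need renaming before the upgrade scripts are run.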

5) You are now ready to run the schema upgrade scripts. If you are
   upgrading from Hive 0.5.0 to Hive 0.6.0 you need to run the
   upgrade-0.5.0-to-0.6.0.derby.sql script, but if you are upgrading
   from 0.5.0 to 0.7.0 you will need to run the 0.5.0 to 0.6.0 upgrade
   script followed by the 0.6.0 to 0.7.0 upgrade script.

   NOTE: You may need to install the Derby 'ij' utility. Look here
   for installation instructions:
   http://db.apache.org/derby/docs/10.4/getstart/

   % ij
   ij version 10.4
   ij> CONNECT 'jdbc:derby:/Users/bob/hive/metastore_db';
   ij> RUN 'upgrade-0.5.0-to-0.6.0.derby.sql';
   ij> RUN 'upgrade-0.6.0-to-0.7.0.derby.sql';
   ij> quit;

   These scripts should run to completion without any errors. If you
   do encounter errors you need to analyze the cause and attempt to
   trace it back to one of the preceding steps.

6) The final step of the upgrade process is validating your freshly
   upgraded schema against the official schema for your particular
   version of Hive. This is accomplished by repeating steps (3) and
   (4), but this time comparing against the official version of the
   upgraded schema, e.g. if you upgraded the schema to Hive 0.7.0 then
   you will want to compare your schema dump against the contents of
   hive-schema-0.7.0.derby.sql.
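
The chaining rule in step 5 (run each intermediate upgrade script in
order) can be sketched as a pair of shell helpers that print the ij
commands for a given pair of versions. The function names
upgrade_chain and write_ij_script are hypothetical, the version list
covers only the releases named in this document, and unknown version
strings are not validated.

```shell
#!/bin/sh
# Print the upgrade scripts needed to go from version $1 to version
# $2, one per line, in the order they must be run.
# upgrade_chain is a hypothetical helper name.
upgrade_chain() {
    from="$1"
    to="$2"
    prev=""
    for v in 0.5.0 0.6.0 0.7.0; do
        # Once the starting version has been seen, every later
        # version contributes one script.
        if [ -n "$prev" ]; then
            echo "upgrade-$prev-to-$v.derby.sql"
        fi
        if [ -n "$prev" ] || [ "$v" = "$from" ]; then
            prev="$v"
        fi
        if [ "$v" = "$to" ]; then
            break
        fi
    done
}

# Print the ij commands for upgrading the database at path $1
# from version $2 to version $3.
write_ij_script() {
    db_path="$1"
    echo "CONNECT 'jdbc:derby:$db_path';"
    upgrade_chain "$2" "$3" | while read -r script; do
        echo "RUN '$script';"
    done
    echo "quit;"
}
```

For example, "write_ij_script /Users/bob/hive/metastore_db 0.5.0 0.7.0"
prints the same CONNECT, RUN, and quit commands shown in step 5.
Review the output before entering it in an ij session.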