PREHOOK: query: CREATE TABLE test_table (key INT, value STRING) PARTITIONED BY (ds STRING, hr STRING) PREHOOK: type: CREATETABLE PREHOOK: Output: database:default PREHOOK: Output: default@test_table POSTHOOK: query: CREATE TABLE test_table (key INT, value STRING) PARTITIONED BY (ds STRING, hr STRING) POSTHOOK: type: CREATETABLE POSTHOOK: Output: database:default POSTHOOK: Output: default@test_table PREHOOK: query: -- Tests dynamic partitions where bucketing/sorting can be inferred, but not all reducers write -- all partitions. The subquery produces rows as follows -- key = 0: -- 0, , 0 -- key = 1: -- 0, , 1 -- key = 2: -- 1, , 0 -- This means that by distributing by the first column into two reducers, and using the third -- columns as a dynamic partition, the dynamic partition for 0 will get written in both reducers -- and the partition for 1 will get written in one reducer. So hr=0 should be bucketed by key -- and hr=1 should not. EXPLAIN INSERT OVERWRITE TABLE test_table PARTITION (ds = '2008-04-08', hr) SELECT key2, value, cast(hr as int) FROM (SELECT if ((key % 3) < 2, 0, 1) as key2, value, (key % 2) as hr FROM srcpart WHERE ds = '2008-04-08') a DISTRIBUTE BY key2 PREHOOK: type: QUERY POSTHOOK: query: -- Tests dynamic partitions where bucketing/sorting can be inferred, but not all reducers write -- all partitions. The subquery produces rows as follows -- key = 0: -- 0, , 0 -- key = 1: -- 0, , 1 -- key = 2: -- 1, , 0 -- This means that by distributing by the first column into two reducers, and using the third -- columns as a dynamic partition, the dynamic partition for 0 will get written in both reducers -- and the partition for 1 will get written in one reducer. So hr=0 should be bucketed by key -- and hr=1 should not. EXPLAIN INSERT OVERWRITE TABLE test_table PARTITION (ds = '2008-04-08', hr) SELECT key2, value, cast(hr as int) FROM (SELECT if ((key % 3) < 2, 0, 1) as key2, value, (key % 2) as hr FROM srcpart WHERE ds = '2008-04-08') a DISTRIBUTE BY key2 POSTHOOK: type: QUERY STAGE DEPENDENCIES: Stage-1 is a root stage Stage-0 depends on stages: Stage-1 Stage-2 depends on stages: Stage-0 STAGE PLANS: Stage: Stage-1 Map Reduce Map Operator Tree: TableScan alias: srcpart Statistics: Num rows: 1000 Data size: 10624 Basic stats: COMPLETE Column stats: NONE Select Operator expressions: if(((key % 3) < 2), 0, 1) (type: int), value (type: string), UDFToInteger((key % 2)) (type: int) outputColumnNames: _col0, _col1, _col2 Statistics: Num rows: 1000 Data size: 10624 Basic stats: COMPLETE Column stats: NONE Reduce Output Operator sort order: Map-reduce partition columns: _col0 (type: int) Statistics: Num rows: 1000 Data size: 10624 Basic stats: COMPLETE Column stats: NONE value expressions: _col0 (type: int), _col1 (type: string), _col2 (type: int) Reduce Operator Tree: Select Operator expressions: VALUE._col0 (type: int), VALUE._col1 (type: string), VALUE._col2 (type: int) outputColumnNames: _col0, _col1, _col2 Statistics: Num rows: 1000 Data size: 10624 Basic stats: COMPLETE Column stats: NONE File Output Operator compressed: false Statistics: Num rows: 1000 Data size: 10624 Basic stats: COMPLETE Column stats: NONE table: input format: org.apache.hadoop.mapred.TextInputFormat output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe name: default.test_table Stage: Stage-0 Move Operator tables: partition: ds 2008-04-08 hr replace: true table: input format: org.apache.hadoop.mapred.TextInputFormat output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe name: default.test_table Stage: Stage-2 Stats-Aggr Operator PREHOOK: query: INSERT OVERWRITE TABLE test_table PARTITION (ds = '2008-04-08', hr) SELECT key2, value, cast(hr as int) FROM (SELECT if ((key % 3) < 2, 0, 1) as key2, value, (key % 3 % 2) as hr FROM srcpart WHERE ds = '2008-04-08') a DISTRIBUTE BY key2 PREHOOK: type: QUERY PREHOOK: Input: default@srcpart PREHOOK: Input: default@srcpart@ds=2008-04-08/hr=11 PREHOOK: Input: default@srcpart@ds=2008-04-08/hr=12 PREHOOK: Output: default@test_table@ds=2008-04-08 POSTHOOK: query: INSERT OVERWRITE TABLE test_table PARTITION (ds = '2008-04-08', hr) SELECT key2, value, cast(hr as int) FROM (SELECT if ((key % 3) < 2, 0, 1) as key2, value, (key % 3 % 2) as hr FROM srcpart WHERE ds = '2008-04-08') a DISTRIBUTE BY key2 POSTHOOK: type: QUERY POSTHOOK: Input: default@srcpart POSTHOOK: Input: default@srcpart@ds=2008-04-08/hr=11 POSTHOOK: Input: default@srcpart@ds=2008-04-08/hr=12 POSTHOOK: Output: default@test_table@ds=2008-04-08/hr=0 POSTHOOK: Output: default@test_table@ds=2008-04-08/hr=1 POSTHOOK: Lineage: test_table PARTITION(ds=2008-04-08,hr=0).key EXPRESSION [(srcpart)srcpart.FieldSchema(name:key, type:string, comment:default), ] POSTHOOK: Lineage: test_table PARTITION(ds=2008-04-08,hr=0).value SIMPLE [(srcpart)srcpart.FieldSchema(name:value, type:string, comment:default), ] POSTHOOK: Lineage: test_table PARTITION(ds=2008-04-08,hr=1).key EXPRESSION [(srcpart)srcpart.FieldSchema(name:key, type:string, comment:default), ] POSTHOOK: Lineage: test_table PARTITION(ds=2008-04-08,hr=1).value SIMPLE [(srcpart)srcpart.FieldSchema(name:value, type:string, comment:default), ] PREHOOK: query: DESCRIBE FORMATTED test_table PARTITION (ds='2008-04-08', hr='0') PREHOOK: type: DESCTABLE PREHOOK: Input: default@test_table POSTHOOK: query: DESCRIBE FORMATTED test_table PARTITION (ds='2008-04-08', hr='0') POSTHOOK: type: DESCTABLE POSTHOOK: Input: default@test_table # col_name data_type comment key int value string # Partition Information # col_name data_type comment ds string hr string # Detailed Partition Information Partition Value: [2008-04-08, 0] Database: default Table: test_table #### A masked pattern was here #### Protect Mode: None #### A masked pattern was here #### Partition Parameters: COLUMN_STATS_ACCURATE true numFiles 2 numRows 0 rawDataSize 0 totalSize 6558 #### A masked pattern was here #### # Storage Information SerDe Library: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe InputFormat: org.apache.hadoop.mapred.TextInputFormat OutputFormat: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat Compressed: No Num Buckets: 2 Bucket Columns: [key] Sort Columns: [] Storage Desc Params: serialization.format 1 PREHOOK: query: DESCRIBE FORMATTED test_table PARTITION (ds='2008-04-08', hr='1') PREHOOK: type: DESCTABLE PREHOOK: Input: default@test_table POSTHOOK: query: DESCRIBE FORMATTED test_table PARTITION (ds='2008-04-08', hr='1') POSTHOOK: type: DESCTABLE POSTHOOK: Input: default@test_table # col_name data_type comment key int value string # Partition Information # col_name data_type comment ds string hr string # Detailed Partition Information Partition Value: [2008-04-08, 1] Database: default Table: test_table #### A masked pattern was here #### Protect Mode: None #### A masked pattern was here #### Partition Parameters: COLUMN_STATS_ACCURATE true numFiles 1 numRows 0 rawDataSize 0 totalSize 3254 #### A masked pattern was here #### # Storage Information SerDe Library: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe InputFormat: org.apache.hadoop.mapred.TextInputFormat OutputFormat: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat Compressed: No Num Buckets: -1 Bucket Columns: [] Sort Columns: [] Storage Desc Params: serialization.format 1