PREHOOK: query: -- When Correlation Optimizer is turned off, 5 MR jobs will be generated. -- When Correlation Optimizer is turned on, the subquery tmp will be evalauted -- in a single MR job (including the subquery b, the subquery d, and b join d). -- At the reduce side of the MR job evaluating tmp, two operation paths -- (for subquery b and d) have different depths. The path starting from subquery b -- is JOIN->GBY->JOIN, which has a depth of 3. While, the path starting from subquery d -- is JOIN->JOIN. We should be able to handle this case. EXPLAIN SELECT SUM(HASH(tmp.key)), SUM(HASH(tmp.cnt)), SUM(HASH(tmp.value)) FROM (SELECT b.key AS key, b.cnt AS cnt, d.value AS value FROM (SELECT x.key, count(1) AS cnt FROM src1 x JOIN src y ON (x.key = y.key) group by x.key) b JOIN (SELECT x.key, x.value FROM src1 x JOIN src y ON (x.key = y.key)) d ON b.key = d.key) tmp PREHOOK: type: QUERY POSTHOOK: query: -- When Correlation Optimizer is turned off, 5 MR jobs will be generated. -- When Correlation Optimizer is turned on, the subquery tmp will be evalauted -- in a single MR job (including the subquery b, the subquery d, and b join d). -- At the reduce side of the MR job evaluating tmp, two operation paths -- (for subquery b and d) have different depths. The path starting from subquery b -- is JOIN->GBY->JOIN, which has a depth of 3. While, the path starting from subquery d -- is JOIN->JOIN. We should be able to handle this case. EXPLAIN SELECT SUM(HASH(tmp.key)), SUM(HASH(tmp.cnt)), SUM(HASH(tmp.value)) FROM (SELECT b.key AS key, b.cnt AS cnt, d.value AS value FROM (SELECT x.key, count(1) AS cnt FROM src1 x JOIN src y ON (x.key = y.key) group by x.key) b JOIN (SELECT x.key, x.value FROM src1 x JOIN src y ON (x.key = y.key)) d ON b.key = d.key) tmp POSTHOOK: type: QUERY ABSTRACT SYNTAX TREE: (TOK_QUERY (TOK_FROM (TOK_SUBQUERY (TOK_QUERY (TOK_FROM (TOK_JOIN (TOK_SUBQUERY (TOK_QUERY (TOK_FROM (TOK_JOIN (TOK_TABREF (TOK_TABNAME src1) x) (TOK_TABREF (TOK_TABNAME src) y) (= (. (TOK_TABLE_OR_COL x) key) (. (TOK_TABLE_OR_COL y) key)))) (TOK_INSERT (TOK_DESTINATION (TOK_DIR TOK_TMP_FILE)) (TOK_SELECT (TOK_SELEXPR (. (TOK_TABLE_OR_COL x) key)) (TOK_SELEXPR (TOK_FUNCTION count 1) cnt)) (TOK_GROUPBY (. (TOK_TABLE_OR_COL x) key)))) b) (TOK_SUBQUERY (TOK_QUERY (TOK_FROM (TOK_JOIN (TOK_TABREF (TOK_TABNAME src1) x) (TOK_TABREF (TOK_TABNAME src) y) (= (. (TOK_TABLE_OR_COL x) key) (. (TOK_TABLE_OR_COL y) key)))) (TOK_INSERT (TOK_DESTINATION (TOK_DIR TOK_TMP_FILE)) (TOK_SELECT (TOK_SELEXPR (. (TOK_TABLE_OR_COL x) key)) (TOK_SELEXPR (. (TOK_TABLE_OR_COL x) value))))) d) (= (. (TOK_TABLE_OR_COL b) key) (. (TOK_TABLE_OR_COL d) key)))) (TOK_INSERT (TOK_DESTINATION (TOK_DIR TOK_TMP_FILE)) (TOK_SELECT (TOK_SELEXPR (. (TOK_TABLE_OR_COL b) key) key) (TOK_SELEXPR (. (TOK_TABLE_OR_COL b) cnt) cnt) (TOK_SELEXPR (. (TOK_TABLE_OR_COL d) value) value)))) tmp)) (TOK_INSERT (TOK_DESTINATION (TOK_DIR TOK_TMP_FILE)) (TOK_SELECT (TOK_SELEXPR (TOK_FUNCTION SUM (TOK_FUNCTION HASH (. (TOK_TABLE_OR_COL tmp) key)))) (TOK_SELEXPR (TOK_FUNCTION SUM (TOK_FUNCTION HASH (. (TOK_TABLE_OR_COL tmp) cnt)))) (TOK_SELEXPR (TOK_FUNCTION SUM (TOK_FUNCTION HASH (. (TOK_TABLE_OR_COL tmp) value))))))) STAGE DEPENDENCIES: Stage-1 is a root stage Stage-2 depends on stages: Stage-1 Stage-3 depends on stages: Stage-2, Stage-6 Stage-4 depends on stages: Stage-3 Stage-6 is a root stage Stage-0 is a root stage STAGE PLANS: Stage: Stage-1 Map Reduce Alias -> Map Operator Tree: tmp:b:x TableScan alias: x Reduce Output Operator key expressions: expr: key type: string sort order: + Map-reduce partition columns: expr: key type: string tag: 0 value expressions: expr: key type: string tmp:b:y TableScan alias: y Reduce Output Operator key expressions: expr: key type: string sort order: + Map-reduce partition columns: expr: key type: string tag: 1 Reduce Operator Tree: Join Operator condition map: Inner Join 0 to 1 condition expressions: 0 {VALUE._col0} 1 handleSkewJoin: false outputColumnNames: _col0 Select Operator expressions: expr: _col0 type: string outputColumnNames: _col0 Group By Operator aggregations: expr: count(1) bucketGroup: false keys: expr: _col0 type: string mode: hash outputColumnNames: _col0, _col1 File Output Operator compressed: false GlobalTableId: 0 table: input format: org.apache.hadoop.mapred.SequenceFileInputFormat output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat Stage: Stage-2 Map Reduce Alias -> Map Operator Tree: #### A masked pattern was here #### Reduce Output Operator key expressions: expr: _col0 type: string sort order: + Map-reduce partition columns: expr: _col0 type: string tag: -1 value expressions: expr: _col1 type: bigint Reduce Operator Tree: Group By Operator aggregations: expr: count(VALUE._col0) bucketGroup: false keys: expr: KEY._col0 type: string mode: mergepartial outputColumnNames: _col0, _col1 Select Operator expressions: expr: _col0 type: string expr: _col1 type: bigint outputColumnNames: _col0, _col1 File Output Operator compressed: false GlobalTableId: 0 table: input format: org.apache.hadoop.mapred.SequenceFileInputFormat output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat Stage: Stage-3 Map Reduce Alias -> Map Operator Tree: $INTNAME Reduce Output Operator key expressions: expr: _col0 type: string sort order: + Map-reduce partition columns: expr: _col0 type: string tag: 0 value expressions: expr: _col0 type: string expr: _col1 type: bigint $INTNAME1 Reduce Output Operator key expressions: expr: _col0 type: string sort order: + Map-reduce partition columns: expr: _col0 type: string tag: 1 value expressions: expr: _col1 type: string Reduce Operator Tree: Join Operator condition map: Inner Join 0 to 1 condition expressions: 0 {VALUE._col0} {VALUE._col1} 1 {VALUE._col1} handleSkewJoin: false outputColumnNames: _col0, _col1, _col3 Select Operator expressions: expr: _col0 type: string expr: _col1 type: bigint expr: _col3 type: string outputColumnNames: _col0, _col1, _col2 Group By Operator aggregations: expr: sum(hash(_col0)) expr: sum(hash(_col1)) expr: sum(hash(_col2)) bucketGroup: false mode: hash outputColumnNames: _col0, _col1, _col2 File Output Operator compressed: false GlobalTableId: 0 table: input format: org.apache.hadoop.mapred.SequenceFileInputFormat output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat Stage: Stage-4 Map Reduce Alias -> Map Operator Tree: #### A masked pattern was here #### Reduce Output Operator sort order: tag: -1 value expressions: expr: _col0 type: bigint expr: _col1 type: bigint expr: _col2 type: bigint Reduce Operator Tree: Group By Operator aggregations: expr: sum(VALUE._col0) expr: sum(VALUE._col1) expr: sum(VALUE._col2) bucketGroup: false mode: mergepartial outputColumnNames: _col0, _col1, _col2 Select Operator expressions: expr: _col0 type: bigint expr: _col1 type: bigint expr: _col2 type: bigint outputColumnNames: _col0, _col1, _col2 File Output Operator compressed: false GlobalTableId: 0 table: input format: org.apache.hadoop.mapred.TextInputFormat output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat Stage: Stage-6 Map Reduce Alias -> Map Operator Tree: tmp:d:x TableScan alias: x Reduce Output Operator key expressions: expr: key type: string sort order: + Map-reduce partition columns: expr: key type: string tag: 0 value expressions: expr: key type: string expr: value type: string tmp:d:y TableScan alias: y Reduce Output Operator key expressions: expr: key type: string sort order: + Map-reduce partition columns: expr: key type: string tag: 1 Reduce Operator Tree: Join Operator condition map: Inner Join 0 to 1 condition expressions: 0 {VALUE._col0} {VALUE._col1} 1 handleSkewJoin: false outputColumnNames: _col0, _col1 Select Operator expressions: expr: _col0 type: string expr: _col1 type: string outputColumnNames: _col0, _col1 File Output Operator compressed: false GlobalTableId: 0 table: input format: org.apache.hadoop.mapred.SequenceFileInputFormat output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat Stage: Stage-0 Fetch Operator limit: -1 PREHOOK: query: SELECT SUM(HASH(tmp.key)), SUM(HASH(tmp.cnt)), SUM(HASH(tmp.value)) FROM (SELECT b.key AS key, b.cnt AS cnt, d.value AS value FROM (SELECT x.key, count(1) AS cnt FROM src1 x JOIN src y ON (x.key = y.key) group by x.key) b JOIN (SELECT x.key, x.value FROM src1 x JOIN src y ON (x.key = y.key)) d ON b.key = d.key) tmp PREHOOK: type: QUERY PREHOOK: Input: default@src PREHOOK: Input: default@src1 #### A masked pattern was here #### POSTHOOK: query: SELECT SUM(HASH(tmp.key)), SUM(HASH(tmp.cnt)), SUM(HASH(tmp.value)) FROM (SELECT b.key AS key, b.cnt AS cnt, d.value AS value FROM (SELECT x.key, count(1) AS cnt FROM src1 x JOIN src y ON (x.key = y.key) group by x.key) b JOIN (SELECT x.key, x.value FROM src1 x JOIN src y ON (x.key = y.key)) d ON b.key = d.key) tmp POSTHOOK: type: QUERY POSTHOOK: Input: default@src POSTHOOK: Input: default@src1 #### A masked pattern was here #### 1711763 107 3531902962 PREHOOK: query: EXPLAIN SELECT SUM(HASH(tmp.key)), SUM(HASH(tmp.cnt)), SUM(HASH(tmp.value)) FROM (SELECT b.key AS key, b.cnt AS cnt, d.value AS value FROM (SELECT x.key, count(1) AS cnt FROM src1 x JOIN src y ON (x.key = y.key) group by x.key) b JOIN (SELECT x.key, x.value FROM src1 x JOIN src y ON (x.key = y.key)) d ON b.key = d.key) tmp PREHOOK: type: QUERY POSTHOOK: query: EXPLAIN SELECT SUM(HASH(tmp.key)), SUM(HASH(tmp.cnt)), SUM(HASH(tmp.value)) FROM (SELECT b.key AS key, b.cnt AS cnt, d.value AS value FROM (SELECT x.key, count(1) AS cnt FROM src1 x JOIN src y ON (x.key = y.key) group by x.key) b JOIN (SELECT x.key, x.value FROM src1 x JOIN src y ON (x.key = y.key)) d ON b.key = d.key) tmp POSTHOOK: type: QUERY ABSTRACT SYNTAX TREE: (TOK_QUERY (TOK_FROM (TOK_SUBQUERY (TOK_QUERY (TOK_FROM (TOK_JOIN (TOK_SUBQUERY (TOK_QUERY (TOK_FROM (TOK_JOIN (TOK_TABREF (TOK_TABNAME src1) x) (TOK_TABREF (TOK_TABNAME src) y) (= (. (TOK_TABLE_OR_COL x) key) (. (TOK_TABLE_OR_COL y) key)))) (TOK_INSERT (TOK_DESTINATION (TOK_DIR TOK_TMP_FILE)) (TOK_SELECT (TOK_SELEXPR (. (TOK_TABLE_OR_COL x) key)) (TOK_SELEXPR (TOK_FUNCTION count 1) cnt)) (TOK_GROUPBY (. (TOK_TABLE_OR_COL x) key)))) b) (TOK_SUBQUERY (TOK_QUERY (TOK_FROM (TOK_JOIN (TOK_TABREF (TOK_TABNAME src1) x) (TOK_TABREF (TOK_TABNAME src) y) (= (. (TOK_TABLE_OR_COL x) key) (. (TOK_TABLE_OR_COL y) key)))) (TOK_INSERT (TOK_DESTINATION (TOK_DIR TOK_TMP_FILE)) (TOK_SELECT (TOK_SELEXPR (. (TOK_TABLE_OR_COL x) key)) (TOK_SELEXPR (. (TOK_TABLE_OR_COL x) value))))) d) (= (. (TOK_TABLE_OR_COL b) key) (. (TOK_TABLE_OR_COL d) key)))) (TOK_INSERT (TOK_DESTINATION (TOK_DIR TOK_TMP_FILE)) (TOK_SELECT (TOK_SELEXPR (. (TOK_TABLE_OR_COL b) key) key) (TOK_SELEXPR (. (TOK_TABLE_OR_COL b) cnt) cnt) (TOK_SELEXPR (. (TOK_TABLE_OR_COL d) value) value)))) tmp)) (TOK_INSERT (TOK_DESTINATION (TOK_DIR TOK_TMP_FILE)) (TOK_SELECT (TOK_SELEXPR (TOK_FUNCTION SUM (TOK_FUNCTION HASH (. (TOK_TABLE_OR_COL tmp) key)))) (TOK_SELEXPR (TOK_FUNCTION SUM (TOK_FUNCTION HASH (. (TOK_TABLE_OR_COL tmp) cnt)))) (TOK_SELEXPR (TOK_FUNCTION SUM (TOK_FUNCTION HASH (. (TOK_TABLE_OR_COL tmp) value))))))) STAGE DEPENDENCIES: Stage-1 is a root stage Stage-2 depends on stages: Stage-1 Stage-0 is a root stage STAGE PLANS: Stage: Stage-1 Map Reduce Alias -> Map Operator Tree: tmp:b:x TableScan alias: x Reduce Output Operator key expressions: expr: key type: string sort order: + Map-reduce partition columns: expr: key type: string tag: 0 value expressions: expr: key type: string tmp:b:y TableScan alias: y Reduce Output Operator key expressions: expr: key type: string sort order: + Map-reduce partition columns: expr: key type: string tag: 1 tmp:d:x TableScan alias: x Reduce Output Operator key expressions: expr: key type: string sort order: + Map-reduce partition columns: expr: key type: string tag: 2 value expressions: expr: key type: string expr: value type: string tmp:d:y TableScan alias: y Reduce Output Operator key expressions: expr: key type: string sort order: + Map-reduce partition columns: expr: key type: string tag: 3 Reduce Operator Tree: Demux Operator Join Operator condition map: Inner Join 0 to 1 condition expressions: 0 {VALUE._col0} 1 handleSkewJoin: false outputColumnNames: _col0 Select Operator expressions: expr: _col0 type: string outputColumnNames: _col0 Mux Operator Group By Operator aggregations: expr: count(1) bucketGroup: false keys: expr: _col0 type: string mode: complete outputColumnNames: _col0, _col1 Select Operator expressions: expr: _col0 type: string expr: _col1 type: bigint outputColumnNames: _col0, _col1 Mux Operator Join Operator condition map: Inner Join 0 to 1 condition expressions: 0 {VALUE._col0} {VALUE._col1} 1 {VALUE._col1} handleSkewJoin: false outputColumnNames: _col0, _col1, _col3 Select Operator expressions: expr: _col0 type: string expr: _col1 type: bigint expr: _col3 type: string outputColumnNames: _col0, _col1, _col2 Group By Operator aggregations: expr: sum(hash(_col0)) expr: sum(hash(_col1)) expr: sum(hash(_col2)) bucketGroup: false mode: hash outputColumnNames: _col0, _col1, _col2 File Output Operator compressed: false GlobalTableId: 0 table: input format: org.apache.hadoop.mapred.SequenceFileInputFormat output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat Join Operator condition map: Inner Join 0 to 1 condition expressions: 0 {VALUE._col0} {VALUE._col1} 1 handleSkewJoin: false outputColumnNames: _col0, _col1 Select Operator expressions: expr: _col0 type: string expr: _col1 type: string outputColumnNames: _col0, _col1 Mux Operator Join Operator condition map: Inner Join 0 to 1 condition expressions: 0 {VALUE._col0} {VALUE._col1} 1 {VALUE._col1} handleSkewJoin: false outputColumnNames: _col0, _col1, _col3 Select Operator expressions: expr: _col0 type: string expr: _col1 type: bigint expr: _col3 type: string outputColumnNames: _col0, _col1, _col2 Group By Operator aggregations: expr: sum(hash(_col0)) expr: sum(hash(_col1)) expr: sum(hash(_col2)) bucketGroup: false mode: hash outputColumnNames: _col0, _col1, _col2 File Output Operator compressed: false GlobalTableId: 0 table: input format: org.apache.hadoop.mapred.SequenceFileInputFormat output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat Stage: Stage-2 Map Reduce Alias -> Map Operator Tree: #### A masked pattern was here #### Reduce Output Operator sort order: tag: -1 value expressions: expr: _col0 type: bigint expr: _col1 type: bigint expr: _col2 type: bigint Reduce Operator Tree: Group By Operator aggregations: expr: sum(VALUE._col0) expr: sum(VALUE._col1) expr: sum(VALUE._col2) bucketGroup: false mode: mergepartial outputColumnNames: _col0, _col1, _col2 Select Operator expressions: expr: _col0 type: bigint expr: _col1 type: bigint expr: _col2 type: bigint outputColumnNames: _col0, _col1, _col2 File Output Operator compressed: false GlobalTableId: 0 table: input format: org.apache.hadoop.mapred.TextInputFormat output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat Stage: Stage-0 Fetch Operator limit: -1 PREHOOK: query: SELECT SUM(HASH(tmp.key)), SUM(HASH(tmp.cnt)), SUM(HASH(tmp.value)) FROM (SELECT b.key AS key, b.cnt AS cnt, d.value AS value FROM (SELECT x.key, count(1) AS cnt FROM src1 x JOIN src y ON (x.key = y.key) group by x.key) b JOIN (SELECT x.key, x.value FROM src1 x JOIN src y ON (x.key = y.key)) d ON b.key = d.key) tmp PREHOOK: type: QUERY PREHOOK: Input: default@src PREHOOK: Input: default@src1 #### A masked pattern was here #### POSTHOOK: query: SELECT SUM(HASH(tmp.key)), SUM(HASH(tmp.cnt)), SUM(HASH(tmp.value)) FROM (SELECT b.key AS key, b.cnt AS cnt, d.value AS value FROM (SELECT x.key, count(1) AS cnt FROM src1 x JOIN src y ON (x.key = y.key) group by x.key) b JOIN (SELECT x.key, x.value FROM src1 x JOIN src y ON (x.key = y.key)) d ON b.key = d.key) tmp POSTHOOK: type: QUERY POSTHOOK: Input: default@src POSTHOOK: Input: default@src1 #### A masked pattern was here #### 1711763 107 3531902962 PREHOOK: query: -- Enable hive.auto.convert.join. EXPLAIN SELECT SUM(HASH(tmp.key)), SUM(HASH(tmp.cnt)), SUM(HASH(tmp.value)) FROM (SELECT b.key AS key, b.cnt AS cnt, d.value AS value FROM (SELECT x.key, count(1) AS cnt FROM src1 x JOIN src y ON (x.key = y.key) group by x.key) b JOIN (SELECT x.key, x.value FROM src1 x JOIN src y ON (x.key = y.key)) d ON b.key = d.key) tmp PREHOOK: type: QUERY POSTHOOK: query: -- Enable hive.auto.convert.join. EXPLAIN SELECT SUM(HASH(tmp.key)), SUM(HASH(tmp.cnt)), SUM(HASH(tmp.value)) FROM (SELECT b.key AS key, b.cnt AS cnt, d.value AS value FROM (SELECT x.key, count(1) AS cnt FROM src1 x JOIN src y ON (x.key = y.key) group by x.key) b JOIN (SELECT x.key, x.value FROM src1 x JOIN src y ON (x.key = y.key)) d ON b.key = d.key) tmp POSTHOOK: type: QUERY ABSTRACT SYNTAX TREE: (TOK_QUERY (TOK_FROM (TOK_SUBQUERY (TOK_QUERY (TOK_FROM (TOK_JOIN (TOK_SUBQUERY (TOK_QUERY (TOK_FROM (TOK_JOIN (TOK_TABREF (TOK_TABNAME src1) x) (TOK_TABREF (TOK_TABNAME src) y) (= (. (TOK_TABLE_OR_COL x) key) (. (TOK_TABLE_OR_COL y) key)))) (TOK_INSERT (TOK_DESTINATION (TOK_DIR TOK_TMP_FILE)) (TOK_SELECT (TOK_SELEXPR (. (TOK_TABLE_OR_COL x) key)) (TOK_SELEXPR (TOK_FUNCTION count 1) cnt)) (TOK_GROUPBY (. (TOK_TABLE_OR_COL x) key)))) b) (TOK_SUBQUERY (TOK_QUERY (TOK_FROM (TOK_JOIN (TOK_TABREF (TOK_TABNAME src1) x) (TOK_TABREF (TOK_TABNAME src) y) (= (. (TOK_TABLE_OR_COL x) key) (. (TOK_TABLE_OR_COL y) key)))) (TOK_INSERT (TOK_DESTINATION (TOK_DIR TOK_TMP_FILE)) (TOK_SELECT (TOK_SELEXPR (. (TOK_TABLE_OR_COL x) key)) (TOK_SELEXPR (. (TOK_TABLE_OR_COL x) value))))) d) (= (. (TOK_TABLE_OR_COL b) key) (. (TOK_TABLE_OR_COL d) key)))) (TOK_INSERT (TOK_DESTINATION (TOK_DIR TOK_TMP_FILE)) (TOK_SELECT (TOK_SELEXPR (. (TOK_TABLE_OR_COL b) key) key) (TOK_SELEXPR (. (TOK_TABLE_OR_COL b) cnt) cnt) (TOK_SELEXPR (. (TOK_TABLE_OR_COL d) value) value)))) tmp)) (TOK_INSERT (TOK_DESTINATION (TOK_DIR TOK_TMP_FILE)) (TOK_SELECT (TOK_SELEXPR (TOK_FUNCTION SUM (TOK_FUNCTION HASH (. (TOK_TABLE_OR_COL tmp) key)))) (TOK_SELEXPR (TOK_FUNCTION SUM (TOK_FUNCTION HASH (. (TOK_TABLE_OR_COL tmp) cnt)))) (TOK_SELEXPR (TOK_FUNCTION SUM (TOK_FUNCTION HASH (. (TOK_TABLE_OR_COL tmp) value))))))) STAGE DEPENDENCIES: Stage-9 is a root stage Stage-2 depends on stages: Stage-9 Stage-3 depends on stages: Stage-2 Stage-0 is a root stage STAGE PLANS: Stage: Stage-9 Map Reduce Local Work Alias -> Map Local Tables: tmp:b:x Fetch Operator limit: -1 tmp:d:x Fetch Operator limit: -1 Alias -> Map Local Operator Tree: tmp:b:x TableScan alias: x HashTable Sink Operator condition expressions: 0 {key} 1 handleSkewJoin: false keys: 0 [Column[key]] 1 [Column[key]] Position of Big Table: 1 tmp:d:x TableScan alias: x HashTable Sink Operator condition expressions: 0 {key} {value} 1 handleSkewJoin: false keys: 0 [Column[key]] 1 [Column[key]] Position of Big Table: 1 Stage: Stage-2 Map Reduce Alias -> Map Operator Tree: tmp:b:y TableScan alias: y Map Join Operator condition map: Inner Join 0 to 1 condition expressions: 0 {key} 1 handleSkewJoin: false keys: 0 [Column[key]] 1 [Column[key]] outputColumnNames: _col0 Position of Big Table: 1 Select Operator expressions: expr: _col0 type: string outputColumnNames: _col0 Group By Operator aggregations: expr: count(1) bucketGroup: false keys: expr: _col0 type: string mode: hash outputColumnNames: _col0, _col1 Reduce Output Operator key expressions: expr: _col0 type: string sort order: + Map-reduce partition columns: expr: _col0 type: string tag: 0 value expressions: expr: _col1 type: bigint tmp:d:y TableScan alias: y Map Join Operator condition map: Inner Join 0 to 1 condition expressions: 0 {key} {value} 1 handleSkewJoin: false keys: 0 [Column[key]] 1 [Column[key]] outputColumnNames: _col0, _col1 Position of Big Table: 1 Select Operator expressions: expr: _col0 type: string expr: _col1 type: string outputColumnNames: _col0, _col1 Reduce Output Operator key expressions: expr: _col0 type: string sort order: + Map-reduce partition columns: expr: _col0 type: string tag: 1 value expressions: expr: _col1 type: string Local Work: Map Reduce Local Work Reduce Operator Tree: Demux Operator Group By Operator aggregations: expr: count(VALUE._col0) bucketGroup: false keys: expr: KEY._col0 type: string mode: mergepartial outputColumnNames: _col0, _col1 Select Operator expressions: expr: _col0 type: string expr: _col1 type: bigint outputColumnNames: _col0, _col1 Mux Operator Join Operator condition map: Inner Join 0 to 1 condition expressions: 0 {VALUE._col0} {VALUE._col1} 1 {VALUE._col1} handleSkewJoin: false outputColumnNames: _col0, _col1, _col3 Select Operator expressions: expr: _col0 type: string expr: _col1 type: bigint expr: _col3 type: string outputColumnNames: _col0, _col1, _col2 Group By Operator aggregations: expr: sum(hash(_col0)) expr: sum(hash(_col1)) expr: sum(hash(_col2)) bucketGroup: false mode: hash outputColumnNames: _col0, _col1, _col2 File Output Operator compressed: false GlobalTableId: 0 table: input format: org.apache.hadoop.mapred.SequenceFileInputFormat output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat Mux Operator Join Operator condition map: Inner Join 0 to 1 condition expressions: 0 {VALUE._col0} {VALUE._col1} 1 {VALUE._col1} handleSkewJoin: false outputColumnNames: _col0, _col1, _col3 Select Operator expressions: expr: _col0 type: string expr: _col1 type: bigint expr: _col3 type: string outputColumnNames: _col0, _col1, _col2 Group By Operator aggregations: expr: sum(hash(_col0)) expr: sum(hash(_col1)) expr: sum(hash(_col2)) bucketGroup: false mode: hash outputColumnNames: _col0, _col1, _col2 File Output Operator compressed: false GlobalTableId: 0 table: input format: org.apache.hadoop.mapred.SequenceFileInputFormat output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat Stage: Stage-3 Map Reduce Alias -> Map Operator Tree: #### A masked pattern was here #### Reduce Output Operator sort order: tag: -1 value expressions: expr: _col0 type: bigint expr: _col1 type: bigint expr: _col2 type: bigint Reduce Operator Tree: Group By Operator aggregations: expr: sum(VALUE._col0) expr: sum(VALUE._col1) expr: sum(VALUE._col2) bucketGroup: false mode: mergepartial outputColumnNames: _col0, _col1, _col2 Select Operator expressions: expr: _col0 type: bigint expr: _col1 type: bigint expr: _col2 type: bigint outputColumnNames: _col0, _col1, _col2 File Output Operator compressed: false GlobalTableId: 0 table: input format: org.apache.hadoop.mapred.TextInputFormat output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat Stage: Stage-0 Fetch Operator limit: -1 PREHOOK: query: SELECT SUM(HASH(tmp.key)), SUM(HASH(tmp.cnt)), SUM(HASH(tmp.value)) FROM (SELECT b.key AS key, b.cnt AS cnt, d.value AS value FROM (SELECT x.key, count(1) AS cnt FROM src1 x JOIN src y ON (x.key = y.key) group by x.key) b JOIN (SELECT x.key, x.value FROM src1 x JOIN src y ON (x.key = y.key)) d ON b.key = d.key) tmp PREHOOK: type: QUERY PREHOOK: Input: default@src PREHOOK: Input: default@src1 #### A masked pattern was here #### POSTHOOK: query: SELECT SUM(HASH(tmp.key)), SUM(HASH(tmp.cnt)), SUM(HASH(tmp.value)) FROM (SELECT b.key AS key, b.cnt AS cnt, d.value AS value FROM (SELECT x.key, count(1) AS cnt FROM src1 x JOIN src y ON (x.key = y.key) group by x.key) b JOIN (SELECT x.key, x.value FROM src1 x JOIN src y ON (x.key = y.key)) d ON b.key = d.key) tmp POSTHOOK: type: QUERY POSTHOOK: Input: default@src POSTHOOK: Input: default@src1 #### A masked pattern was here #### 1711763 107 3531902962 PREHOOK: query: EXPLAIN SELECT SUM(HASH(tmp.key)), SUM(HASH(tmp.cnt)), SUM(HASH(tmp.value)) FROM (SELECT d.key AS key, d.cnt AS cnt, b.value as value FROM (SELECT x.key, x.value FROM src1 x JOIN src y ON (x.key = y.key)) b JOIN (SELECT x.key, count(1) AS cnt FROM src1 x JOIN src y ON (x.key = y.key) group by x.key) d ON b.key = d.key) tmp PREHOOK: type: QUERY POSTHOOK: query: EXPLAIN SELECT SUM(HASH(tmp.key)), SUM(HASH(tmp.cnt)), SUM(HASH(tmp.value)) FROM (SELECT d.key AS key, d.cnt AS cnt, b.value as value FROM (SELECT x.key, x.value FROM src1 x JOIN src y ON (x.key = y.key)) b JOIN (SELECT x.key, count(1) AS cnt FROM src1 x JOIN src y ON (x.key = y.key) group by x.key) d ON b.key = d.key) tmp POSTHOOK: type: QUERY ABSTRACT SYNTAX TREE: (TOK_QUERY (TOK_FROM (TOK_SUBQUERY (TOK_QUERY (TOK_FROM (TOK_JOIN (TOK_SUBQUERY (TOK_QUERY (TOK_FROM (TOK_JOIN (TOK_TABREF (TOK_TABNAME src1) x) (TOK_TABREF (TOK_TABNAME src) y) (= (. (TOK_TABLE_OR_COL x) key) (. (TOK_TABLE_OR_COL y) key)))) (TOK_INSERT (TOK_DESTINATION (TOK_DIR TOK_TMP_FILE)) (TOK_SELECT (TOK_SELEXPR (. (TOK_TABLE_OR_COL x) key)) (TOK_SELEXPR (. (TOK_TABLE_OR_COL x) value))))) b) (TOK_SUBQUERY (TOK_QUERY (TOK_FROM (TOK_JOIN (TOK_TABREF (TOK_TABNAME src1) x) (TOK_TABREF (TOK_TABNAME src) y) (= (. (TOK_TABLE_OR_COL x) key) (. (TOK_TABLE_OR_COL y) key)))) (TOK_INSERT (TOK_DESTINATION (TOK_DIR TOK_TMP_FILE)) (TOK_SELECT (TOK_SELEXPR (. (TOK_TABLE_OR_COL x) key)) (TOK_SELEXPR (TOK_FUNCTION count 1) cnt)) (TOK_GROUPBY (. (TOK_TABLE_OR_COL x) key)))) d) (= (. (TOK_TABLE_OR_COL b) key) (. (TOK_TABLE_OR_COL d) key)))) (TOK_INSERT (TOK_DESTINATION (TOK_DIR TOK_TMP_FILE)) (TOK_SELECT (TOK_SELEXPR (. (TOK_TABLE_OR_COL d) key) key) (TOK_SELEXPR (. (TOK_TABLE_OR_COL d) cnt) cnt) (TOK_SELEXPR (. (TOK_TABLE_OR_COL b) value) value)))) tmp)) (TOK_INSERT (TOK_DESTINATION (TOK_DIR TOK_TMP_FILE)) (TOK_SELECT (TOK_SELEXPR (TOK_FUNCTION SUM (TOK_FUNCTION HASH (. (TOK_TABLE_OR_COL tmp) key)))) (TOK_SELEXPR (TOK_FUNCTION SUM (TOK_FUNCTION HASH (. (TOK_TABLE_OR_COL tmp) cnt)))) (TOK_SELEXPR (TOK_FUNCTION SUM (TOK_FUNCTION HASH (. (TOK_TABLE_OR_COL tmp) value))))))) STAGE DEPENDENCIES: Stage-1 is a root stage Stage-2 depends on stages: Stage-1, Stage-6 Stage-3 depends on stages: Stage-2 Stage-5 is a root stage Stage-6 depends on stages: Stage-5 Stage-0 is a root stage STAGE PLANS: Stage: Stage-1 Map Reduce Alias -> Map Operator Tree: tmp:b:x TableScan alias: x Reduce Output Operator key expressions: expr: key type: string sort order: + Map-reduce partition columns: expr: key type: string tag: 0 value expressions: expr: key type: string expr: value type: string tmp:b:y TableScan alias: y Reduce Output Operator key expressions: expr: key type: string sort order: + Map-reduce partition columns: expr: key type: string tag: 1 Reduce Operator Tree: Join Operator condition map: Inner Join 0 to 1 condition expressions: 0 {VALUE._col0} {VALUE._col1} 1 handleSkewJoin: false outputColumnNames: _col0, _col1 Select Operator expressions: expr: _col0 type: string expr: _col1 type: string outputColumnNames: _col0, _col1 File Output Operator compressed: false GlobalTableId: 0 table: input format: org.apache.hadoop.mapred.SequenceFileInputFormat output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat Stage: Stage-2 Map Reduce Alias -> Map Operator Tree: $INTNAME Reduce Output Operator key expressions: expr: _col0 type: string sort order: + Map-reduce partition columns: expr: _col0 type: string tag: 0 value expressions: expr: _col1 type: string $INTNAME1 Reduce Output Operator key expressions: expr: _col0 type: string sort order: + Map-reduce partition columns: expr: _col0 type: string tag: 1 value expressions: expr: _col0 type: string expr: _col1 type: bigint Reduce Operator Tree: Join Operator condition map: Inner Join 0 to 1 condition expressions: 0 {VALUE._col1} 1 {VALUE._col0} {VALUE._col1} handleSkewJoin: false outputColumnNames: _col1, _col2, _col3 Select Operator expressions: expr: _col2 type: string expr: _col3 type: bigint expr: _col1 type: string outputColumnNames: _col0, _col1, _col2 Group By Operator aggregations: expr: sum(hash(_col0)) expr: sum(hash(_col1)) expr: sum(hash(_col2)) bucketGroup: false mode: hash outputColumnNames: _col0, _col1, _col2 File Output Operator compressed: false GlobalTableId: 0 table: input format: org.apache.hadoop.mapred.SequenceFileInputFormat output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat Stage: Stage-3 Map Reduce Alias -> Map Operator Tree: #### A masked pattern was here #### Reduce Output Operator sort order: tag: -1 value expressions: expr: _col0 type: bigint expr: _col1 type: bigint expr: _col2 type: bigint Reduce Operator Tree: Group By Operator aggregations: expr: sum(VALUE._col0) expr: sum(VALUE._col1) expr: sum(VALUE._col2) bucketGroup: false mode: mergepartial outputColumnNames: _col0, _col1, _col2 Select Operator expressions: expr: _col0 type: bigint expr: _col1 type: bigint expr: _col2 type: bigint outputColumnNames: _col0, _col1, _col2 File Output Operator compressed: false GlobalTableId: 0 table: input format: org.apache.hadoop.mapred.TextInputFormat output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat Stage: Stage-5 Map Reduce Alias -> Map Operator Tree: tmp:d:x TableScan alias: x Reduce Output Operator key expressions: expr: key type: string sort order: + Map-reduce partition columns: expr: key type: string tag: 0 value expressions: expr: key type: string tmp:d:y TableScan alias: y Reduce Output Operator key expressions: expr: key type: string sort order: + Map-reduce partition columns: expr: key type: string tag: 1 Reduce Operator Tree: Join Operator condition map: Inner Join 0 to 1 condition expressions: 0 {VALUE._col0} 1 handleSkewJoin: false outputColumnNames: _col0 Select Operator expressions: expr: _col0 type: string outputColumnNames: _col0 Group By Operator aggregations: expr: count(1) bucketGroup: false keys: expr: _col0 type: string mode: hash outputColumnNames: _col0, _col1 File Output Operator compressed: false GlobalTableId: 0 table: input format: org.apache.hadoop.mapred.SequenceFileInputFormat output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat Stage: Stage-6 Map Reduce Alias -> Map Operator Tree: #### A masked pattern was here #### Reduce Output Operator key expressions: expr: _col0 type: string sort order: + Map-reduce partition columns: expr: _col0 type: string tag: -1 value expressions: expr: _col1 type: bigint Reduce Operator Tree: Group By Operator aggregations: expr: count(VALUE._col0) bucketGroup: false keys: expr: KEY._col0 type: string mode: mergepartial outputColumnNames: _col0, _col1 Select Operator expressions: expr: _col0 type: string expr: _col1 type: bigint outputColumnNames: _col0, _col1 File Output Operator compressed: false GlobalTableId: 0 table: input format: org.apache.hadoop.mapred.SequenceFileInputFormat output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat Stage: Stage-0 Fetch Operator limit: -1 PREHOOK: query: SELECT SUM(HASH(tmp.key)), SUM(HASH(tmp.cnt)), SUM(HASH(tmp.value)) FROM (SELECT d.key AS key, d.cnt AS cnt, b.value as value FROM (SELECT x.key, x.value FROM src1 x JOIN src y ON (x.key = y.key)) b JOIN (SELECT x.key, count(1) AS cnt FROM src1 x JOIN src y ON (x.key = y.key) group by x.key) d ON b.key = d.key) tmp PREHOOK: type: QUERY PREHOOK: Input: default@src PREHOOK: Input: default@src1 #### A masked pattern was here #### POSTHOOK: query: SELECT SUM(HASH(tmp.key)), SUM(HASH(tmp.cnt)), SUM(HASH(tmp.value)) FROM (SELECT d.key AS key, d.cnt AS cnt, b.value as value FROM (SELECT x.key, x.value FROM src1 x JOIN src y ON (x.key = y.key)) b JOIN (SELECT x.key, count(1) AS cnt FROM src1 x JOIN src y ON (x.key = y.key) group by x.key) d ON b.key = d.key) tmp POSTHOOK: type: QUERY POSTHOOK: Input: default@src POSTHOOK: Input: default@src1 #### A masked pattern was here #### 1711763 107 3531902962 PREHOOK: query: EXPLAIN SELECT SUM(HASH(tmp.key)), SUM(HASH(tmp.cnt)), SUM(HASH(tmp.value)) FROM (SELECT d.key AS key, d.cnt AS cnt, b.value as value FROM (SELECT x.key, x.value FROM src1 x JOIN src y ON (x.key = y.key)) b JOIN (SELECT x.key, count(1) AS cnt FROM src1 x JOIN src y ON (x.key = y.key) group by x.key) d ON b.key = d.key) tmp PREHOOK: type: QUERY POSTHOOK: query: EXPLAIN SELECT SUM(HASH(tmp.key)), SUM(HASH(tmp.cnt)), SUM(HASH(tmp.value)) FROM (SELECT d.key AS key, d.cnt AS cnt, b.value as value FROM (SELECT x.key, x.value FROM src1 x JOIN src y ON (x.key = y.key)) b JOIN (SELECT x.key, count(1) AS cnt FROM src1 x JOIN src y ON (x.key = y.key) group by x.key) d ON b.key = d.key) tmp POSTHOOK: type: QUERY ABSTRACT SYNTAX TREE: (TOK_QUERY (TOK_FROM (TOK_SUBQUERY (TOK_QUERY (TOK_FROM (TOK_JOIN (TOK_SUBQUERY (TOK_QUERY (TOK_FROM (TOK_JOIN (TOK_TABREF (TOK_TABNAME src1) x) (TOK_TABREF (TOK_TABNAME src) y) (= (. (TOK_TABLE_OR_COL x) key) (. (TOK_TABLE_OR_COL y) key)))) (TOK_INSERT (TOK_DESTINATION (TOK_DIR TOK_TMP_FILE)) (TOK_SELECT (TOK_SELEXPR (. (TOK_TABLE_OR_COL x) key)) (TOK_SELEXPR (. (TOK_TABLE_OR_COL x) value))))) b) (TOK_SUBQUERY (TOK_QUERY (TOK_FROM (TOK_JOIN (TOK_TABREF (TOK_TABNAME src1) x) (TOK_TABREF (TOK_TABNAME src) y) (= (. (TOK_TABLE_OR_COL x) key) (. (TOK_TABLE_OR_COL y) key)))) (TOK_INSERT (TOK_DESTINATION (TOK_DIR TOK_TMP_FILE)) (TOK_SELECT (TOK_SELEXPR (. (TOK_TABLE_OR_COL x) key)) (TOK_SELEXPR (TOK_FUNCTION count 1) cnt)) (TOK_GROUPBY (. (TOK_TABLE_OR_COL x) key)))) d) (= (. (TOK_TABLE_OR_COL b) key) (. (TOK_TABLE_OR_COL d) key)))) (TOK_INSERT (TOK_DESTINATION (TOK_DIR TOK_TMP_FILE)) (TOK_SELECT (TOK_SELEXPR (. (TOK_TABLE_OR_COL d) key) key) (TOK_SELEXPR (. (TOK_TABLE_OR_COL d) cnt) cnt) (TOK_SELEXPR (. (TOK_TABLE_OR_COL b) value) value)))) tmp)) (TOK_INSERT (TOK_DESTINATION (TOK_DIR TOK_TMP_FILE)) (TOK_SELECT (TOK_SELEXPR (TOK_FUNCTION SUM (TOK_FUNCTION HASH (. (TOK_TABLE_OR_COL tmp) key)))) (TOK_SELEXPR (TOK_FUNCTION SUM (TOK_FUNCTION HASH (. (TOK_TABLE_OR_COL tmp) cnt)))) (TOK_SELEXPR (TOK_FUNCTION SUM (TOK_FUNCTION HASH (. (TOK_TABLE_OR_COL tmp) value))))))) STAGE DEPENDENCIES: Stage-1 is a root stage Stage-2 depends on stages: Stage-1 Stage-0 is a root stage STAGE PLANS: Stage: Stage-1 Map Reduce Alias -> Map Operator Tree: tmp:b:x TableScan alias: x Reduce Output Operator key expressions: expr: key type: string sort order: + Map-reduce partition columns: expr: key type: string tag: 0 value expressions: expr: key type: string expr: value type: string tmp:b:y TableScan alias: y Reduce Output Operator key expressions: expr: key type: string sort order: + Map-reduce partition columns: expr: key type: string tag: 1 tmp:d:x TableScan alias: x Reduce Output Operator key expressions: expr: key type: string sort order: + Map-reduce partition columns: expr: key type: string tag: 2 value expressions: expr: key type: string tmp:d:y TableScan alias: y Reduce Output Operator key expressions: expr: key type: string sort order: + Map-reduce partition columns: expr: key type: string tag: 3 Reduce Operator Tree: Demux Operator Join Operator condition map: Inner Join 0 to 1 condition expressions: 0 {VALUE._col0} {VALUE._col1} 1 handleSkewJoin: false outputColumnNames: _col0, _col1 Select Operator expressions: expr: _col0 type: string expr: _col1 type: string outputColumnNames: _col0, _col1 Mux Operator Join Operator condition map: Inner Join 0 to 1 condition expressions: 0 {VALUE._col1} 1 {VALUE._col0} {VALUE._col1} handleSkewJoin: false outputColumnNames: _col1, _col2, _col3 Select Operator expressions: expr: _col2 type: string expr: _col3 type: bigint expr: _col1 type: string outputColumnNames: _col0, _col1, _col2 Group By Operator aggregations: expr: sum(hash(_col0)) expr: sum(hash(_col1)) expr: sum(hash(_col2)) bucketGroup: false mode: hash outputColumnNames: _col0, _col1, _col2 File Output Operator compressed: false GlobalTableId: 0 table: input format: org.apache.hadoop.mapred.SequenceFileInputFormat output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat Join Operator condition map: Inner Join 0 to 1 condition expressions: 0 {VALUE._col0} 1 handleSkewJoin: false outputColumnNames: _col0 Select Operator expressions: expr: _col0 type: string outputColumnNames: _col0 Mux Operator Group By Operator aggregations: expr: count(1) bucketGroup: false keys: expr: _col0 type: string mode: complete outputColumnNames: _col0, _col1 Select Operator expressions: expr: _col0 type: string expr: _col1 type: bigint outputColumnNames: _col0, _col1 Mux Operator Join Operator condition map: Inner Join 0 to 1 condition expressions: 0 {VALUE._col1} 1 {VALUE._col0} {VALUE._col1} handleSkewJoin: false outputColumnNames: _col1, _col2, _col3 Select Operator expressions: expr: _col2 type: string expr: _col3 type: bigint expr: _col1 type: string outputColumnNames: _col0, _col1, _col2 Group By Operator aggregations: expr: sum(hash(_col0)) expr: sum(hash(_col1)) expr: sum(hash(_col2)) bucketGroup: false mode: hash outputColumnNames: _col0, _col1, _col2 File Output Operator compressed: false GlobalTableId: 0 table: input format: org.apache.hadoop.mapred.SequenceFileInputFormat output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat Stage: Stage-2 Map Reduce Alias -> Map Operator Tree: #### A masked pattern was here #### Reduce Output Operator sort order: tag: -1 value expressions: expr: _col0 type: bigint expr: _col1 type: bigint expr: _col2 type: bigint Reduce Operator Tree: Group By Operator aggregations: expr: sum(VALUE._col0) expr: sum(VALUE._col1) expr: sum(VALUE._col2) bucketGroup: false mode: mergepartial outputColumnNames: _col0, _col1, _col2 Select Operator expressions: expr: _col0 type: bigint expr: _col1 type: bigint expr: _col2 type: bigint outputColumnNames: _col0, _col1, _col2 File Output Operator compressed: false GlobalTableId: 0 table: input format: org.apache.hadoop.mapred.TextInputFormat output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat Stage: Stage-0 Fetch Operator limit: -1 PREHOOK: query: SELECT SUM(HASH(tmp.key)), SUM(HASH(tmp.cnt)), SUM(HASH(tmp.value)) FROM (SELECT d.key AS key, d.cnt AS cnt, b.value as value FROM (SELECT x.key, x.value FROM src1 x JOIN src y ON (x.key = y.key)) b JOIN (SELECT x.key, count(1) AS cnt FROM src1 x JOIN src y ON (x.key = y.key) group by x.key) d ON b.key = d.key) tmp PREHOOK: type: QUERY PREHOOK: Input: default@src PREHOOK: Input: default@src1 #### A masked pattern was here #### POSTHOOK: query: SELECT SUM(HASH(tmp.key)), SUM(HASH(tmp.cnt)), SUM(HASH(tmp.value)) FROM (SELECT d.key AS key, d.cnt AS cnt, b.value as value FROM (SELECT x.key, x.value FROM src1 x JOIN src y ON (x.key = y.key)) b JOIN (SELECT x.key, count(1) AS cnt FROM src1 x JOIN src y ON (x.key = y.key) group by x.key) d ON b.key = d.key) tmp POSTHOOK: type: QUERY POSTHOOK: Input: default@src POSTHOOK: Input: default@src1 #### A masked pattern was here #### 1711763 107 3531902962 PREHOOK: query: -- Enable hive.auto.convert.join. EXPLAIN SELECT SUM(HASH(tmp.key)), SUM(HASH(tmp.cnt)), SUM(HASH(tmp.value)) FROM (SELECT d.key AS key, d.cnt AS cnt, b.value as value FROM (SELECT x.key, x.value FROM src1 x JOIN src y ON (x.key = y.key)) b JOIN (SELECT x.key, count(1) AS cnt FROM src1 x JOIN src y ON (x.key = y.key) group by x.key) d ON b.key = d.key) tmp PREHOOK: type: QUERY POSTHOOK: query: -- Enable hive.auto.convert.join. EXPLAIN SELECT SUM(HASH(tmp.key)), SUM(HASH(tmp.cnt)), SUM(HASH(tmp.value)) FROM (SELECT d.key AS key, d.cnt AS cnt, b.value as value FROM (SELECT x.key, x.value FROM src1 x JOIN src y ON (x.key = y.key)) b JOIN (SELECT x.key, count(1) AS cnt FROM src1 x JOIN src y ON (x.key = y.key) group by x.key) d ON b.key = d.key) tmp POSTHOOK: type: QUERY ABSTRACT SYNTAX TREE: (TOK_QUERY (TOK_FROM (TOK_SUBQUERY (TOK_QUERY (TOK_FROM (TOK_JOIN (TOK_SUBQUERY (TOK_QUERY (TOK_FROM (TOK_JOIN (TOK_TABREF (TOK_TABNAME src1) x) (TOK_TABREF (TOK_TABNAME src) y) (= (. (TOK_TABLE_OR_COL x) key) (. (TOK_TABLE_OR_COL y) key)))) (TOK_INSERT (TOK_DESTINATION (TOK_DIR TOK_TMP_FILE)) (TOK_SELECT (TOK_SELEXPR (. (TOK_TABLE_OR_COL x) key)) (TOK_SELEXPR (. (TOK_TABLE_OR_COL x) value))))) b) (TOK_SUBQUERY (TOK_QUERY (TOK_FROM (TOK_JOIN (TOK_TABREF (TOK_TABNAME src1) x) (TOK_TABREF (TOK_TABNAME src) y) (= (. (TOK_TABLE_OR_COL x) key) (. (TOK_TABLE_OR_COL y) key)))) (TOK_INSERT (TOK_DESTINATION (TOK_DIR TOK_TMP_FILE)) (TOK_SELECT (TOK_SELEXPR (. (TOK_TABLE_OR_COL x) key)) (TOK_SELEXPR (TOK_FUNCTION count 1) cnt)) (TOK_GROUPBY (. (TOK_TABLE_OR_COL x) key)))) d) (= (. (TOK_TABLE_OR_COL b) key) (. (TOK_TABLE_OR_COL d) key)))) (TOK_INSERT (TOK_DESTINATION (TOK_DIR TOK_TMP_FILE)) (TOK_SELECT (TOK_SELEXPR (. (TOK_TABLE_OR_COL d) key) key) (TOK_SELEXPR (. (TOK_TABLE_OR_COL d) cnt) cnt) (TOK_SELEXPR (. (TOK_TABLE_OR_COL b) value) value)))) tmp)) (TOK_INSERT (TOK_DESTINATION (TOK_DIR TOK_TMP_FILE)) (TOK_SELECT (TOK_SELEXPR (TOK_FUNCTION SUM (TOK_FUNCTION HASH (. (TOK_TABLE_OR_COL tmp) key)))) (TOK_SELEXPR (TOK_FUNCTION SUM (TOK_FUNCTION HASH (. (TOK_TABLE_OR_COL tmp) cnt)))) (TOK_SELEXPR (TOK_FUNCTION SUM (TOK_FUNCTION HASH (. (TOK_TABLE_OR_COL tmp) value))))))) STAGE DEPENDENCIES: Stage-9 is a root stage Stage-2 depends on stages: Stage-9 Stage-3 depends on stages: Stage-2 Stage-0 is a root stage STAGE PLANS: Stage: Stage-9 Map Reduce Local Work Alias -> Map Local Tables: tmp:b:x Fetch Operator limit: -1 tmp:d:x Fetch Operator limit: -1 Alias -> Map Local Operator Tree: tmp:b:x TableScan alias: x HashTable Sink Operator condition expressions: 0 {key} {value} 1 handleSkewJoin: false keys: 0 [Column[key]] 1 [Column[key]] Position of Big Table: 1 tmp:d:x TableScan alias: x HashTable Sink Operator condition expressions: 0 {key} 1 handleSkewJoin: false keys: 0 [Column[key]] 1 [Column[key]] Position of Big Table: 1 Stage: Stage-2 Map Reduce Alias -> Map Operator Tree: tmp:b:y TableScan alias: y Map Join Operator condition map: Inner Join 0 to 1 condition expressions: 0 {key} {value} 1 handleSkewJoin: false keys: 0 [Column[key]] 1 [Column[key]] outputColumnNames: _col0, _col1 Position of Big Table: 1 Select Operator expressions: expr: _col0 type: string expr: _col1 type: string outputColumnNames: _col0, _col1 Reduce Output Operator key expressions: expr: _col0 type: string sort order: + Map-reduce partition columns: expr: _col0 type: string tag: 0 value expressions: expr: _col1 type: string tmp:d:y TableScan alias: y Map Join Operator condition map: Inner Join 0 to 1 condition expressions: 0 {key} 1 handleSkewJoin: false keys: 0 [Column[key]] 1 [Column[key]] outputColumnNames: _col0 Position of Big Table: 1 Select Operator expressions: expr: _col0 type: string outputColumnNames: _col0 Group By Operator aggregations: expr: count(1) bucketGroup: false keys: expr: _col0 type: string mode: hash outputColumnNames: _col0, _col1 Reduce Output Operator key expressions: expr: _col0 type: string sort order: + Map-reduce partition columns: expr: _col0 type: string tag: 1 value expressions: expr: _col1 type: bigint Local Work: Map Reduce Local Work Reduce Operator Tree: Demux Operator Mux Operator Join Operator condition map: Inner Join 0 to 1 condition expressions: 0 {VALUE._col1} 1 {VALUE._col0} {VALUE._col1} handleSkewJoin: false outputColumnNames: _col1, _col2, _col3 Select Operator expressions: expr: _col2 type: string expr: _col3 type: bigint expr: _col1 type: string outputColumnNames: _col0, _col1, _col2 Group By Operator aggregations: expr: sum(hash(_col0)) expr: sum(hash(_col1)) expr: sum(hash(_col2)) bucketGroup: false mode: hash outputColumnNames: _col0, _col1, _col2 File Output Operator compressed: false GlobalTableId: 0 table: input format: org.apache.hadoop.mapred.SequenceFileInputFormat output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat Group By Operator aggregations: expr: count(VALUE._col0) bucketGroup: false keys: expr: KEY._col0 type: string mode: mergepartial outputColumnNames: _col0, _col1 Select Operator expressions: expr: _col0 type: string expr: _col1 type: bigint outputColumnNames: _col0, _col1 Mux Operator Join Operator condition map: Inner Join 0 to 1 condition expressions: 0 {VALUE._col1} 1 {VALUE._col0} {VALUE._col1} handleSkewJoin: false outputColumnNames: _col1, _col2, _col3 Select Operator expressions: expr: _col2 type: string expr: _col3 type: bigint expr: _col1 type: string outputColumnNames: _col0, _col1, _col2 Group By Operator aggregations: expr: sum(hash(_col0)) expr: sum(hash(_col1)) expr: sum(hash(_col2)) bucketGroup: false mode: hash outputColumnNames: _col0, _col1, _col2 File Output Operator compressed: false GlobalTableId: 0 table: input format: org.apache.hadoop.mapred.SequenceFileInputFormat output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat Stage: Stage-3 Map Reduce Alias -> Map Operator Tree: #### A masked pattern was here #### Reduce Output Operator sort order: tag: -1 value expressions: expr: _col0 type: bigint expr: _col1 type: bigint expr: _col2 type: bigint Reduce Operator Tree: Group By Operator aggregations: expr: sum(VALUE._col0) expr: sum(VALUE._col1) expr: sum(VALUE._col2) bucketGroup: false mode: mergepartial outputColumnNames: _col0, _col1, _col2 Select Operator expressions: expr: _col0 type: bigint expr: _col1 type: bigint expr: _col2 type: bigint outputColumnNames: _col0, _col1, _col2 File Output Operator compressed: false GlobalTableId: 0 table: input format: org.apache.hadoop.mapred.TextInputFormat output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat Stage: Stage-0 Fetch Operator limit: -1 PREHOOK: query: SELECT SUM(HASH(tmp.key)), SUM(HASH(tmp.cnt)), SUM(HASH(tmp.value)) FROM (SELECT d.key AS key, d.cnt AS cnt, b.value as value FROM (SELECT x.key, x.value FROM src1 x JOIN src y ON (x.key = y.key)) b JOIN (SELECT x.key, count(1) AS cnt FROM src1 x JOIN src y ON (x.key = y.key) group by x.key) d ON b.key = d.key) tmp PREHOOK: type: QUERY PREHOOK: Input: default@src PREHOOK: Input: default@src1 #### A masked pattern was here #### POSTHOOK: query: SELECT SUM(HASH(tmp.key)), SUM(HASH(tmp.cnt)), SUM(HASH(tmp.value)) FROM (SELECT d.key AS key, d.cnt AS cnt, b.value as value FROM (SELECT x.key, x.value FROM src1 x JOIN src y ON (x.key = y.key)) b JOIN (SELECT x.key, count(1) AS cnt FROM src1 x JOIN src y ON (x.key = y.key) group by x.key) d ON b.key = d.key) tmp POSTHOOK: type: QUERY POSTHOOK: Input: default@src POSTHOOK: Input: default@src1 #### A masked pattern was here #### 1711763 107 3531902962