public class HiveAggregateIncrementalRewritingRule
extends org.apache.calcite.plan.RelOptRule
This rule will perform a rewriting to prepare the plan for incremental
view maintenance in case there exist aggregation operator, so we can
avoid the INSERT OVERWRITE and use a MERGE statement instead.
In particular, the INSERT OVERWRITE maintenance will look like this
(in SQL):
INSERT OVERWRITE mv
SELECT a, b, SUM(s) as s, SUM(c) AS c
FROM (
SELECT * from mv --OLD DATA
UNION ALL
SELECT a, b, SUM(x) AS s, COUNT(*) AS c --NEW DATA
FROM TAB_A
JOIN TAB_B ON (TAB_A.a = TAB_B.z)
WHERE TAB_A.ROW_ID > 5
GROUP BY a, b) inner_subq
GROUP BY a, b;
We need to transform that into:
MERGE INTO mv
USING (
SELECT a, b, SUM(x) AS s, COUNT(*) AS c --NEW DATA
FROM TAB_A
JOIN TAB_B ON (TAB_A.a = TAB_B.z)
WHERE TAB_A.ROW_ID > 5
GROUP BY a, b) source
ON (mv.a = source.a AND mv.b = source.b)
WHEN MATCHED AND mv.c + source.c <> 0
THEN UPDATE SET mv.s = mv.s + source.s, mv.c = mv.c + source.c
WHEN NOT MATCHED
THEN INSERT VALUES (source.a, source.b, s, c);
To be precise, we need to convert it into a MERGE rewritten as:
FROM mv right outer join _source_ source
ON (mv.a = source.a AND mv.b = source.b)
INSERT INTO TABLE mv
SELECT source.a, source.b, s, c
WHERE mv.a IS NULL AND mv2.b IS NULL
INSERT INTO TABLE mv
SELECT mv.ROW__ID, source.a, source.b, mv.s + source.s, mv.c + source.c
WHERE source.a=mv.a AND source.b=mv.b
SORT BY mv.ROW__ID;