Package org.apache.calcite.plan.volcano
Overview
A planner (also known as an optimizer) finds the
most efficient implementation of a
relational expression
.
Interface RelOptPlanner
defines a planner,
and class VolcanoPlanner
is an
implementation which uses a dynamic programming technique. It is based upon
the Volcano optimizer [1].
Interface RelOptCost
defines a cost
model; class VolcanoCost
is
the implementation for a VolcanoPlanner
.
A RelSet
is a set of equivalent
relational expressions. They are equivalent because they will produce the
same result for any set of input data. It is an equivalence class: two
expressions are in the same set if and only if they are in the same
RelSet
.
One of the unique features of the optimizer is that expressions can take
on a variety of physical traits. Each relational expression has a set of
traits. Each trait is described by an implementation of
RelTraitDef
. Manifestations of the trait
implement RelTrait
. The most common example
of a trait is calling convention: the protocol used to receive and transmit
data. ConventionTraitDef
defines the trait
and Convention
enumerates the
protocols. Every relational expression has a single calling convention by
which it returns its results. Some examples:
JdbcConvention
is a fairly conventional convention; the results are rows from aJDBC result set
.Convention.NONE
means that a relational expression cannot be implemented; typically there are rules which can transform it to equivalent, implementable expressions.EnumerableConvention
implements the expression by generating Java code. The code places the current row in a Java variable, then calls the piece of code which implements the consuming relational expression. For example, a Java array reader of thenames
array would generate the following code:String[] names; for (int i = 0; i < names.length; i++) { String name = names[i]; // ... code for consuming relational expression ... }
New traits are added to the planner in one of two ways:
- If the new trait is integral to Calcite, then each and every
implementation of
RelNode
should include its manifestation of the trait as part of theRelTraitSet
passed toAbstractRelNode
's constructor. It may be useful to provide alternateAbstractRelNode
constructors if most relational expressions use a single manifestation of the trait. - If the new trait describes some aspect of a Farrago extension, then
the RelNodes passed to
VolcanoPlanner.setRoot(org.apache.calcite.rel.RelNode)
should have their trait sets expanded before thesetRoot(RelNode)
call.
The second trait extension mechanism requires that implementations of
Object.clone()
must not assume the
type and quantity of traits in their trait set. In either case, the new
RelTraitDef
implementation must be
VolcanoPlanner.addRelTraitDef(org.apache.calcite.plan.RelTraitDef)
registered with the planner.
A RelSubset
is a subset of a
RelSet
containing expressions which are equivalent and which
have the same Convention
. Like RelSet
, it is an
equivalence class.
Related packages
<a href="../rel/package-summary.html">org.apache.calcite.rel</a>
definesrelational expressions
.
Details
...
Sets merge when the result of a rule already exists in another set. This implies that all of the expressions are equivalent. The RelSets are merged, and so are the contained RelSubsets.
Expression registration.
- Expression is added to a set. We may find that an equivalent expression already exists. Otherwise, this is the moment when an expression becomes public, and fixed. Its digest is assigned, which allows us to quickly find identical expressions.
- We match operands, figure out which rules are applicable, and generate rule calls. The rule calls are placed on a queue, and the important ones are called later.
- RuleCalls allow us to defer the invocation of rules. When an expression is registered
Algorithm
To optimize a relational expression R:
1. Register R.
2. Create rule-calls for all applicable rules.
3. Rank the rule calls by importance.
4. Call the most important rule
5. Repeat.
Importance. A rule-call is important if it is likely to produce better implementation of a relexp on the plan's critical path. Hence (a) it produces a member of an important RelSubset, (b) its children are cheap.
Conversion. Conversions are difficult because we have to work backwards from the goal.
Rule triggering
The rules are:
PushFilterThroughProjectRule
. Operands:Filter Project
CombineProjectsRule
. Operands:Project Project
A rule can be triggered by a change to any of its operands. Consider the rule to combine two filters into one. It would have operands [Filter [Filter]]. If I register a new Filter, it will trigger the rule in 2 places. Consider:
Project (deptno) [exp 1, subset A] Filter (gender='F') [exp 2, subset B] Project (deptno, gender, empno) [exp 3, subset C] Project (deptno, gender, empno, salary) [exp 4, subset D] TableScan (emp) [exp 0, subset X]
Apply PushFilterThroughProjectRule
to [exp 2, exp 3]:
Project (deptno) [exp 1, subset A] Project (deptno, gender, empno) [exp 5, subset B] Filter (gender='F') [exp 6, subset E] Project (deptno, gender, empno, salary) [exp 4, subset D] TableScan (emp) [exp 0, subset X]
Two new expressions are created. Expression 5 is in subset B (because it is equivalent to expression 2), and expression 6 is in a new equivalence class, subset E.
The products of a applying a rule can trigger a cascade of rules. Even in this simple system (2 rules and 4 initial expressions), two more rules are triggered:
- Registering exp 5 triggers
CombineProjectsRule
(exp 1, exp 5), which createsProject (deptno) [exp 7, subset A] Filter (gender='F') [exp 6, subset E] Project (deptno, gender, empno, salary) [exp 4, subset D] TableScan (emp) [exp 0, subset X]
- Registering exp 6 triggers
PushFilterThroughProjectRule
(exp 6, exp 4), which createsProject (deptno) [exp 1, subset A] Project (deptno, gender, empno) [exp 5, subset B] Project (deptno, gender, empno, salary) [exp 8, subset E] Filter (gender='F') [exp 9, subset F] TableScan (emp) [exp 0, subset X]
Each rule application adds additional members to existing subsets. The
non-singleton subsets are now A {1, 7}, B {2, 5} and E {6, 8}, and new
combinations are possible. For example,
CombineProjectsRule
(exp 7, exp 8) further reduces the depth
of the tree to:
Project (deptno) [exp 10, subset A] Filter (gender='F') [exp 9, subset F] TableScan (emp) [exp 0, subset X]
Todo: show how rules can cause subsets to merge.
Conclusion:
- A rule can be triggered by any of its operands.
- If a subset is a child of more than one parent, it can trigger rule matches for any of its parents.
- Registering one relexp can trigger several rules (and even the same rule several times).
- Firing rules can cause subsets to merge.
References
-
Interface Summary Interface Description VolcanoPlannerPhaseRuleMappingInitializer VolcanoPlannerPhaseRuleMappingInitializer describes an inteface for initializing the mapping ofVolcanoPlannerPhase
s to sets of rule descriptions. -
Class Summary Class Description AbstractConverter Converts a relational expression to any given output convention.AbstractConverter.ExpandConversionRule Rule which converts anAbstractConverter
into a chain of converters from the source relation to the target traits.ChainedPhaseRuleMappingInitializer ChainedPhaseRuleMappingInitializer is an abstract implementation ofVolcanoPlannerPhaseRuleMappingInitializer
that allows additional rules to be layered on top of those configured by a subordinateVolcanoPlannerPhaseRuleMappingInitializer
.RelSet ARelSet
is an equivalence-set of expressions; that is, a set of expressions which have identical semantics.RelSubset Subset of an equivalence class where all relational expressions have the same physical properties.RelSubset.CheapestPlanReplacer Visitor which walks over a tree ofRelSet
s, replacing each node with the cheapest implementation of the expression.RuleQueue Priority queue of relexps whose rules have not been called, and rule-matches which have not yet been acted upon.RuleQueue.PhaseMatchList PhaseMatchList represents a set ofrule-matches
for a particularphase of the planner's execution
.RuleQueue.RuleMatchImportanceComparator ComparesVolcanoRuleMatch
objects according to their importance.VolcanoCost VolcanoCost
represents the cost of a plan node.VolcanoCost.Factory Implementation ofRelOptCostFactory
that createsVolcanoCost
s.VolcanoPlanner VolcanoPlanner optimizes queries by transforming expressions selectively according to a dynamic programming algorithm.VolcanoPlanner.DeferringRuleCall A rule call which defers its actions.VolcanoPlanner.DirectProvenance A RelNode that came directly from another RelNode via a copy.VolcanoPlanner.Provenance Where a RelNode came from.VolcanoPlanner.RuleProvenance A RelNode that came via the firing of a rule.VolcanoPlanner.UnknownProvenance We do not know where this RelNode came from.VolcanoRelMetadataProvider VolcanoRelMetadataProvider implements theRelMetadataProvider
interface by combining metadata from the rels making up an equivalence class.VolcanoRuleCall VolcanoRuleCall
implements theRelOptRuleCall
interface for VolcanoPlanner.VolcanoRuleMatch A match of a rule to a particular set of target relational expressions, frozen in time. -
Enum Summary Enum Description VolcanoPlannerPhase VolcanoPlannerPhase represents the phases of operation that theVolcanoPlanner
passes through during optimization of a tree ofRelNode
objects.