Class LoptSemiJoinOptimizer


  • public class LoptSemiJoinOptimizer
    extends java.lang.Object
    Implements the logic for determining the optimal semi-joins to be used in processing joins in a query.
    • Field Detail

      • chosenSemiJoins

        private RelNode[] chosenSemiJoins
        Semijoins corresponding to each join factor, if they are going to be filtered by semijoins. Otherwise, the entry is the original join factor.
      • possibleSemiJoins

        private java.util.Map<java.lang.Integer,​java.util.Map<java.lang.Integer,​SemiJoin>> possibleSemiJoins
        Associates potential semijoins with each fact table factor. The first parameter in the map corresponds to the fact table. The second corresponds to the dimension table and a SemiJoin that captures all the necessary semijoin data between that fact and dimension table
      • factorCostOrdering

        private final com.google.common.collect.Ordering<java.lang.Integer> factorCostOrdering
    • Method Detail

      • makePossibleSemiJoins

        public void makePossibleSemiJoins​(LoptMultiJoin multiJoin)
        Determines all possible semijoins that can be used by dimension tables to filter fact tables. Constructs SemiJoinRels corresponding to potential dimension table filters and stores them in the member field "possibleSemiJoins"
        Parameters:
        multiJoin - join factors being optimized
      • isSuitableFilter

        private int isSuitableFilter​(LoptMultiJoin multiJoin,
                                     RexNode joinFilter,
                                     int factIdx)
        Determines if a join filter can be used with a semijoin against a specified fact table. A suitable filter is of the form "factable.col1 = dimTable.col2".
        Parameters:
        multiJoin - join factors being optimized
        joinFilter - filter to be analyzed
        factIdx - index corresponding to the fact table
        Returns:
        index of corresponding dimension table if the filter is appropriate; otherwise -1 is returned
      • findSemiJoinIndexByCost

        private SemiJoin findSemiJoinIndexByCost​(LoptMultiJoin multiJoin,
                                                 java.util.List<RexNode> joinFilters,
                                                 int factIdx,
                                                 int dimIdx)
        Given a list of possible filters on a fact table, determine if there is an index that can be used, provided all the fact table keys originate from the same underlying table.
        Parameters:
        multiJoin - join factors being optimized
        joinFilters - filters to be used on the fact table
        factIdx - index in join factors corresponding to the fact table
        dimIdx - index in join factors corresponding to the dimension table
        Returns:
        SemiJoin containing information regarding the semijoin that can be used to filter the fact table
      • adjustSemiJoinCondition

        private RexNode adjustSemiJoinCondition​(LoptMultiJoin multiJoin,
                                                int leftAdjustment,
                                                RexNode semiJoinCondition,
                                                int leftIdx,
                                                int rightIdx)
        Modifies the semijoin condition to reflect the fact that the RHS is now the second factor into a join and the LHS is the first
        Parameters:
        multiJoin - join factors being optimized
        leftAdjustment - amount the left RexInputRefs need to be adjusted by
        semiJoinCondition - condition to be adjusted
        leftIdx - index of the join factor corresponding to the LHS of the semijoin,
        rightIdx - index of the join factor corresponding to the RHS of the semijoin
        Returns:
        modified semijoin condition
      • validateKeys

        private LoptSemiJoinOptimizer.LcsTable validateKeys​(RelNode factRel,
                                                            java.util.List<java.lang.Integer> leftKeys,
                                                            java.util.List<java.lang.Integer> rightKeys,
                                                            java.util.List<java.lang.Integer> actualLeftKeys)
        Validates the candidate semijoin keys corresponding to the fact table. Ensure the keys all originate from the same underlying table, and they all correspond to simple column references. If unsuitable keys are found, they're removed from the key list and a new list corresponding to the remaining valid keys is returned.
        Parameters:
        factRel - fact table RelNode
        leftKeys - fact table semijoin keys
        rightKeys - dimension table semijoin keys
        actualLeftKeys - the remaining valid fact table semijoin keys
        Returns:
        the underlying fact table if the semijoin keys are valid; otherwise null
      • removeExtraFilters

        private RexNode removeExtraFilters​(java.util.List<java.lang.Integer> keys,
                                           int nFields,
                                           RexNode condition)
        Removes from an expression any sub-expressions that reference key values that aren't contained in a key list passed in. The keys represent join keys on one side of a join. The subexpressions are all assumed to be of the form "tab1.col1 = tab2.col2".
        Parameters:
        keys - join keys from one side of the join
        nFields - number of fields in the side of the join for which the keys correspond
        condition - original expression
        Returns:
        modified expression with filters that don't reference specified keys removed
      • chooseBestSemiJoin

        public boolean chooseBestSemiJoin​(LoptMultiJoin multiJoin)
        Finds the optimal semijoin for filtering the least costly fact table from among the remaining possible semijoins to choose from. The chosen semijoin is stored in the chosenSemiJoins array
        Parameters:
        multiJoin - join factors being optimized
        Returns:
        true if a suitable semijoin is found; false otherwise
      • computeScore

        private double computeScore​(RelNode factRel,
                                    RelNode dimRel,
                                    SemiJoin semiJoin)
        Computes a score relevant to applying a set of semijoins on a fact table. The higher the score, the better.
        Parameters:
        factRel - fact table being filtered
        dimRel - dimension table that participates in semijoin
        semiJoin - semijoin between fact and dimension tables
        Returns:
        computed score of applying the dimension table filters on the fact table
      • removeJoin

        private void removeJoin​(LoptMultiJoin multiJoin,
                                SemiJoin semiJoin,
                                int factIdx,
                                int dimIdx)
        Determines whether a join of the dimension table in a semijoin can be removed. It can be if the dimension keys are unique and the only fields referenced from the dimension table are its semijoin keys. The semijoin keys can be mapped to the corresponding keys from the fact table (because of the equality condition associated with the semijoin keys). Therefore, that's why the dimension table can be removed even though those fields are referenced elsewhere in the query tree.
        Parameters:
        multiJoin - join factors being optimized
        semiJoin - semijoin under consideration
        factIdx - id of the fact table in the semijoin
        dimIdx - id of the dimension table in the semijoin
      • removePossibleSemiJoin

        private void removePossibleSemiJoin​(java.util.Map<java.lang.Integer,​SemiJoin> possibleDimensions,
                                            java.lang.Integer factIdx,
                                            java.lang.Integer dimIdx)
        Removes a dimension table from a fact table's list of possible semijoins
        Parameters:
        possibleDimensions - possible dimension tables associated with the fact table
        factIdx - index corresponding to fact table
        dimIdx - index corresponding to dimension table
      • getChosenSemiJoin

        public RelNode getChosenSemiJoin​(int factIdx)
        Parameters:
        factIdx - index corresponding to the desired factor
        Returns:
        optimal semijoin for the specified factor; may be the factor itself if semijoins are not chosen for the factor