- All Known Implementing Classes:
- BucketingSortingCtx.BucketCol, BucketingSortingCtx.SortCol
- Enclosing class:
- BucketingSortingCtx
public static interface BucketingSortingCtx.BucketSortCol
BucketSortCol.
Classes that implement this interface provide a way to store information about equivalent
columns as their names and indexes in the schema change going into and out of operators. The
definition of equivalent columns is up to the class which uses these classes, e.g.
BucketingSortingOpProcFactory. For example, two columns are equivalent if they
contain exactly the same data. However, two columns may contain exactly the same data
without being known to be equivalent.
E.g. SELECT key a, key b FROM (SELECT key, count(*) c FROM src GROUP BY key) s;
In this case, assuming this is done in a single MapReduce job with the group by operator
processed in the reducer, the data coming out of the group by operator will be bucketed
by key, which would be at index 0 in the schema. After the outer select operator, the output
can be viewed as bucketed by either the column with alias a or the column with alias b. To
represent this, there could be a single BucketSortCol implementation instance whose names
include both a and b, and whose indexes include both 0 and 1.
Implementations of this interface should maintain the invariant that, for every valid i,
the alias getNames().get(i) has index getIndexes().get(i) in the schema.
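The pairing between names and indexes described above can be sketched as follows. This is a minimal, self-contained illustration, not the actual Hive code: the interface EquivalentCols, the class SimpleEquivalentCols, and the helper addAlias are hypothetical names introduced here. Only getNames() and getIndexes() are taken from the description above, and their return types (List<String> and List<Integer>) are assumed.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for the documented interface. Only the two getters
// are named in the description; their return types are an assumption.
interface EquivalentCols {
    List<String> getNames();
    List<Integer> getIndexes();
}

// Hypothetical implementation that stores aliases and schema positions
// as two parallel lists.
final class SimpleEquivalentCols implements EquivalentCols {
    private final List<String> names = new ArrayList<>();
    private final List<Integer> indexes = new ArrayList<>();

    // Hypothetical helper: adding the alias and its index in one step keeps
    // names.get(i) paired with indexes.get(i) for every i.
    void addAlias(String name, int index) {
        names.add(name);
        indexes.add(index);
    }

    @Override
    public List<String> getNames() {
        // Read-only view, so callers cannot break the pairing.
        return Collections.unmodifiableList(names);
    }

    @Override
    public List<Integer> getIndexes() {
        return Collections.unmodifiableList(indexes);
    }
}
```

For the query above, a single instance would be given the alias a at index 0 and the alias b at index 1, so that getNames() returns [a, b] and getIndexes() returns [0, 1]. Exposing the lists only as unmodifiable views is one way to ensure the two lists cannot drift out of step; the real implementing classes may take a different approach.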