public class SparkDynamicPartitionPruningResolver extends Object implements PhysicalPlanResolver
A physical optimization that disables dynamic partition pruning (DPP) if the source MapWork and target MapWork aren't in dependent SparkTasks.
When DPP is run, the source MapWork produces a temp file that is read by the target MapWork. The source MapWork must run before the target MapWork; otherwise, the target MapWork will throw a FileNotFoundException. To guarantee this, the source MapWork must be inside a SparkTask that runs before the SparkTask containing the target MapWork.
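
The ordering requirement can be illustrated with a small, self-contained sketch. It does not use any Hive classes: `DppOrderingSketch`, `runSource`, `runTarget`, and the file name `pruning-values.tmp` are hypothetical stand-ins for the source work writing the temp file and the target work reading it.

```java
import java.io.BufferedReader;
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical model of the source/target ordering constraint (not Hive code).
class DppOrderingSketch {

    // Stand-in for the source work: writes the pruning values to the temp file.
    static void runSource(Path tmp, String values) throws IOException {
        Files.write(tmp, values.getBytes(StandardCharsets.UTF_8));
    }

    // Stand-in for the target work: reads the first line of the temp file.
    // FileReader throws FileNotFoundException if the file does not exist yet.
    static String runTarget(Path tmp) throws IOException {
        try (BufferedReader in = new BufferedReader(new FileReader(tmp.toFile()))) {
            return in.readLine();
        }
    }

    // Runs the target before the source; returns true if the read failed
    // with FileNotFoundException.
    static boolean targetFirstFails() {
        try {
            Path tmp = Files.createTempDirectory("dpp-sketch").resolve("pruning-values.tmp");
            try {
                runTarget(tmp);
                return false;
            } catch (FileNotFoundException e) {
                return true;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Runs the source first, then the target; returns what the target read.
    static String sourceFirstSucceeds() {
        try {
            Path tmp = Files.createTempDirectory("dpp-sketch").resolve("pruning-values.tmp");
            runSource(tmp, "ds=2022-01-01");
            return runTarget(tmp);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

In this model the only thing that makes the read safe is that the write happened earlier, which is the guarantee that placing the two works in dependent tasks is meant to provide.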
This PhysicalPlanResolver works by walking through the Task DAG and iterating over all the SparkPartitionPruningSinkOperators inside each SparkTask. For each sink operator, it takes the target MapWork and checks if it exists in any of the child SparkTasks. If the target MapWork is not in any child SparkTask, it removes the operator subtree that contains the SparkPartitionPruningSinkOperator.
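
The walk described above can be sketched roughly as follows. This is a simplified model, not the Hive implementation: `Task` and `Sink` are hypothetical stand-ins for SparkTask and SparkPartitionPruningSinkOperator, works are identified by plain strings, and removing a sink from a list stands in for removing the operator subtree. The sketch also assumes that "child SparkTasks" is meant transitively, so it searches all descendants.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.Set;

// Hypothetical, simplified model of the resolver's walk (not Hive code).
class DppResolverSketch {

    // Stand-in for a SparkTask: the works it contains, its pruning sinks, its child tasks.
    static final class Task {
        final String name;
        final Set<String> works = new HashSet<>();
        final List<Sink> sinks = new ArrayList<>();
        final List<Task> children = new ArrayList<>();
        Task(String name) { this.name = name; }
    }

    // Stand-in for a pruning sink operator: records which work it targets.
    static final class Sink {
        final String targetWork;
        Sink(String targetWork) { this.targetWork = targetWork; }
    }

    // True if any descendant task of 'task' contains 'work'.
    // Assumption: the task graph is a DAG, so the recursion terminates.
    static boolean inDescendant(Task task, String work) {
        for (Task child : task.children) {
            if (child.works.contains(work) || inDescendant(child, work)) {
                return true;
            }
        }
        return false;
    }

    // Walks the task DAG from the roots and drops every sink whose target work
    // is not in a descendant task. Returns the number of sinks removed.
    static int disableUnsafeDpp(List<Task> roots) {
        int removed = 0;
        Deque<Task> pending = new ArrayDeque<>(roots);
        Set<Task> seen = new HashSet<>();
        while (!pending.isEmpty()) {
            Task task = pending.pop();
            if (!seen.add(task)) {
                continue; // already visited through another parent
            }
            Iterator<Sink> it = task.sinks.iterator();
            while (it.hasNext()) {
                if (!inDescendant(task, it.next().targetWork)) {
                    it.remove();
                    removed++;
                }
            }
            pending.addAll(task.children);
        }
        return removed;
    }
}
```

For example, if a parent task holds two sinks and only one of them targets a work in its child task, `disableUnsafeDpp` removes the other sink and leaves the safe one in place.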
Constructor and Description |
---|
SparkDynamicPartitionPruningResolver() |
Modifier and Type | Method and Description |
---|---|
PhysicalContext | resolve(PhysicalContext pctx) All physical plan resolvers have to implement this entry method. |
public SparkDynamicPartitionPruningResolver()
public PhysicalContext resolve(PhysicalContext pctx) throws SemanticException
Specified by: resolve in interface PhysicalPlanResolver
Throws: SemanticException
Copyright © 2022 The Apache Software Foundation. All rights reserved.