Tez Shuffle Handler Overview

A Tez-specific shuffle handler allows Tez DAGs to shuffle data in a way that takes advantage of the new features in Tez. In particular, the Tez shuffle handler allows DAGs to shuffle data more efficiently for Tez’s new data movement types and runtime optimizations, such as auto-reduce parallelism. Long-running Tez sessions will be able to clean up intermediate data for completed queries, and Tez applications can choose to clean up intermediate data from completed work while the application is still running.

Setup for the Tez Shuffle Handler


Requires: Apache Tez 0.9.0 or above

Configure the client to use the Tez shuffle handler by setting the following property:

tez-site.xml
-------------
...
<property>
  <name>tez.am.shuffle.auxiliary-service.id</name>
  <value>tez_shuffle</value>
</property>
...

Deploying the Tez Shuffle Handler

The Tez Shuffle Handler jar artifact org.apache.tez:tez-aux-services needs to be placed on the Node Manager classpath, and the Node Manager must then be restarted.

Setup for Node Manager

Requires: Apache Hadoop 2.6.0 or above

The following configuration needs to be set in the Node Manager's yarn-site.xml to enable the Tez Shuffle Handler:

yarn-site.xml
-------------
...
<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>tez_shuffle</value>
</property>

<property>
  <name>yarn.nodemanager.aux-services.tez_shuffle.class</name>
  <value>org.apache.tez.auxservices.ShuffleHandler</value>
</property>
...
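
If the Node Manager already runs another auxiliary service, such as the MapReduce shuffle handler, the Tez Shuffle Handler can be listed alongside it, since yarn.nodemanager.aux-services takes a comma-separated list of service names. The sketch below assumes an existing mapreduce_shuffle entry backed by org.apache.hadoop.mapred.ShuffleHandler. That is a common Hadoop setup, but it is an assumption and not part of the configuration above, so adjust it to match the cluster's current yarn-site.xml.

yarn-site.xml (sketch, assuming an existing mapreduce_shuffle service)
-------------
...
<property>
  <name>yarn.nodemanager.aux-services</name>
  <!-- Assumption: mapreduce_shuffle was already configured on this cluster -->
  <value>mapreduce_shuffle,tez_shuffle</value>
</property>

<property>
  <!-- Assumption: the usual MapReduce shuffle handler class, kept unchanged -->
  <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
  <value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>

<property>
  <name>yarn.nodemanager.aux-services.tez_shuffle.class</name>
  <value>org.apache.tez.auxservices.ShuffleHandler</value>
</property>
...

The service name in yarn.nodemanager.aux-services must match both the suffix used in the .class property name and the tez.am.shuffle.auxiliary-service.id value set in tez-site.xml.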