The YARN service registry is built on top of Apache Zookeeper. It is configured by way of a Hadoop Configuration class: the instance used to create the service controls the behavior of the client.
This document lists the configuration parameters which control the registry client and its deployment in the YARN Resource Manager.
The default values of all these settings are defined in core-default.xml. The values in this file may not match those listed in this document. If this is the case, the values in core-default.xml MUST be considered normative.
Changes to the configuration values SHOULD be done in core-site.xml. This will ensure that client and non-YARN applications will pick up the values, so enabling them to read from and potentially write to the registry.
The Resource Manager manages user directory creation and record cleanup on YARN container/application attempt/application completion.
<property> <description> Is the registry enabled in the YARN Resource Manager? If true, the YARN RM will, as needed. create the user and system paths, and purge service records when containers, application attempts and applications complete. If false, the paths must be created by other means, and no automatic cleanup of service records will take place. </description> <name>hadoop.registry.rm.enabled</name> <value>false</value> </property>
If the property is set in core-site.xml or yarn-site.xml, the YARN Resource Manager will behave as follows: 1. On startup: create the initial root paths of /, /services and /users. On a secure cluster, access will be restricted to the system accounts (see below). 2. When a user submits a job: create the user path under /users. 3. When a container is completed: delete from the registry all service records with a yarn:persistence field of value container, and a yarn:id field whose value matches the ID of the completed container. 4. When an application attempt is completed: remove all service records with yarn:persistence set to application-attempt and yarn:id set to the pplication attempt ID. 5. When an application finishes: remove all service records with yarn:persistence set to application and yarn:id set to the application ID.
All these operations are asynchronous, so that zookeeper connectivity problems do not delay RM operations or work scheduling.
If the property hadoop.registry.rm.enabled is set to false, the RM will not interact with the registry —and the listed operations will not take place. The root paths may be created by other means, but service record cleanup will not take place.
This is an essential setting: it identifies the lists of zookeeper hosts and the ports on which the ZK services are listening.
<property> <description> A comma separated list of hostname:port pairs defining the zookeeper quorum binding for the registry </description> <name>hadoop.registry.zk.quorum</name> <value>localhost:2181</value> </property>
It takes a comma-separated list, such as zk1:2181 ,zk2:2181, zk3:2181
This path sets the base zookeeper node for the registry
<property> <description> The root zookeeper node for the registry </description> <name>hadoop.registry.zk.root</name> <value>/registry</value> </property>
The default value of /registry is normally sufficient. A different value may be needed for security reasons or because the /registry path is in use.
The root value is prepended to all registry paths so as to create the absolute path. For example:
A different value of hadoop.registry.zk.root would result in a different mapping to absolute zookeeper paths.
Registry security is enabled when the property hadoop.registry.secure is set to true. Once set, nodes are created with permissions, so that only a specific user and the configured cluster “superuser” accounts can write under their home path of ${hadoop.registry.zk.root}/users. Only the superuser accounts will be able to manipulate the root path, including ${hadoop.registry.zk.root}/services and ${hadoop.registry.zk.root}/users.
All write operations on the registry (including deleting entries and paths) must be authenticated. Read operations are still permitted by unauthenticated callers.
The key settings for secure registry support are:
<property> <description> Key to set if the registry is secure. Turning it on changes the permissions policy from "open access" to restrictions on kerberos with the option of a user adding one or more auth key pairs down their own tree. </description> <name>hadoop.registry.secure</name> <value>false</value> </property>
The registry clients must identify the JAAS context which they use to authenticate to the registry.
<property> <description> Key to define the JAAS context. Used in secure mode </description> <name>hadoop.registry.jaas.context</name> <value>Client</value> </property>
Note as the Resource Manager is simply another client of the registry, it too must have this context defined.
These are the the accounts which are given full access to the base of the registry. The Resource Manager needs this option to create the root paths.
Client applications writing to the registry access to the nodes it creates.
<property> <description> A comma separated list of Zookeeper ACL identifiers with system access to the registry in a secure cluster. These are given full access to all entries. If there is an "@" at the end of a SASL entry it instructs the registry client to append the default kerberos domain. </description> <name>hadoop.registry.system.acls</name> <value>sasl:yarn@, sasl:mapred@, sasl:mapred@, sasl:hdfs@</value> </property> <property> <description> The kerberos realm: used to set the realm of system principals which do not declare their realm, and any other accounts that need the value. If empty, the default realm of the running process is used. If neither are known and the realm is needed, then the registry service/client will fail. </description> <name>hadoop.registry.kerberos.realm</name> <value></value> </property>
Example: an hadoop.registry.system.acls entry of sasl:yarn@, sasl:admin@EXAMPLE.COM, sasl:system@REALM2, would, in a YARN cluster with the realm EXAMPLE.COM, add the following admin accounts to every node
The identity of a client application creating registry entries will be automatically included in the permissions of all entries created. If, for example, the account creating an entry was hbase, another entry would be created
Important: when setting the system ACLS, it is critical to include the identity of the YARN Resource Manager.
The RM needs to be able to create the root and user paths, and delete service records during application and container cleanup.
Some low level options manage the ZK connection —more specifically, its failure handling.
The Zookeeper registry clients use Apache Curator to connect to Zookeeper, a library which detects timeouts and attempts to reconnect to one of the servers which forms the zookeeper quorum. It is only after a timeout is detected that a retry is triggered.
<property> <description> Zookeeper session timeout in milliseconds </description> <name>hadoop.registry.zk.session.timeout.ms</name> <value>60000</value> </property> <property> <description> Zookeeper connection timeout in milliseconds </description> <name>hadoop.registry.zk.connection.timeout.ms</name> <value>15000</value> </property> <property> <description> Zookeeper connection retry count before failing </description> <name>hadoop.registry.zk.retry.times</name> <value>5</value> </property> <property> <description> </description> <name>hadoop.registry.zk.retry.interval.ms</name> <value>1000</value> </property> <property> <description> Zookeeper retry limit in milliseconds, during exponential backoff. This places a limit even if the retry times and interval limit, combined with the backoff policy, result in a long retry period </description> <name>hadoop.registry.zk.retry.ceiling.ms</name> <value>60000</value> </property>
The retry strategy used in the registry client is BoundedExponentialBackoffRetry: This backs off exponentially on connection failures before eventually concluding that the quorum is unreachable and failing.
<!-- YARN registry --> <property> <description> Is the registry enabled: does the RM start it up, create the user and system paths, and purge service records when containers, application attempts and applications complete </description> <name>hadoop.registry.rm.enabled</name> <value>false</value> </property> <property> <description> A comma separated list of hostname:port pairs defining the zookeeper quorum binding for the registry </description> <name>hadoop.registry.zk.quorum</name> <value>localhost:2181</value> </property> <property> <description> The root zookeeper node for the registry </description> <name>hadoop.registry.zk.root</name> <value>/registry</value> </property> <property> <description> Key to set if the registry is secure. Turning it on changes the permissions policy from "open access" to restrictions on kerberos with the option of a user adding one or more auth key pairs down their own tree. </description> <name>hadoop.registry.secure</name> <value>false</value> </property> <property> <description> A comma separated list of Zookeeper ACL identifiers with system access to the registry in a secure cluster. These are given full access to all entries. If there is an "@" at the end of a SASL entry it instructs the registry client to append the default kerberos domain. </description> <name>hadoop.registry.system.acls</name> <value>sasl:yarn@, sasl:mapred@, sasl:mapred@, sasl:hdfs@</value> </property> <property> <description> The kerberos realm: used to set the realm of system principals which do not declare their realm, and any other accounts that need the value. If empty, the default realm of the running process is used. If neither are known and the realm is needed, then the registry service/client will fail. </description> <name>hadoop.registry.kerberos.realm</name> <value></value> </property> <property> <description> Key to define the JAAS context. Used in secure mode </description> <name>hadoop.registry.jaas.context</name> <value>Client</value> </property> <property> <description> Zookeeper session timeout in milliseconds </description> <name>hadoop.registry.zk.session.timeout.ms</name> <value>60000</value> </property> <property> <description> Zookeeper session timeout in milliseconds </description> <name>hadoop.registry.zk.connection.timeout.ms</name> <value>15000</value> </property> <property> <description> Zookeeper connection retry count before failing </description> <name>hadoop.registry.zk.retry.times</name> <value>5</value> </property> <property> <description> </description> <name>hadoop.registry.zk.retry.interval.ms</name> <value>1000</value> </property> <property> <description> Zookeeper retry limit in milliseconds, during exponential backoff: {@value} This places a limit even if the retry times and interval limit, combined with the backoff policy, result in a long retry period </description> <name>hadoop.registry.zk.retry.ceiling.ms</name> <value>60000</value> </property>