~~ Licensed under the Apache License, Version 2.0 (the "License");
~~ you may not use this file except in compliance with the License.
~~ You may obtain a copy of the License at
~~
~~   http://www.apache.org/licenses/LICENSE-2.0
~~
~~ Unless required by applicable law or agreed to in writing, software
~~ distributed under the License is distributed on an "AS IS" BASIS,
~~ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
~~ See the License for the specific language governing permissions and
~~ limitations under the License. See accompanying LICENSE file.

  ---
  Service Level Authorization Guide
  ---
  ---
  ${maven.build.timestamp}

Service Level Authorization Guide

%{toc|section=1|fromDepth=0}

* Purpose

   This document describes how to configure and manage Service Level
   Authorization for Hadoop.

* Prerequisites

   Make sure Hadoop is installed, configured and set up correctly. For more
   information see:

     * Single Node Setup for first-time users.

     * Cluster Setup for large, distributed clusters.

* Overview

   Service Level Authorization is the initial authorization mechanism to
   ensure that clients connecting to a particular Hadoop service have the
   necessary, pre-configured permissions and are authorized to access the
   given service. For example, a MapReduce cluster can use this mechanism to
   allow a configured list of users/groups to submit jobs.

   The <<<${HADOOP_CONF_DIR}/hadoop-policy.xml>>> configuration file is used to
   define the access control lists for various Hadoop services.

   Service Level Authorization is performed well before other access control
   checks such as file-permission checks and access control on job queues.

* Configuration

   This section describes how to configure service-level authorization via
   the configuration file <<<${HADOOP_CONF_DIR}/hadoop-policy.xml>>>.

** Enable Service Level Authorization

   By default, service-level authorization is disabled for Hadoop.
   To enable it, set the configuration property hadoop.security.authorization
   to true in <<<${HADOOP_CONF_DIR}/core-site.xml>>>.

** Hadoop Services and Configuration Properties

   This section lists the various Hadoop services and their configuration
   properties:

*-------------------------------------+--------------------------------------+
|| Property                           || Service
*-------------------------------------+--------------------------------------+
security.client.protocol.acl          | ACL for ClientProtocol, which is used by user code via the DistributedFileSystem.
*-------------------------------------+--------------------------------------+
security.client.datanode.protocol.acl | ACL for ClientDatanodeProtocol, the client-to-datanode protocol for block recovery.
*-------------------------------------+--------------------------------------+
security.datanode.protocol.acl        | ACL for DatanodeProtocol, which is used by datanodes to communicate with the namenode.
*-------------------------------------+--------------------------------------+
security.inter.datanode.protocol.acl  | ACL for InterDatanodeProtocol, the inter-datanode protocol for updating generation timestamp.
*-------------------------------------+--------------------------------------+
security.namenode.protocol.acl        | ACL for NamenodeProtocol, the protocol used by the secondary namenode to communicate with the namenode.
*-------------------------------------+--------------------------------------+
security.inter.tracker.protocol.acl   | ACL for InterTrackerProtocol, used by the tasktrackers to communicate with the jobtracker.
*-------------------------------------+--------------------------------------+
security.job.submission.protocol.acl  | ACL for JobSubmissionProtocol, used by job clients to communicate with the jobtracker for job submission, querying job status etc.
*-------------------------------------+--------------------------------------+
security.task.umbilical.protocol.acl  | ACL for TaskUmbilicalProtocol, used by the map and reduce tasks to communicate with the parent tasktracker.
*-------------------------------------+--------------------------------------+
security.refresh.policy.protocol.acl  | ACL for RefreshAuthorizationPolicyProtocol, used by the dfsadmin and mradmin commands to refresh the security policy in effect.
*-------------------------------------+--------------------------------------+
security.ha.service.protocol.acl      | ACL for HAService protocol, used by HAAdmin to manage the active and standby states of the namenode.
*-------------------------------------+--------------------------------------+

** Access Control Lists

   <<<${HADOOP_CONF_DIR}/hadoop-policy.xml>>> defines an access control list for
   each Hadoop service. Every access control list has a simple format:

   The list of users and the list of groups are both comma-separated lists of
   names. The two lists are separated by a space.

   Example: <<<alice,bob mapreduce>>>.

   Add a blank at the beginning of the line if only a list of groups is to be
   provided; equivalently, a comma-separated list of users followed by a space
   or nothing implies only a set of given users.

   A special value of <<<*>>> implies that all users are allowed to access the
   service.

** Refreshing Service Level Authorization Configuration

   The service-level authorization configuration for the NameNode and
   JobTracker can be changed without restarting either of the Hadoop master
   daemons. The cluster administrator can change
   <<<${HADOOP_CONF_DIR}/hadoop-policy.xml>>> on the master nodes and instruct
   the NameNode and JobTracker to reload their respective configurations via
   the <<<-refreshServiceAcl>>> switch to the <<<dfsadmin>>> and <<<mradmin>>>
   commands respectively.

   Refresh the service-level authorization configuration for the NameNode:

----
   $ bin/hadoop dfsadmin -refreshServiceAcl
----

   Refresh the service-level authorization configuration for the JobTracker:

----
   $ bin/hadoop mradmin -refreshServiceAcl
----

   To restrict who may refresh the service-level authorization configuration,
   set the <<<security.refresh.policy.protocol.acl>>> property in
   <<<${HADOOP_CONF_DIR}/hadoop-policy.xml>>> to the permitted users/groups.

** Examples

   Allow only users <<<alice>>>, <<<bob>>> and users in the <<<mapreduce>>>
   group to submit jobs to the MapReduce cluster:

----
<property>
     <name>security.job.submission.protocol.acl</name>
     <value>alice,bob mapreduce</value>
</property>
----

   Allow only DataNodes running as users who belong to the group datanodes to
   communicate with the NameNode. The value starts with a blank because only a
   list of groups is given:

----
<property>
     <name>security.datanode.protocol.acl</name>
     <value> datanodes</value>
</property>
----

   Allow any user to talk to the HDFS cluster as a DFSClient:

----
<property>
     <name>security.client.protocol.acl</name>
     <value>*</value>
</property>
----
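
   As a further sketch, the fragments below show how two settings described
   earlier in this guide might look in their configuration files: enabling
   service-level authorization in <<<${HADOOP_CONF_DIR}/core-site.xml>>>, and
   restricting the refresh operation to a single group in
   <<<${HADOOP_CONF_DIR}/hadoop-policy.xml>>>. The group name
   <<<hadoopadmins>>> is hypothetical and used only for illustration. Its
   value begins with a blank, which marks a groups-only list.

----
<!-- core-site.xml: turn on service-level authorization -->
<property>
     <name>hadoop.security.authorization</name>
     <value>true</value>
</property>

<!-- hadoop-policy.xml: only members of the hypothetical group
     hadoopadmins may refresh the service ACLs; the leading blank
     in the value means "no users, groups only" -->
<property>
     <name>security.refresh.policy.protocol.acl</name>
     <value> hadoopadmins</value>
</property>
----

   After <<<hadoop-policy.xml>>> is edited, the NameNode and JobTracker pick up
   the change once they are told to reload it with the <<<-refreshServiceAcl>>>
   commands described under Refreshing Service Level Authorization
   Configuration.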