The NFS Gateway supports NFSv3 and allows HDFS to be mounted as part of the client's local file system. Currently the NFS Gateway supports and enables the following usage patterns: browsing the HDFS file system through the local file system, downloading files from HDFS to the local file system, uploading files from the local file system to HDFS, and streaming data directly to HDFS through the mount point (file append is supported, but random write is not).
The NFS gateway machine needs everything required to run an HDFS client, such as the Hadoop JAR files and a HADOOP_CONF directory. The NFS gateway can be on the same host as a DataNode, NameNode, or any HDFS client.
The user running the NFS gateway must be able to proxy all the users using the NFS mounts. For instance, if user 'nfsserver' is running the gateway, and users belonging to the groups 'nfs-users1' and 'nfs-users2' use the NFS mounts, then in core-site.xml of the NameNode, the following must be set (NOTE: replace 'nfsserver' with the user name starting the gateway in your cluster):
<property>
  <name>hadoop.proxyuser.nfsserver.groups</name>
  <value>nfs-users1,nfs-users2</value>
  <description>
    The 'nfsserver' user is allowed to proxy all members of the 'nfs-users1' and
    'nfs-users2' groups. Set this to '*' to allow nfsserver user to proxy any group.
  </description>
</property>
<property>
  <name>hadoop.proxyuser.nfsserver.hosts</name>
  <value>nfs-client-host1.com</value>
  <description>
    This is the host where the nfs gateway is running. Set this to '*' to allow
    requests from any hosts to be proxied.
  </description>
</property>
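After editing core-site.xml, the NameNode has to pick up the new proxy-user settings. As a sketch (depending on the Hadoop version, a NameNode restart may be required instead), the superuser proxy groups and hosts can usually be reloaded with:

# Reload hadoop.proxyuser.* settings on the NameNode without a full restart.
hdfs dfsadmin -refreshSuperUserGroupsConfiguration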
The above are the only required configurations for the NFS gateway in non-secure mode. For Kerberized Hadoop clusters, the following configurations need to be added to hdfs-site.xml:
<property>
  <name>dfs.nfsgateway.keytab.file</name>
  <value>/etc/hadoop/conf/nfsserver.keytab</value> <!-- path to the nfs gateway keytab -->
</property>
<property>
  <name>dfs.nfsgateway.kerberos.principal</name>
  <value>nfsserver/_HOST@YOUR-REALM.COM</value>
</property>
It is strongly recommended that users update a few configuration properties based on their use cases. All the related configuration properties can be added or updated in hdfs-site.xml.
<property>
  <name>dfs.namenode.accesstime.precision</name>
  <value>3600000</value>
  <description>
    The access time for an HDFS file is precise up to this value. The default
    value is 1 hour. Setting a value of 0 disables access times for HDFS.
  </description>
</property>
<property>
  <name>dfs.nfs3.dump.dir</name>
  <value>/tmp/.hdfs-nfs</value>
</property>
<property>
  <name>dfs.nfs.rtmax</name>
  <value>1048576</value>
  <description>
    This is the maximum size in bytes of a READ request supported by the NFS
    gateway. If you change this, make sure you also update the nfs mount's
    rsize (add rsize=# of bytes to the mount directive).
  </description>
</property>
<property>
  <name>dfs.nfs.wtmax</name>
  <value>65536</value>
  <description>
    This is the maximum size in bytes of a WRITE request supported by the NFS
    gateway. If you change this, make sure you also update the nfs mount's
    wsize (add wsize=# of bytes to the mount directive).
  </description>
</property>
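For example (a sketch; the gateway host name and mount point below are placeholders), a mount whose rsize and wsize mirror the dfs.nfs.rtmax and dfs.nfs.wtmax values above might look like:

# rsize matches dfs.nfs.rtmax (1048576) and wsize matches dfs.nfs.wtmax (65536);
# nfs-gateway.example.com and /hdfs are placeholder names.
mount -t nfs -o vers=3,proto=tcp,nolock,rsize=1048576,wsize=65536 nfs-gateway.example.com:/ /hdfs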
<property>
  <name>dfs.nfs.exports.allowed.hosts</name>
  <value>* rw</value>
</property>
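The default value above allows any host read-write access to the export. As a rough sketch (the host name and subnet below are placeholders, and the exact entry syntax may vary by version), each entry pairs a client specification with an access privilege (rw or ro), with entries separated by ';':

<property>
  <name>dfs.nfs.exports.allowed.hosts</name>
  <!-- Example only: read-write for one trusted host, read-only for a subnet. -->
  <value>nfs-client-host1.com rw ; 192.168.0.0/22 ro</value>
</property>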
To change the logging level, add the following to log4j.properties:
log4j.logger.org.apache.hadoop.hdfs.nfs=DEBUG
To get more details of ONCRPC requests:
log4j.logger.org.apache.hadoop.oncrpc=DEBUG
Three daemons are required to provide NFS service: rpcbind (or portmap), mountd, and nfsd. The NFS gateway process includes both nfsd and mountd. It shares the HDFS root "/" as the only export. It is recommended to use the portmap included in the NFS gateway package. Even though the NFS gateway works with the portmap/rpcbind provided by most Linux distributions, the package-included portmap is needed on some Linux systems, such as RHEL 6.2, due to an rpcbind bug. More detailed discussions can be found in HDFS-4763.
Stop the nfs and rpcbind services provided by the platform:

service nfs stop
service rpcbind stop
Start the package-included portmap (needs root privileges):

hadoop portmap
OR
hadoop-daemon.sh start portmap
Start mountd and nfsd. No root privileges are required for this command. However, ensure that the user starting the Hadoop cluster and the user starting the NFS gateway are the same.
hadoop nfs3
OR
hadoop-daemon.sh start nfs3
Note that if the hadoop-daemon.sh script starts the NFS gateway, its log can be found in the Hadoop log folder.
Stop the NFS gateway services:

hadoop-daemon.sh stop nfs3
hadoop-daemon.sh stop portmap
Verify that all the services are up and running:

rpcinfo -p $nfs_server_ip
You should see output similar to the following:
       program vers proto   port
        100005    1   tcp   4242  mountd
        100005    2   udp   4242  mountd
        100005    2   tcp   4242  mountd
        100000    2   tcp    111  portmapper
        100000    2   udp    111  portmapper
        100005    3   udp   4242  mountd
        100005    1   udp   4242  mountd
        100003    3   tcp   2049  nfs
        100005    3   tcp   4242  mountd
Verify that the HDFS namespace is exported and can be mounted:

showmount -e $nfs_server_ip
You should see output similar to the following:
Exports list on $nfs_server_ip :
/ (everyone)
Currently NFSv3 only uses TCP as the transport protocol. NLM is not supported, so the mount option "nolock" is needed. It is recommended to use a hard mount, because even after the client sends all data to the NFS gateway, it may take the NFS gateway some extra time to transfer the data to HDFS when writes were reordered by the NFS client kernel.
If a soft mount has to be used, the user should give it a relatively long timeout (at least no less than the default timeout on the host).
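For instance (a sketch only; on Linux the timeo option is specified in tenths of a second), a soft mount with a long timeout might look like the following, using the same $server and $mount_point placeholders as the mount command below:

# Soft mount with a 60-second timeout (timeo=600) and a few retries before failing.
mount -t nfs -o vers=3,proto=tcp,nolock,soft,timeo=600,retrans=5 $server:/ $mount_point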
The users can mount the HDFS namespace as shown below:
mount -t nfs -o vers=3,proto=tcp,nolock $server:/ $mount_point
Then the users can access HDFS as part of the local file system, except that hard link and random write are not supported yet.
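For example (a sketch, assuming the namespace is mounted at /hdfs and the paths below are placeholders), ordinary file-system commands work through the mount:

# Browse the HDFS namespace through the mount point (assumed here to be /hdfs).
ls -l /hdfs/user
# Download a file from HDFS to the local file system.
cp /hdfs/user/admin/report.csv /tmp/
# Upload a local file to HDFS; appending is supported, but random write is not.
cp /var/log/app.log /hdfs/user/admin/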
The NFS gateway in this release uses AUTH_UNIX style authentication. When the user on the NFS client accesses the mount point, the NFS client passes the UID to the NFS gateway. The NFS gateway does a lookup to find the user name from the UID, and then passes the user name to HDFS along with the HDFS requests. For example, if the NFS client's current user is "admin", when the user accesses the mounted directory, the NFS gateway will access HDFS as user "admin". To access HDFS as the user "hdfs", one needs to switch the current user to "hdfs" on the client system when accessing the mounted directory.
The system administrator must ensure that the user on the NFS client host has the same name and UID as that on the NFS gateway host. This is usually not a problem if the same user management system (e.g., LDAP/NIS) is used to create and deploy users on the HDFS nodes and the NFS client node. If a user account is created manually on different hosts, one might need to modify the UID (e.g., "usermod -u 123 myusername") on either the NFS client or the NFS gateway host in order to make it the same on both sides. More technical details of RPC AUTH_UNIX can be found in the RPC specification.
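As a sketch (the user name and UID below are placeholders), checking and aligning the UID on both hosts might look like this:

# Run on both the NFS client host and the NFS gateway host to compare numeric UIDs.
id -u myusername
# If the UIDs differ, change one side so they match (needs root privileges).
usermod -u 123 myusername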