~~ Licensed under the Apache License, Version 2.0 (the "License");
~~ you may not use this file except in compliance with the License.
~~ You may obtain a copy of the License at
~~
~~   http://www.apache.org/licenses/LICENSE-2.0
~~
~~ Unless required by applicable law or agreed to in writing, software
~~ distributed under the License is distributed on an "AS IS" BASIS,
~~ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
~~ See the License for the specific language governing permissions and
~~ limitations under the License. See accompanying LICENSE file.

  ---
  Hadoop MapReduce Next Generation ${project.version} - Setting up a Single Node Cluster.
  ---
  ---
  ${maven.build.timestamp}

Hadoop MapReduce Next Generation - Setting up a Single Node Cluster.

\[ {{{./index.html}Go Back}} \]

%{toc|section=1|fromDepth=0}

* MapReduce Tarball

  You should be able to obtain the MapReduce tarball from the release. If
  not, you can create a tarball from the source:

+---+
$ mvn clean install -DskipTests
$ cd hadoop-mapreduce-project
$ mvn clean install assembly:assembly -Pnative
+---+

  <<NOTE:>> You will need protoc version 2.4.1 or greater installed.

  To skip the native builds in mapreduce, omit the <<<-Pnative>>> argument
  to maven.

  The tarball should be available in the <<<target/>>> directory.

* Setting up the environment

  Assuming you have installed hadoop-common and hadoop-hdfs and exported
  <<$HADOOP_COMMON_HOME>> and <<$HADOOP_HDFS_HOME>>, untar the hadoop
  mapreduce tarball and set the environment variable <<$HADOOP_MAPRED_HOME>>
  to the untarred directory. Set <<$YARN_HOME>> to the same value as
  <<$HADOOP_MAPRED_HOME>>.
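  For concreteness, a minimal sketch of those exports; the install paths
  below are placeholders for wherever you untarred each release:

+---+
$ export HADOOP_COMMON_HOME=/path/to/hadoop-common      # placeholder path
$ export HADOOP_HDFS_HOME=/path/to/hadoop-hdfs          # placeholder path
$ export HADOOP_MAPRED_HOME=/path/to/hadoop-mapreduce   # the untarred tarball
$ export YARN_HOME=$HADOOP_MAPRED_HOME
+---+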
  <<NOTE:>> The following instructions assume you have HDFS running.

* Setting up Configuration

  To start the ResourceManager and NodeManager, you will have to update the
  configs. Assuming your <<$HADOOP_CONF_DIR>> is the configuration directory
  and has the installed configs for HDFS and <<<core-site.xml>>>, there are
  2 config files you will have to set up: <<<mapred-site.xml>>> and
  <<<yarn-site.xml>>>.

** Setting up <<<mapred-site.xml>>>

  Add the following configs to your <<<mapred-site.xml>>>.

+---+
<property>
  <name>mapreduce.cluster.temp.dir</name>
  <value></value>
  <description>No description</description>
  <final>true</final>
</property>

<property>
  <name>mapreduce.cluster.local.dir</name>
  <value></value>
  <description>No description</description>
  <final>true</final>
</property>
+---+

** Setting up <<<yarn-site.xml>>>

  Add the following configs to your <<<yarn-site.xml>>>.

+---+
<property>
  <name>yarn.resourcemanager.resource-tracker.address</name>
  <value>host:port</value>
  <description>host is the hostname of the resource manager and port is
  the port on which the NodeManagers contact the Resource Manager.
  </description>
</property>

<property>
  <name>yarn.resourcemanager.scheduler.address</name>
  <value>host:port</value>
  <description>host is the hostname of the resourcemanager and port is
  the port on which the Applications in the cluster talk to the
  Resource Manager.</description>
</property>

<property>
  <name>yarn.resourcemanager.scheduler.class</name>
  <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler</value>
  <description>In case you do not want to use the default scheduler</description>
</property>

<property>
  <name>yarn.resourcemanager.address</name>
  <value>host:port</value>
  <description>the host is the hostname of the ResourceManager and the
  port is the port on which the clients can talk to the Resource Manager.
  </description>
</property>

<property>
  <name>yarn.nodemanager.local-dirs</name>
  <value></value>
  <description>the local directories used by the nodemanager</description>
</property>

<property>
  <name>yarn.nodemanager.address</name>
  <value>0.0.0.0:port</value>
  <description>the nodemanagers bind to this port</description>
</property>

<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>10240</value>
  <description>the amount of memory on the NodeManager in MB</description>
</property>

<property>
  <name>yarn.nodemanager.remote-app-log-dir</name>
  <value>/app-logs</value>
  <description>directory on hdfs where the application logs are moved to</description>
</property>

<property>
  <name>yarn.nodemanager.log-dirs</name>
  <value></value>
  <description>the directories used by Nodemanagers as log directories</description>
</property>

<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce.shuffle</value>
  <description>shuffle service that needs to be set for Map Reduce to run</description>
</property>
+---+

* Setting up <<<capacity-scheduler.xml>>>

  Make sure you populate the root queues in <<<capacity-scheduler.xml>>>.

+---+
<property>
  <name>yarn.scheduler.capacity.root.queues</name>
  <value>unfunded,default</value>
</property>

<property>
  <name>yarn.scheduler.capacity.root.capacity</name>
  <value>100</value>
</property>

<property>
  <name>yarn.scheduler.capacity.root.unfunded.capacity</name>
  <value>50</value>
</property>

<property>
  <name>yarn.scheduler.capacity.root.default.capacity</name>
  <value>50</value>
</property>
+---+

* Running daemons

  Assuming that the environment variables <<$HADOOP_COMMON_HOME>>,
  <<$HADOOP_HDFS_HOME>>, <<$HADOOP_MAPRED_HOME>>, <<$YARN_HOME>>,
  <<$JAVA_HOME>> and <<$HADOOP_CONF_DIR>> have been set appropriately,
  set <<$YARN_CONF_DIR>> to the same value as <<$HADOOP_CONF_DIR>>.

  Run the ResourceManager and NodeManager as:

+---+
$ cd $HADOOP_MAPRED_HOME
$ sbin/yarn-daemon.sh start resourcemanager
$ sbin/yarn-daemon.sh start nodemanager
+---+

  You should be up and running. You can run randomwriter as:

+---+
$ $HADOOP_COMMON_HOME/bin/hadoop jar hadoop-examples.jar randomwriter out
+---+
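  If the job does not start, first confirm that both daemons actually came
  up. A minimal sketch of that sanity check, assuming the JDK's <<<jps>>>
  tool is on your PATH:

+---+
$ jps
# ResourceManager and NodeManager should appear among the listed JVMs; if
# either is missing, check the log file that yarn-daemon.sh printed when it
# started the daemon.
+---+

  Good luck.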