h1. Running MapReduce Jobs

After you launch a cluster, a {{hadoop-site.xml}} file is created in the directory {{~/.hadoop-cloud/}}. You can use this to connect to the cluster by setting the {{HADOOP\_CONF\_DIR}} environment variable. (It is also possible to specify the configuration file by passing it to Hadoop tools with the {{-conf}} option; see the example at the end of this section.)

{code}
% export HADOOP_CONF_DIR=~/.hadoop-cloud/my-hadoop-cluster
{code}

*To browse HDFS:*

{code}
% hadoop fs -ls /
{code}

Note that the version of Hadoop installed locally should match the version installed on the cluster.

*To run a job locally:*

{code}
% hadoop fs -mkdir input                           # create an input directory
% hadoop fs -put $HADOOP_HOME/LICENSE.txt input    # copy a file there
% hadoop jar $HADOOP_HOME/hadoop-*examples*.jar wordcount input output
% hadoop fs -cat output/part-* | head
{code}

The preceding examples assume that Hadoop is installed on your local machine, but you can also run jobs from within the cluster.

*To run jobs within the cluster:*

1. Log into the Namenode:
{code}
% hadoop-ec2 login my-hadoop-cluster
{code}
2. Run the job:
{code}
# hadoop fs -mkdir input
# hadoop fs -put /etc/hadoop/conf/*.xml input
# hadoop jar /usr/lib/hadoop/hadoop-*-examples.jar grep input output 'dfs[a-z.]+'
# hadoop fs -cat output/part-* | head
{code}
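
As mentioned above, you can pass the generated configuration file directly to Hadoop tools with the {{-conf}} generic option instead of setting {{HADOOP\_CONF\_DIR}}. A minimal sketch, assuming the file lives at {{~/.hadoop-cloud/my-hadoop-cluster/hadoop-site.xml}} (the cluster name used in the earlier examples):

{code}
# pass the generated config directly (path assumed) instead of setting HADOOP_CONF_DIR
% hadoop fs -conf ~/.hadoop-cloud/my-hadoop-cluster/hadoop-site.xml -ls /
{code}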
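
To check that the local and cluster Hadoop versions match (as noted above), one option is to compare the output of {{hadoop version}} on your machine and on the Namenode:

{code}
% hadoop version                        # on your local machine
% hadoop-ec2 login my-hadoop-cluster
# hadoop version                        # on the Namenode; the versions should agree
{code}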