h1. Using Persistent Clusters

*Support for Amazon Elastic Block Store (EBS) is a Beta feature.*

When not in use, an EBS cluster can surrender unneeded EC2 instances, then restart later and continue where it left off. Users no longer need to copy large volumes of data from S3 to local disk on the EC2 instance; data persists reliably and independently in Amazon EBS, saving compute costs.

*Schematic showing how the cluster is set up:*

!../../images/persistent-ec2.png!

*To Use a Persistent Cluster with EBS Storage*

# Create a new section called {{my-ebs-cluster}} in the {{\~/.hadoop-cloud/clusters.cfg}} file (a sketch of such a section appears at the end of this page).
# Create storage for the new cluster by creating a temporary EBS volume of size 100 GiB, formatting it, and saving it as a snapshot in S3. This way, you only have to do the formatting once.

{code}
% hadoop-ec2 create-formatted-snapshot my-ebs-cluster 100
{code}

You create storage for a single Namenode and for two Datanodes. The volumes to create are described in a JSON spec file, which references the snapshot you just created. Here are the contents of a JSON file called {{my-ebs-cluster-storage-spec.json}}:

*Example contents of my-ebs-cluster-storage-spec.json*

{code}
{
  "nn": [
    {
      "device": "/dev/sdj",
      "mount_point": "/ebs1",
      "size_gb": "100",
      "snapshot_id": "snap-268e704f"
    },
    {
      "device": "/dev/sdk",
      "mount_point": "/ebs2",
      "size_gb": "100",
      "snapshot_id": "snap-268e704f"
    }
  ],
  "dn": [
    {
      "device": "/dev/sdj",
      "mount_point": "/ebs1",
      "size_gb": "100",
      "snapshot_id": "snap-268e704f"
    },
    {
      "device": "/dev/sdk",
      "mount_point": "/ebs2",
      "size_gb": "100",
      "snapshot_id": "snap-268e704f"
    }
  ]
}
{code}

Each role ({{nn}} and {{dn}}) is the key to an array of volume specifications. In this example, each role has two devices ({{/dev/sdj}} and {{/dev/sdk}}) with different mount points, each generated from an EBS snapshot. The snapshot is the formatted snapshot created earlier, so the volumes you create are pre-formatted. The size of each volume must match the size of the snapshot created earlier.

*To use this file to create actual volumes:*

{code}
% hadoop-ec2 create-storage my-ebs-cluster nn 1 \
    my-ebs-cluster-storage-spec.json
% hadoop-ec2 create-storage my-ebs-cluster dn 2 \
    my-ebs-cluster-storage-spec.json
{code}

*To start the cluster with two slave nodes:*

{code}
% hadoop-ec2 launch-cluster my-ebs-cluster 1 nn,snn,jt 2 dn,tt
{code}

*To log in and run a job that creates some output:*

{code}
% hadoop-ec2 login my-ebs-cluster

# hadoop fs -mkdir input
# hadoop fs -put /etc/hadoop/conf/*.xml input
# hadoop jar /usr/lib/hadoop/hadoop-*-examples.jar grep input output \
    'dfs[a-z.]+'
{code}

*To view the output:*

{code}
# hadoop fs -cat output/part-* | head
{code}

*To shut down the cluster:*

{code}
% hadoop-ec2 terminate-cluster my-ebs-cluster
{code}

*To restart the cluster and log in (after a short delay):*

{code}
% hadoop-ec2 launch-cluster my-ebs-cluster 2
% hadoop-ec2 login my-ebs-cluster
{code}

*The output from the job you ran before should still be there:*

{code}
# hadoop fs -cat output/part-* | head
{code}
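*Example clusters.cfg section (sketch)*

Step 1 above asks you to add a {{my-ebs-cluster}} section to {{\~/.hadoop-cloud/clusters.cfg}} without showing one. The following is a minimal sketch in the INI style the hadoop-cloud scripts read; every value here (AMI ID, instance type, key name, availability zone, key path) is a placeholder to replace with settings for your own account, and the exact set of supported options depends on your hadoop-cloud version:

{code}
; Sketch of a clusters.cfg section for this example.
; All values are placeholders -- substitute your own AMI, key, zone, and paths.
[my-ebs-cluster]
image_id=ami-12345678
instance_type=c1.medium
key_name=my-keypair
availability_zone=us-east-1c
private_key=/path/to/my-keypair.pem
ssh_options=-i %(private_key)s -o StrictHostKeyChecking=no
{code}

Because an EBS volume lives in a single availability zone and can only attach to instances in that zone, the {{availability_zone}} setting must match the zone where the volumes created above reside.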