h1. Using Command Line Options

It is possible to specify options on the command line when you launch a cluster. Options given on the command line take precedence over any settings in the configuration file. For example, the following command launches a 10-node cluster using a specified image and instance type, overriding the equivalent settings (if any) in the {{my-hadoop-cluster}} section of the configuration file. Note that words in options are separated by hyphens ({{--instance-type}}), while words in the corresponding configuration parameters are separated by underscores ({{instance\_type}}).

{code}
% hadoop-ec2 launch-cluster --image-id ami-2359bf4a --instance-type c1.xlarge \
    my-hadoop-cluster 10
{code}

If there are options that you want to specify multiple times, you can set them in the configuration file by putting each value on its own line, indenting the continuation lines with leading whitespace. For example:

{code}
env=AWS_ACCESS_KEY_ID=...
    AWS_SECRET_ACCESS_KEY=...
{code}

The scripts install Hadoop at instance boot time, from a tarball or, in the case of CDH, from RPMs or Debian packages, depending on the OS. By default, Apache Hadoop 0.20.1 is installed. To run a different version of Hadoop, change the {{user\_data\_file}} setting. For example, to use the latest version of CDH3, add the following parameter:

{code}
--user-data-file http://archive.cloudera.com/cloud/ec2/cdh3/hadoop-ec2-init-remote.sh
{code}

By default, the latest version of the specified CDH release series is used. To use a particular release of CDH, set the {{REPO}} variable with the {{--env}} option, in addition to setting {{user\_data\_file}}. For example, to specify the Beta 1 release of CDH3:

{code}
--env REPO=cdh3b1
{code}

For this release, Hadoop configuration files can be found in {{/etc/hadoop/conf}} and logs are in {{/var/log/hadoop}}.

h2. Customization

You can specify a list of packages to install on every instance at boot time by using the {{--user-packages}} command-line option or the {{user\_packages}} configuration parameter.
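As a rough illustration of what might happen to such a list at boot, the sketch below turns a space-separated package list into an install command for {{yum}} or {{apt-get}}, whichever the instance has. The function names {{detect\_package\_manager}} and {{install\_command}} are hypothetical, and this is not the code in {{hadoop-ec2-init-remote.sh}}. To keep it safe to try locally, it prints the command instead of running it.

{code}
#!/bin/sh
# Hypothetical sketch (not the hadoop-ec2-init-remote.sh code): build an
# install command for a space-separated USER_PACKAGES list. It only prints
# the command, so nothing is actually installed.

# Report which supported package manager is present, or fail if none is.
detect_package_manager() {
  if command -v apt-get >/dev/null 2>&1; then
    echo apt-get
  elif command -v yum >/dev/null 2>&1; then
    echo yum
  else
    return 1
  fi
}

# Print the install command for the given manager and package list.
install_command() {
  manager="$1"
  packages="$2"
  case "$manager" in
    yum)     echo "yum -y install $packages" ;;
    apt-get) echo "apt-get -y install $packages" ;;
    *)       return 1 ;;
  esac
}

USER_PACKAGES='R git-core'
if [ -n "$USER_PACKAGES" ] && manager=$(detect_package_manager); then
  install_command "$manager" "$USER_PACKAGES"
fi
{code}

If the command were actually executed, the list would need to be expanded unquoted (for example {{yum -y install $USER\_PACKAGES}}) so that the shell splits it on spaces into one argument per package.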
Packages should be space-separated. Note that package names should match the package manager used to install them ({{yum}} or {{apt-get}}, depending on the OS). For example, to install RPMs for R and git:

{code}
% hadoop-ec2 launch-cluster --user-packages 'R git-core' my-hadoop-cluster 10
{code}

You have full control over the script that is run when each instance boots. The default script, {{hadoop-ec2-init-remote.sh}}, may be used as a starting point for adding extra configuration or customization to the instance. Make a copy of the script in your home directory, or somewhere similar, and set the {{--user-data-file}} command-line option (or the {{user\_data\_file}} configuration parameter) to point to the modified copy. This option may also point to an arbitrary URL, which makes it easy to share scripts. For CDH, use the script located at [http://archive.cloudera.com/cloud/ec2/cdh3/hadoop-ec2-init-remote.sh].

The {{hadoop-ec2}} script replaces {{%ENV%}} in your user data script with definitions of the {{USER\_PACKAGES}}, {{AUTO\_SHUTDOWN}}, and {{EBS\_MAPPINGS}} variables, as well as any extra variables supplied using the {{--env}} command-line flag.

Another way of customizing the instance, which may be more appropriate for larger changes, is to create your own image. You can use any image, as long as it satisfies both of the following conditions:
* Runs (gzip compressed) user data on boot
* Has Java installed
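To make the user data path concrete, the following sketch fills in a {{%ENV%}} placeholder, gzip-compresses the result as a launcher might, and then decompresses and runs it the way a compatible image must at boot. It assumes the placeholder sits on an {{export}} line of the template; the template, the function names {{make\_user\_data}} and {{run\_user\_data}}, and the sample values are all hypothetical and not taken from {{hadoop-ec2}}.

{code}
#!/bin/sh
# Hypothetical end-to-end sketch of the user data path; not hadoop-ec2 code.
set -e

# Launcher side: splice variable assignments into %ENV%, then gzip.
# Note: a "|", "&", or backslash in the assignments would break this sed call.
make_user_data() {
  template="$1"
  env_line="$2"
  sed "s|%ENV%|$env_line|" "$template" | gzip -c
}

# Instance side: decompress gzip user data from stdin and execute it.
run_user_data() {
  gunzip -c | sh
}

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT

# Stand-in template; the "export %ENV%" line is an assumption.
cat > "$workdir/init-remote.sh" <<'EOF'
#!/bin/sh
export %ENV%
echo "USER_PACKAGES=$USER_PACKAGES REPO=$REPO"
EOF

make_user_data "$workdir/init-remote.sh" \
  'USER_PACKAGES="R git-core" AUTO_SHUTDOWN="" EBS_MAPPINGS="" REPO="cdh3b1"' \
  > "$workdir/user-data.gz"
run_user_data < "$workdir/user-data.gz"
# prints: USER_PACKAGES=R git-core REPO=cdh3b1
{code}

The {{sed}} substitution is only adequate for simple values; anything containing a pipe, ampersand, or backslash would need escaping before it is spliced in.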