h1. Frequently Asked Questions

{anchor:how-do-i-find-my-cloud-credentials}
h2. How do I find my cloud credentials?

On EC2:
# Go to [http://aws-portal.amazon.com/gp/aws/developer/account/index.html?action=access-key]
# Log in, if prompted
# Find your Access Key ID and Secret Access Key in the "Access Credentials" section, under the "Access Keys" tab. You will have to click "Show" to see the text of your secret access key. 

Another good resource is [Understanding Access Credentials for AWS/EC2|http://alestic.com/2009/11/ec2-credentials] by Eric Hammond.

h2. Can I specify my own private key?

Yes, by setting {{whirr.private-key-file}} (or {{\--private-key-file}} on the
command line). You should also set {{whirr.public-key-file}}
({{\--public-key-file}}) at the same time.
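
As a minimal sketch, the two properties could be set together in a configuration file as shown here. The key paths are assumptions (the default OpenSSH locations); substitute the key pair you actually want to use.

{code}
# Assumed paths: the default OpenSSH key pair in your home directory
whirr.private-key-file=${sys:user.home}/.ssh/id_rsa
whirr.public-key-file=${sys:user.home}/.ssh/id_rsa.pub
{code}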

Private keys must not have a passphrase associated with them. You can check this
with:

{code}
grep ENCRYPTED ~/.ssh/id_rsa
{code}

If the key has no passphrase, the command prints nothing.

h2. How do I access my cluster from a different network?

By default, access to clusters is restricted to the single IP address of the
machine starting the cluster, as determined by
[Amazon's check IP service|http://checkip.amazonaws.com/]. However, some
networks report multiple origin IP addresses (e.g. they round-robin between
them by connection), which may cause problems if the address used for later
connections is different to the one reported at the time of the first
connection.

A related problem is when you wish to access the cluster from a different
network to the one it was launched from.

In these cases you can specify the IP addresses of the machines that may connect
to the cluster by setting the {{client-cidrs}} property to a comma-separated
list of [CIDR|http://en.wikipedia.org/wiki/Classless\_Inter-Domain\_Routing]
blocks.

For example, {{208.128.0.0/16,38.102.147.107/32}} would allow access from any
address in the {{208.128.0.0/16}} block (65,536 addresses, the size of a former
class B network) and from the single IP address {{38.102.147.107}}.
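
In a properties file, the example above might be written as in this sketch. It assumes the property takes the same {{whirr.}} prefix as the properties in the sample configurations elsewhere in this FAQ.

{code}
# Assumes the whirr. prefix used by the other properties in this FAQ
whirr.client-cidrs=208.128.0.0/16,38.102.147.107/32
{code}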

h2. How can I start a cluster in a particular location?

By default clusters are started in an arbitrary location (e.g. region or
data center). You can control the location by setting {{location-id}} (see the
[configuration guide|configuration-guide] for details).

For example, in EC2, setting {{location-id}} to {{us-east-1}} would start the
cluster in the US-East region, while setting it to {{us-east-1a}} (note the
final {{a}}) would start the cluster in that particular availability zone
within the US-East region.
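
A configuration fragment for the availability zone case might look like this sketch. The {{whirr.}} prefix on {{location-id}} is an assumption, based on the prefix used in the sample configurations in this FAQ.

{code}
# Assumes the whirr. prefix; pins the cluster to a single availability zone
whirr.provider=ec2
whirr.location-id=us-east-1a
{code}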

h2. How do I log in to a node in the cluster?

On EC2, if you know the node's address you can log in with:

{code}
ssh -i ~/.ssh/id_rsa ec2-user@host
{code}

This assumes that you use the default private key; if this is not the case then
specify the one you used at cluster launch.

The Amazon Linux AMI requires that you log in as {{ec2-user}}. If needed, you can
become root by running {{sudo su -}} after logging in.

{anchor:how-can-i-modify-the-instance-installation-and-configuration-scripts}
h2. How can I modify the instance installation and configuration scripts?

The scripts to install and configure cloud instances are downloaded from an S3
bucket by the instances at, or after, boot time. The base URL defaults to
{{http://whirr.s3.amazonaws.com/VERSION/}}, where {{VERSION}} is the
version of Whirr. (Note that S3 buckets are not browsable, so you can't
use a browser to look at these scripts unless you know the URL.)

If you want to change the scripts, take a copy of the scripts from the
_scripts_ directory of the distribution, modify it, place it on a web server
(such as S3), and change the base URL by setting the {{run-url-base}} property.

For example, by setting {{run-url-base}} to {{http://example.org/}} the scripts
would be loaded from the {{example.org}} domain. The Java install script, for
instance, would be requested from {{http://example.org/sun/java/install}}.
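
As a sketch, the corresponding line in a properties file might read as follows. The {{whirr.}} prefix is an assumption based on the sample configurations in this FAQ, and {{example.org}} is a placeholder for your own host.

{code}
# Assumes the whirr. prefix; example.org is a placeholder host
whirr.run-url-base=http://example.org/
{code}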

Scripts have to be publicly readable, so on S3 you have to set the ACL to give
everyone read access. [S3Fox|http://www.s3fox.net/] is a useful Firefox
extension for uploading and managing script files on S3.

You can debug the scripts that run on a cloud instance without having to log
in to the instance, since their output is sent to _whirr.log_ in the directory
from which you launched the _whirr_ CLI.

h2. How do I specify the service version and other service properties?

Currently the only way to do this is to modify the scripts to install a
particular version of the service, or to change the service properties from
the defaults.

See [How can I modify the instance installation and configuration scripts?|#how-can-i-modify-the-instance-installation-and-configuration-scripts]
above for details on how to do this.

h2. How can I install custom packages?

You can install extra software by modifying the scripts that run on
the cloud instances. See
[How can I modify the instance installation and configuration scripts?|#how-can-i-modify-the-instance-installation-and-configuration-scripts]
above.

h2. How do I run Cloudera's Distribution for Hadoop?

You can run Cloudera's Distribution for Hadoop (CDH) rather than Apache Hadoop
by running the {{hadoop}} service and setting {{whirr.hadoop-install-runurl}}
to {{cloudera/cdh/install}} (the default is {{apache/hadoop/install}}). Here is
a sample configuration:

{code}
whirr.service-name=hadoop
whirr.cluster-name=myhadoopcluster
whirr.instance-templates=1 jt+nn,1 dn+tt
whirr.provider=ec2
whirr.identity=<cloud-provider-identity>
whirr.credential=<cloud-provider-credential>
whirr.private-key-file=${sys:user.home}/.ssh/id_rsa
whirr.hadoop-install-runurl=cloudera/cdh/install
{code}

{anchor:other-services}
h2. How do I run a ZooKeeper cluster?

The {{service-name}} property determines the service to run; for ZooKeeper it
should be set to {{zookeeper}}. Here is a sample ZooKeeper configuration for
running a three-node ensemble:

{code}
whirr.service-name=zookeeper
whirr.cluster-name=myzkcluster
whirr.instance-templates=3 zk
whirr.provider=ec2
whirr.identity=<cloud-provider-identity>
whirr.credential=<cloud-provider-credential>
whirr.private-key-file=${sys:user.home}/.ssh/id_rsa
{code}

h2. How do I run a Cassandra cluster?

The {{service-name}} property determines the service to run; for Cassandra it
should be set to {{cassandra}}. Here is a sample Cassandra configuration for
running a three-node cluster:

{code}
whirr.service-name=cassandra
whirr.cluster-name=mycassandracluster
whirr.instance-templates=3 cassandra
whirr.provider=ec2
whirr.identity=<cloud-provider-identity>
whirr.credential=<cloud-provider-credential>
whirr.private-key-file=${sys:user.home}/.ssh/id_rsa
{code}

h2. How do I automatically tear down a cluster after a fixed time?

It's often convenient to terminate a cluster a fixed time after launch. This is
the case for test clusters, for example. You can achieve this by scheduling the
destroy command using the {{at}} command from your local machine.

*WARNING: The machine from which you issued the {{at}} command must be running (and able
to contact the cloud provider) at the time it runs.*

{code}
echo 'bin/whirr destroy-cluster --config hadoop.properties' \
    | at 'now + 50 min'
{code}

Note that issuing a {{shutdown}} command on an instance may simply stop the
instance rather than terminate it, in which case you would continue to be
charged for it. This is the case for EBS boot instances, for example.

You can read more about this technique on
[Eric Hammond's blog|http://alestic.com/2010/09/ec2-instance-termination].

Also, Mac OS X users might find
[this thread|http://superuser.com/questions/43678/mac-os-x-at-command-not-working]
a useful reference for the {{at}} command.