~~ Licensed to the Apache Software Foundation (ASF) under one or more
~~ contributor license agreements.  See the NOTICE file distributed with
~~ this work for additional information regarding copyright ownership.
~~ The ASF licenses this file to You under the Apache License, Version 2.0
~~ (the "License"); you may not use this file except in compliance with
~~ the License.  You may obtain a copy of the License at
~~
~~     http://www.apache.org/licenses/LICENSE-2.0
~~
~~ Unless required by applicable law or agreed to in writing, software
~~ distributed under the License is distributed on an "AS IS" BASIS,
~~ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
~~ See the License for the specific language governing permissions and
~~ limitations under the License.
~~

Chukwa Administration Guide

  This chapter is the detailed configuration guide to Chukwa configuration.

  Please read this chapter carefully and ensure that all requirements have 
been satisfied. Failure to do so will cause you (and us) grief debugging 
strange errors and/or data loss.

  Chukwa uses the same configuration system as Hadoop. To configure a deploy, 
edit a file of environment variables in etc/chukwa/chukwa-env.sh -- this 
configuration is used mostly by the launcher shell scripts getting the 
cluster off the ground -- and then add configuration to an XML file to do 
things like override Chukwa defaults, tell Chukwa what Filesystem to use, 
or the location of the HBase configuration.

  When running in distributed mode, after you make an edit to an Chukwa 
configuration, make sure you copy the content of the conf directory to all 
nodes of the cluster. Chukwa will not do this for you. Use rsync.

Pre-requisites

  Chukwa should work on any POSIX platform, but GNU/Linux is the only
production platform that has been tested extensively. Chukwa has also been used
successfully on Mac OS X, which several members of the Chukwa team use for 
development.

  The only absolute software requirements are {{{http://java.sun.com}Java 1.6}}
or better and {{{http://hadoop.apache.org/}Hadoop 0.20.205.1+}}.
  
  HICC, the Chukwa visualization interface, requires {{{http://hbase.apache.org}HBase 0.90.4+}}.

  The Chukwa cluster management scripts rely on <ssh>; these scripts, however,
are not required if you have some alternate mechanism for starting and stopping
daemons.

Installing Chukwa

  A minimal Chukwa deployment has three components:

  * A Hadoop and HBase cluster on which Chukwa will process data (referred to as the Chukwa cluster).

  * A collector process, that writes collected data to HBase.

  * One or more agent processes, that send monitoring data to the collector. 
    The nodes with active agent processes are referred to as the monitored 
    source nodes.

  * Data analytics script, summarize Hadoop Cluster Health.

  * HICC, the Chukwa visualization tool.

[]

[./images/chukwa_architecture.png] Chukwa Components

* First Steps

  * Obtain a copy of Chukwa. You can find the latest release on the 
    {{{http://hadoop.apache.org/chukwa/releases.html} Chukwa release page}}.

  * Un-tar the release, via <tar xzf>.

  * Make sure a copy of Chukwa is available on each node being monitored, and on
each node that will run a collector.

  * We refer to the directory containing Chukwa as <CHUKWA_HOME>. It may
be helpful to set <CHUKWA_HOME> explicitly in your environment,
but Chukwa does not require that you do so.

* General Configuration

  Agents and collectors are configured differently, but part of the process
is common to both.

  * Make sure that <JAVA_HOME> is set correctly and points to a Java 1.6 JRE. 
It's generally best to set this in <etc/chukwa/chukwa-env.sh>.

  * In <etc/chukwa/chukwa-env.sh>, set <CHUKWA_LOG_DIR> and
<CHUKWA_PID_DIR> to the directories where Chukwa should store its
console logs and pid files.  The pid directory must not be shared between
different Chukwa instances: it should be local, not NFS-mounted.

  * Optionally, set CHUKWA_IDENT_STRING. This string is
 used to name Chukwa's own console log files.

Agents

  Agents are the Chukwa processes that actually produce data. This section
describes how to configure and run them. More details are available in the
{{{./agent.html}Agent configuration guide}}.

* Configuration

  This section describes how to set up the agent process on the source nodes.

  The one mandatory configuration step is to set up 
<$CHUKWA_HOME/etc/chukwa/collectors>. This file should contain a list
of hosts that will run Chukwa collectors. Agents will pick a random collector
from this list to try sending to, and will fail-over to another listed collector
on error.  The file should look something like:

---
http://<collector1HostName>:<collector1Port>/
http://<collector2HostName>:<collector2Port>/
http://<collector3HostName>:<collector3Port>/
---

  Edit the <CHUKWA_HOME/etc/chukwa/initial_adaptors> configuration file. 
This is where you tell Chukwa what log files to monitor. See
{{{./agent.html#Adaptors}the adaptor configuration guide}} for
a list of available adaptors.

  There are a number of optional settings in 
<$CHUKWA_HOME/etc/chukwa/chukwa-agent-conf.xml>:

  * The most important of these is the cluster/group name that identifies the
monitored source nodes. This value is stored in each Chunk of collected data;
you can therefore use it to distinguish data coming from different groups of 
machines.

---
 <property>
    <name>chukwaAgent.tags</name>
    <value>cluster="demo"</value>
    <description>The cluster's name for this agent</description>
 </property>
---

  * Another important option is <chukwaAgent.checkpoint.dir>.
This is the directory Chukwa will use for its periodic checkpoints of 
running adaptors.  It <<must not>> be a shared directory; use a local, 
not NFS-mount, directory.

  * Setting the option <chukwaAgent.control.remote> will disallow remote 
connections to the agent control socket.

* Starting, Stopping, And Monitoring

  To run an agent process on a single node, use <bin/chukwa agent>.

  Typically, agents run as daemons. The script <bin/start-agents.sh> 
will ssh to each machine listed in <etc/chukwa/agents> and start an agent,
running in the background. The script <bin/stop-agents.sh> 
does the reverse.

  You can, of course, use any other daemon-management system you like. 
For instance, <tools/init.d> includes init scripts for running
Chukwa agents.

  To check if an agent is working properly, you can telnet to the control
port (9093 by default) and hit "enter". You will get a status message if
the agent is running normally.

Configuring Hadoop For Monitoring

  One of the key goals for Chukwa is to collect logs from Hadoop clusters. 
This section describes how to configure Hadoop to send its logs to Chukwa. 
Note that these directions require Hadoop 0.20.205.0+.  Earlier versions of 
Hadoop do not have the hooks that Chukwa requires in order to grab 
MapReduce job logs.

  The Hadoop configuration files are located in <HADOOP_HOME/etc/hadoop>.
To setup Chukwa to collect logs from Hadoop, you need to change some of the 
Hadoop configuration files.

  * Copy CHUKWA_HOME/etc/chukwa/hadoop-log4j.properties file to HADOOP_CONF_DIR/log4j.properties

  * Copy CHUKWA_HOME/etc/chukwa/hadoop-metrics2.properties file to HADOOP_CONF_DIR/hadoop-metrics2.properties

  * Edit HADOOP_HOME/etc/hadoop/hadoop-metrics2.properties file and change $CHUKWA_LOG_DIR to your actual CHUKWA log dirctory (ie, CHUKWA_HOME/var/log)

Setup HBase Table

  Chukwa is moving towards a model of using HBase to store metrics data to 
allow real-time charting. This section describes how to configure HBase and 
HICC to work together.

  * Presently, we support HBase 0.90.4+. If you have HBase 0.89 jars anywhere, 
they will cause linkage errors.  Check for and remove them.

  * Setting up tables:

---
/path/to/hbase-0.90.4/bin/hbase shell < etc/chukwa/hbase.schema
---

Collectors

  This section describes how to set up the Chukwa collectors.
For more details, see {{{./collector.html}the collector configuration guide}}.

* Configuration

  First, edit <$CHUKWA_HOME/etc/chukwa/chukwa-env.sh> In addition to 
the general directions given above, you should set <HADOOP_CONF_DIR> and
<HBASE_CONF_DIR>.  This should be the Hadoop deployment Chukwa will use to 
store collected data.  You will get a version mismatch error if this is 
configured incorrectly.

  Next, edit <$CHUKWA_HOME/etc/chukwa/chukwa-collector-conf.xml>.

** Use HBase For Data Storage

  * Configuring the collector: set HBaseWriter as your writer, or add it 
    to the pipeline if you are using 

---
  <property>
    <name>chukwaCollector.writerClass</name>
    <value>org.apache.hadoop.chukwa.datacollection.writer.PipelineStageWriter</value>
  </property>

  <property>
    <name>chukwaCollector.pipeline</name>
    <value>org.apache.hadoop.chukwa.datacollection.writer.hbase.HBaseWriter</value>
  </property>
---

** Use HDFS For Data Storage

  The one mandatory configuration parameter is <writer.hdfs.filesystem>.
This should be set to the HDFS root URL on which Chukwa will store data.
Various optional configuration options are described in 
{{{./collector.html}the collector configuration guide}}
and in the collector configuration file itself.

* Starting, Stopping, And Monitoring

  To run a collector process on a single node, use <bin/chukwa collector>.

  Typically, collectors run as daemons. The script <bin/start-collectors.sh> 
will ssh to each collector listed in <etc/chukwa/collectors> and start a
collector, running in the background. The script <bin/stop-collectors.sh> 
does the reverse.

  You can, of course, use any other daemon-management system you like. 
For instance, <tools/init.d> includes init scripts for running
Chukwa collectors.

  To check if a collector is working properly, you can simply access
<http://collectorhost:collectorport/chukwa?ping=true> with a web browser.
If the collector is running, you should see a status page with a handful of 
statistics.

ETL Processes (Optional)

  For storing data to HDFS, the archive and demux mapreduce jobs can be 
started by running:

---
CHUKWA_HOME/bin/chukwa archive
---
 
  Demux mapreduce jobs can be started by rnning:

---
CHUKWA_HOME/bin/chukwa demux
---

Setup Cluster Aggregation Script

  For data analytics with Apache Pig, there are some additional environment setup. Apache Pig does not use the same environment variable name as Hadoop, therefore make sure the following environment are setup correctly:

  [[1]] Download and setup Apache Pig 0.9.1.

  [[2]] Define Apache Pig class path:

---
export PIG_CLASSPATH=$HADOOP_CONF_DIR:$HBASE_CONF_DIR
---

  [[3]] Create a jar file of HBASE_CONF_DIR, run:

---
jar cf $CHUKWA_HOME/hbase-env.jar $HBASE_CONF_DIR
---

  [[4]] Setup a cron job or Hudson job for analytics script to run periodically:

---
pig -Dpig.additional.jars=${HBASE_HOME}/hbase-0.90.4.jar:${HBASE_HOME}/lib/zookeeper-3.3.2.jar:${PIG_PATH}/pig.jar:${CHUKWA_HOME}/hbase-env.jar ${CHUKWA_HOME}/script/pig/ClusterSummary.pig
---

HICC

* Configuration

  Edit <etc/chukwa/auth.conf> and add authorized user to the list.

* Starting, Stopping, And Monitoring

  The Hadoop Infrastructure Care Center (HICC) is the Chukwa web user interface.
HICC is started by invoking

---
bin/chukwa hicc
---

  Once the webcontainer with HICC has been started, point your favorite 
browser to:

---
http://<server>:4080/hicc
---

Troubleshooting Tips

* UNIX Processes For Chukwa Agents

  The Chukwa agent process name is identified by:

---
org.apache.hadoop.chukwa.datacollection.agent.ChukwaAgent
---

  Command line to use to search for the process name:

---
ps ax | grep org.apache.hadoop.chukwa.datacollection.agent.ChukwaAgent
---

* UNIX Processes For Chukwa Collectors

  Chukwa Collector name is identified by:

---
org.apache.hadoop.chukwa.datacollection.collector.CollectorStub
---

* UNIX Processes For Chukwa Data Processes

  Chukwa Data Processors are identified by:

---
org.apache.hadoop.chukwa.extraction.demux.Demux
org.apache.hadoop.chukwa.extraction.database.DatabaseLoader
org.apache.hadoop.chukwa.extraction.archive.ChukwaArchiveBuilder
---

  The processes are scheduled execution, therefore they are not always 
visible from the process list.

* Checks For Disk Full 

  If anything is wrong, use /etc/init.d/chukwa-agent and 
CHUKWA_HOME/tools/init.d/chukwa-system-metrics stop to shutdown Chukwa.  
Look at agent.log and collector.log file to determine the problems. 

  The most common problem is the log files are growing unbounded. Set up a 
cron job to remove old log files:

---
 0 12 * * * CHUKWA_HOME/tools/expiration.sh 10 $CHUKWA_HOME/var/log nowait
---

  This will set up the log file expiration for CHUKWA_HOME/var/log for 
log files older than 10 days.

* Emergency Shutdown Procedure

  If the system is not functioning properly and you cannot find an answer in 
the Administration Guide, execute the kill command.  The current state of 
the java process will be written to the log files. You can analyze 
these files to determine the cause of the problem.

---
kill -3 <pid>
---