Chukwa 0.4 -- April 2010 This is the second formal release of Chukwa, an Apache Hadoop subproject dedicated to scalable log collection and processing. If you have large volumes of log data generated across a cluster, and you need to process them with MapReduce, Chukwa may be the tool for you. The notes for this release are in docs/releasenotes.html BUILDING CHUKWA To build chukwa from source: In the Chukwa root directory, say 'ant', and then 'cp build/*.jar build/*.war .' To check that things are ok, run 'ant test'. It should take roughly fifteen minutes. RUNNING CHUKWA If you are unfamiliar with Chukwa, you should start by reading the design overview, in docs/design.html. This will tell you what each piece of Chukwa does. If you're impatient, the following is the 30-second explanation: The minimum you need to run Chukwa are agents on each machine you're monitoring, and a collector to write the collected data to HDFS. The basic command to start an agent is bin/chukwa agent. The base command to start a collector is bin/chukwa collector. If you want to start a bunch of agents, you can use the bin/start-agents.sh script. This just uses ssh to start agents on a list of machines, given in conf/agents. It's exactly parallel to Hadoop's start-hdfs and start-mapred scripts. There's also a bin/start-collectors.sh that does the same to start collectors, on machines listed in conf/collectors. One hostname per line. There are stop scripts that do the exact opposite of the start commands. Full installation instructions are in docs/admin.html.