Release Notes - Hadoop Chukwa - Version 0.2
Overall status
This is the first public release of Chukwa, a log analysis framework on top of Hadoop. Chukwa has been tested at scale, and is reasonably robust and well behaved.
Documentation is still sparse, and error reporting isn't adequately clear. For instructions on setting up Chukwa, see the administration guide.
Requirements
Chukwa requires Java 1.6.
The back-end processing requires Hadoop 0.18+.
Collecting Hadoop logs and metrics requires Hadoop 0.20+.
Bug fixes
Innumerable bugs have been fixed; see the changes file for details.
Major Changes
- As of Chukwa 0.2, adaptor IDs are arbitrary strings instead of sequentially assigned integers, and they can be specified by the user. The agent control protocol has been modified slightly to accommodate this.
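The transcript below sketches what naming an adaptor over the revised control protocol might look like. The default control port (9093), the `adaptor_mylog` ID, the data type, and the log path are illustrative assumptions; see the administration guide for the exact command grammar.

```
$ telnet localhost 9093
add adaptor_mylog = org.apache.hadoop.chukwa.datacollection.adaptor.filetailer.FileTailingAdaptor MyDataType /var/log/mylog 0
list
shutdown adaptor_mylog
```

Because the ID is a user-chosen string rather than an integer handed back by the agent, scripts can refer to an adaptor by a stable name across agent restarts.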
- HICC now supports graphing the results of arbitrary SQL statements.
- Tools have been added to support bulk-loading old data.
- Support has been added for processing Chukwa data with Pig.
Known Limitations
- HICC defaults to assuming data is UTC; if your machines run on local time, HICC graphs will not display properly until you change the HICC timezone. You can do this by clicking the small "gear" icon on the time selection tool.
- As mentioned in the administration guide, the Pig down-sampling job should be run as an external command.
- The HDFSUsage script, which monitors HDFS usage under /user, needs to run as the special hdfs user in order to access the data. This user should have write access to $CHUKWA_LOG_DIR.
- System metrics collection may fail or be incomplete if your versions of sar and iostat do not match the ones that Chukwa expects. (See also CHUKWA-260)
- Spill files pile up for JobData in the 19700101 folder, because one of the parsed lines doesn't contain a timestamp. (CHUKWA-335)
- The data in some of the Chukwa agent metrics monthly, quarterly, yearly, and decade tables is wrong: the recordname column holds host data, and the host column holds recordname data. (CHUKWA-337)