Licensed to the Apache Software Foundation (ASF) under one or more contributor license agreements. See the NOTICE file distributed with this work for additional information regarding copyright ownership. The ASF licenses this file to you under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License. Configuration Recipe for monitoring ZooKeeper using Nagios ---------------------------------------------------------- I will start by making the assumption that you already have an working Nagios install. WARNING: I have wrote these instructions while installing and configuring the plugin on my desktop computer running Ubuntu 9.10. I've installed Nagios using apt-get. WARNING: You should customize the config files as suggested in order to match your Nagios and Zookeeper install. WARNING: This README assumes you know how to configure Nagios and how it works. WARNING: You should customize the warning and critical levels on service checks to meet your own needs. 1. Install the plugin $ cp check_zookeeper.py /usr/lib/nagios/plugins/ 2. Install the new commands $ cp zookeeper.cfg /etc/nagios-plugins/config 3. Update the list of servers in zookeeper.cfg for the command 'check_zookeeper' and update the port for the command 'check_zk_node' (default: 2181) 4. Create a virtual host in Nagios used for monitoring the cluster as a whole -OR- Create a hostgroup named 'zookeeper-servers' and add all the zookeeper cluster nodes. 5. Define service checks like I have ilustrated bellow or just use the provided definitions. define service { use generic-service host_name zookeeper-cluster service_description ... check_command check_zookeeper!!! } define service { hostgroup_name zookeeper-servers use generic-service service_description ZK_Open_File_Descriptors_Count check_command check_zk_node!!! } Ex: a. check the number of open file descriptors define service{ use generic-service host_name zookeeper-cluster service_description ZK_Open_File_Descriptor_Count check_command check_zookeeper!zk_open_file_descriptor_count!500!800 } b. check the number of ephemerals nodes define service { use generic-service host_name localhost service_description ZK_Ephemerals_Count check_command check_zookeeper!zk_ephemerals_count!10000!100000 } c. check the number of open file descriptors for each host in the group define service { hostgroup_name zookeeper-servers use generic-service service_description ZK_Open_File_Descriptors_Count check_command check_zk_node!zk_open_file_descriptor_count!500!800 }