2.2. HBase run modes: Standalone and Distributed

HBase has two run modes: Section 2.2.1, “Standalone HBase” and Section 2.2.2, “Distributed”. Out of the box, HBase runs in standalone mode. To set up a distributed deployment, you will need to configure HBase by editing files in the HBase conf directory.

Whatever your mode, you will need to edit conf/hbase-env.sh to tell HBase which Java to use. In this file you set HBase environment variables such as the heap size and other options for the JVM, the preferred location for log files, etc. Set JAVA_HOME to point at the root of your Java installation.
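For example, assuming a JDK installed under /usr/lib/jvm/java-6-sun (the path is only an illustration; point it at your own install), you might set the following in conf/hbase-env.sh:

# The java implementation to use. Adjust the path for your system.
export JAVA_HOME=/usr/lib/jvm/java-6-sun

# The maximum amount of heap to use, in MB. Default is 1000.
export HBASE_HEAPSIZE=1000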

2.2.1. Standalone HBase

This is the default mode. Standalone mode is what is described in Section 1.2, “Quick Start”. In standalone mode, HBase does not use HDFS -- it uses the local filesystem instead -- and it runs all HBase daemons and a local ZooKeeper in the same JVM. ZooKeeper binds to a well-known port (2181 by default) so clients may talk to HBase.

2.2.2. Distributed

Distributed mode can be subdivided into distributed but all daemons run on a single node -- a.k.a. pseudo-distributed -- and fully-distributed, where the daemons are spread across all nodes in the cluster [9].

Distributed modes require an instance of the Hadoop Distributed File System (HDFS). See the Hadoop requirements and instructions for how to set up an HDFS. Before proceeding, ensure you have an appropriate, working HDFS.
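A quick way to check is to list the root of the filesystem from your Hadoop install (a minimal sanity check; it assumes HADOOP_HOME points at a Hadoop install already configured to talk to your HDFS):

% ${HADOOP_HOME}/bin/hadoop fs -ls /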

Below we describe the different distributed setups. Starting, verifying, and exploring your install, whether a pseudo-distributed or fully-distributed configuration, is described in a section that follows, Section 2.2.3, “Running and Confirming Your Installation”. The same verification procedure applies to both deploy types.

2.2.2.1. Pseudo-distributed

A pseudo-distributed mode is simply a distributed mode run on a single host. Use this configuration for testing and prototyping HBase. Do not use this configuration for production or for evaluating HBase performance.

First, set up your HDFS in pseudo-distributed mode.

Next, configure HBase. An example conf/hbase-site.xml is shown below in Section 2.2.2.1.1, “Pseudo-distributed Configuration File”. This is the file into which you add local customizations and overrides of the default HBase configuration, as well as the settings described in Section 2.2.2.2.3, “HDFS Client Configuration”. Note that the hbase.rootdir property points to the local HDFS instance.

Now skip to Section 2.2.3, “Running and Confirming Your Installation” for how to start and verify your pseudo-distributed install. [10]

Note

Let HBase create the hbase.rootdir directory. If you don't, you'll get a warning saying HBase needs a migration run because the directory is missing files expected by HBase (it will create them if you let it).

2.2.2.1.1. Pseudo-distributed Configuration File

Below is a sample conf/hbase-site.xml for a pseudo-distributed install on the node h-24-30.example.com:

<configuration>
  ...
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://h-24-30.example.com:8020/hbase</value>
  </property>
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
  </property>
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>h-24-30.example.com</value>
  </property>
  ...
</configuration>

2.2.2.1.2. Pseudo-distributed Extras
2.2.2.1.2.1. Startup

To start up the initial HBase cluster...

% bin/start-hbase.sh

To start up an extra backup master on the same server, run...

% bin/local-master-backup.sh start 1

... the '1' means use ports 60001 & 60011 (offsets of 1 from the default master ports 60000 and 60010), and this backup master's logfile will be at logs/hbase-${USER}-1-master-${HOSTNAME}.log.

To start up multiple backup masters, run...

% bin/local-master-backup.sh start 2 3

You can start up to 9 backup masters (10 total).

To start up more regionservers...

% bin/local-regionservers.sh start 1

where '1' means use ports 60201 & 60301 and its logfile will be at logs/hbase-${USER}-1-regionserver-${HOSTNAME}.log.

To add 4 more regionservers in addition to the one you just started, run...

% bin/local-regionservers.sh start 2 3 4 5

This supports up to 99 extra regionservers (100 total).

2.2.2.1.2.2. Stop

Assuming you want to stop backup master # 1, run...

% cat /tmp/hbase-${USER}-1-master.pid |xargs kill -9

Note that bin/local-master-backup.sh stop 1 will try to stop the cluster along with the master.

To stop an individual regionserver, run...

% bin/local-regionservers.sh stop 1
	                

2.2.2.2. Fully-distributed

For running a fully-distributed operation on more than one host, make the following configuration changes. In hbase-site.xml, add the property hbase.cluster.distributed and set it to true, and point the HBase hbase.rootdir at the appropriate HDFS NameNode and at the location in HDFS where you would like HBase to write data. For example, if your NameNode were running at namenode.example.org on port 8020 and you wanted to home your HBase in HDFS at /hbase, make the following configuration.

<configuration>
  ...
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://namenode.example.org:8020/hbase</value>
    <description>The directory shared by RegionServers.
    </description>
  </property>
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
    <description>The mode the cluster will be in. Possible values are
      false: standalone and pseudo-distributed setups with managed Zookeeper
      true: fully-distributed with unmanaged Zookeeper Quorum (see hbase-env.sh)
    </description>
  </property>
  ...
</configuration>
2.2.2.2.1. regionservers

In addition, a fully-distributed mode requires that you modify conf/regionservers. The Section 2.4.1.2, “regionservers” file lists all hosts on which you would have HRegionServers running, one host per line (this file in HBase is like the Hadoop slaves file). All servers listed in this file will be started and stopped when the HBase cluster is started or stopped.
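For example, a conf/regionservers file for a three-node cluster could look like the following (the hostnames are illustrative only):

host1.example.org
host2.example.org
host3.example.org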

2.2.2.2.2. ZooKeeper and HBase

See Chapter 16, ZooKeeper for ZooKeeper setup for HBase.

2.2.2.2.3. HDFS Client Configuration

Of note, if you have made HDFS client configuration changes on your Hadoop cluster -- i.e., configuration you want HDFS clients to use as opposed to server-side configuration -- HBase will not see this configuration unless you do one of the following:

  • Add a pointer to your HADOOP_CONF_DIR to the HBASE_CLASSPATH environment variable in hbase-env.sh (see the sketch after this list),

  • Add a copy of hdfs-site.xml (or hadoop-site.xml) or, better, symlinks, under ${HBASE_HOME}/conf, or

  • if only a small set of HDFS client configurations is needed, add them to hbase-site.xml.
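As a sketch of the first option, assuming your Hadoop client configuration lives in /etc/hadoop/conf (substitute your own HADOOP_CONF_DIR), you could add the following to conf/hbase-env.sh:

# Extra Java CLASSPATH elements: make the Hadoop client configuration visible to HBase.
# The path below is an assumption; use your HADOOP_CONF_DIR.
export HBASE_CLASSPATH=/etc/hadoop/conf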

An example of such an HDFS client configuration is dfs.replication. If, for example, you want to run with a replication factor of 5, HBase will create files with the default replication factor of 3 unless you do one of the above to make the configuration available to HBase.
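As a sketch of the third option above, you could add the property to the <configuration> element of hbase-site.xml (the value 5 is only an example):

  <property>
    <name>dfs.replication</name>
    <value>5</value>
    <description>Replication factor the HBase HDFS client asks for when writing files.</description>
  </property>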

2.2.3. Running and Confirming Your Installation

Make sure HDFS is running first. Start and stop the Hadoop HDFS daemons by running bin/start-dfs.sh over in the HADOOP_HOME directory. You can ensure it started properly by testing the put and get of files into the Hadoop filesystem. HBase does not normally use the MapReduce daemons; these do not need to be started.
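For example, a simple put/get round trip run from the HADOOP_HOME directory (a minimal sketch; the local file and the /tmp target path are arbitrary):

% bin/hadoop fs -put conf/core-site.xml /tmp/core-site.xml
% bin/hadoop fs -get /tmp/core-site.xml /tmp/core-site.xml.copy
% bin/hadoop fs -rm /tmp/core-site.xml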

If you are managing your own ZooKeeper, start it and confirm it is running; otherwise, HBase will start up ZooKeeper for you as part of its start process.
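If you are running your own ZooKeeper, one way to confirm it is answering is the ruok four-letter command (this assumes the default client port 2181 and that nc is installed); a healthy server replies imok:

% echo ruok | nc localhost 2181
imok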

Start HBase with the following command:

bin/start-hbase.sh
Run the above from the HBASE_HOME directory.

You should now have a running HBase instance. HBase logs can be found in the logs subdirectory. Check them out especially if HBase had trouble starting.

HBase also puts up a UI listing vital attributes. By default it is deployed on the Master host at port 60010 (HBase RegionServers listen on port 60020 by default and put up an informational HTTP server at port 60030). If the Master were running on a host named master.example.org on the default port, to see the Master's homepage you'd point your browser at http://master.example.org:60010.

Once HBase has started, see Section 1.2.3, “Shell Exercises” for how to create tables, add data, scan your insertions, and finally disable and drop your tables.

To stop HBase after exiting the HBase shell, enter

$ ./bin/stop-hbase.sh
stopping hbase...............

Shutdown can take a moment to complete. It can take longer if your cluster is composed of many machines. If you are running a distributed operation, be sure to wait until HBase has shut down completely before stopping the Hadoop daemons.



[9] The pseudo-distributed vs fully-distributed nomenclature comes from Hadoop.

[10] See Section 2.2.2.1.2, “Pseudo-distributed Extras” for notes on how to start extra Masters and RegionServers when running pseudo-distributed.
