HBase

What follows presumes you are installing HBase for the first time. If upgrading your HBase instance, see Upgrading.

Edit ${HBASE_HOME}/conf/hbase-env.sh. In this file you can set the heapsize for HBase, etc. At a minimum, set JAVA_HOME to point at the root of your Java installation.

If you are running a standalone operation, there should be nothing further to configure; proceed to Running and Confirming Your Installation. If you are running a distributed operation, continue reading.

Distributed Operation

Distributed mode requires an instance of the Hadoop Distributed File System (DFS). See the Hadoop requirements and instructions for how to set up a DFS.

Once you have confirmed your DFS setup, configuring HBase requires modification of the following two files: ${HBASE_HOME}/conf/hbase-site.xml and ${HBASE_HOME}/conf/regionservers. The former needs to be pointed at the running Hadoop DFS instance. The latter file lists all the members of the HBase cluster.

Use hbase-site.xml to override the properties defined in ${HBASE_HOME}/conf/hbase-default.xml (hbase-default.xml itself should never be modified). At a minimum, redefine the hbase.master and hbase.rootdir properties in hbase-site.xml: the former sets the host:port pair on which the HMaster runs (read about the HBase master, regionservers, etc.), and the latter points HBase at the Hadoop filesystem to use. For example, setting hbase.master to example.org:60000 and hbase.rootdir to hdfs://example.org:9000/hbase says the master is up on port 60000 on the host example.org and that HBase should use the /hbase directory in the HDFS whose namenode is at port 9000, again on example.org.
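A minimal sketch of such an hbase-site.xml, assuming the example.org host and the ports described above and the standard Hadoop-style property layout. The description text is illustrative only; adjust the values to your own cluster.

```xml
<configuration>
  <!-- host:port pair on which the HMaster runs -->
  <property>
    <name>hbase.master</name>
    <value>example.org:60000</value>
    <description>The host and port that the HBase master runs at.</description>
  </property>
  <!-- directory in HDFS (namenode at example.org:9000) that HBase should use -->
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://example.org:9000/hbase</value>
    <description>The directory shared by region servers.</description>
  </property>
</configuration>
```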

The regionservers file lists all the hosts running HRegionServers, one host per line. (This file is the HBase equivalent of the Hadoop slaves file at ${HADOOP_HOME}/conf/slaves.)

Running and Confirming Your Installation

If you are running in standalone, non-distributed mode, HBase by default uses the local filesystem.

If you are running a distributed cluster, you will need to start the Hadoop DFS daemons before starting HBase and stop the daemons after HBase has shut down. Start the Hadoop DFS daemons by running ${HADOOP_HOME}/bin/start-dfs.sh and stop them by running ${HADOOP_HOME}/bin/stop-dfs.sh. Ensure DFS started properly by testing the put and get of files into the Hadoop filesystem. HBase does not normally use the mapreduce daemons, so these do not need to be started.

Start HBase by running ${HBASE_HOME}/bin/start-hbase.sh. Once HBase has started, enter ${HBASE_HOME}/bin/hbase shell to obtain a shell against HBase from which you can execute HQL commands (HQL is a severe subset of SQL). In the HBase shell, type help; to see a list of supported HQL commands. Note that all commands in the HBase shell must end with a semicolon (;). Test your installation by creating, viewing, and dropping a table, as per the help instructions. Be patient with the create and drop operations, as each may take 10 seconds or more. To stop HBase, exit the HBase shell and run ${HBASE_HOME}/bin/stop-hbase.sh.

If you are running a distributed operation, be sure to wait until HBase has shut down completely before stopping the Hadoop daemons.

HBase also puts up a UI listing vital attributes. By default it is deployed on the master host at port 60010. (HBase regionservers listen on port 60020 by default and put up an informational HTTP server at port 60030.)

Upgrading

After installing a new HBase on top of data written by a previous HBase version, and before starting your cluster, run the ${HBASE_HOME}/bin/hbase migrate migration script. It makes any adjustments to the filesystem data under hbase.rootdir that are necessary to run the new HBase version. It does not change your install unless you explicitly ask it to.

Example API Usage

Once you have a running HBase, you probably want a way to hook your application up to it. If your application is in Java, then you should use the Java API. Here's an example of what a simple client might look like. This example assumes that you've created a table called "myTable" with a column family called "myColumnFamily".

import org.apache.hadoop.hbase.HTable;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HStoreKey;
import org.apache.hadoop.hbase.HScannerInterface;
import org.apache.hadoop.io.Text;
import java.io.IOException;
import java.util.SortedMap;
import java.util.TreeMap;

public class MyClient {

  public static void main(String args[]) throws IOException {
    // You need a configuration object to tell the client where to connect.
    // But don't worry, the defaults are pulled from the local config file.
    HBaseConfiguration config = new HBaseConfiguration();

    // This instantiates an HTable object that connects you to the "myTable"
    // table. 
    HTable table = new HTable(config, new Text("myTable"));

    // Tell the table that you'll be updating row "myRow". The lockId you get
    // back uniquely identifies your batch of updates. (Note, however, that 
    // only one update can be in progress at a time. This is fixed in HBase
    // version 0.2.0.)
    long lockId = table.startUpdate(new Text("myRow"));

    // The HTable#put method takes the lockId you got from startUpdate, a Text
    // that describes what cell you want to put a value into, and a byte array
    // that is the value you want to store. Note that if you want to store 
    // strings, you have to getBytes() from the string for HBase to understand
    // how to store it. (The same goes for primitives like ints and longs and
    // user-defined classes - you must find a way to reduce it to bytes.)
    table.put(lockId, new Text("myColumnFamily:columnQualifier1"), 
      "columnQualifier1 value!".getBytes());

    // Deletes are batch operations in HBase as well. 
    table.delete(lockId, new Text("myColumnFamily:cellIWantDeleted"));

    // Once you've done all the puts you want, you need to commit the results.
    // The HTable#commit method takes the lockId that you got from startUpdate
    // and pushes the batch of changes you made into HBase.
    table.commit(lockId);

    // Alternately, if you decide that you don't want the changes you've been
    // accumulating anymore, you can use the HTable#abort method.
    // table.abort(lockId);

    // Now, to retrieve the data we just wrote. Just like when we store them,
    // the values that come back are byte arrays. If you happen to know that
    // the value contained is a string and want an actual string, then you 
    // must convert it yourself.
    byte[] valueBytes = table.get(new Text("myRow"), 
      new Text("myColumnFamily:columnQualifier1"));
    String valueStr = new String(valueBytes);
    
    // Sometimes, you won't know the row you're looking for. In this case, you
    // use a Scanner. This will give you a cursor-like interface to the contents
    // of the table.
    HStoreKey row = new HStoreKey();
    // The scanner fills this map with column name -> cell value for each row.
    SortedMap<Text, byte[]> columns = new TreeMap<Text, byte[]>();
    HScannerInterface scanner = 
      // we want to get back only "myColumnFamily:columnQualifier1" when we iterate
      table.obtainScanner(new Text[]{new Text("myColumnFamily:columnQualifier1")}, 
      // we want to start scanning from an empty Text, meaning the beginning of
      // the table
      new Text(""));
      
    // Now, for the actual iteration.
    while(scanner.next(row, columns)) {
      // print out the row we found and the columns we were looking for;
      // the map is keyed by Text, so the lookup must use a Text key
      System.out.println("Found row: " + row.getRow() + " with value: " +
       new String(columns.get(new Text("myColumnFamily:columnQualifier1"))));
    }

    // Release the scanner's resources once iteration is done.
    scanner.close();
  }
}

There are many other methods for putting data into and getting data out of HBase, but these examples should get you started. See the HTable javadoc for more methods. Additionally, there are methods for managing tables in the HBaseAdmin class.
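The client example notes that strings, primitives, and user-defined classes must be reduced to byte arrays before they can be stored. The following is a minimal sketch of one way to do that with only the JDK. The class name ByteConversions and its method names are hypothetical and not part of HBase, and the big-endian layout and UTF-8 charset are assumptions chosen for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class (not part of HBase) for turning values into
// byte arrays suitable for storing in a cell, and back again.
public class ByteConversions {

  // Encode with an explicit charset so the stored bytes do not depend on
  // the platform default encoding.
  public static byte[] fromString(String s) {
    return s.getBytes(StandardCharsets.UTF_8);
  }

  public static String bytesToString(byte[] bytes) {
    return new String(bytes, StandardCharsets.UTF_8);
  }

  // A long always occupies 8 bytes; ByteBuffer writes big-endian by default.
  public static byte[] fromLong(long value) {
    return ByteBuffer.allocate(Long.BYTES).putLong(value).array();
  }

  public static long toLong(byte[] bytes) {
    return ByteBuffer.wrap(bytes).getLong();
  }

  // An int always occupies 4 bytes.
  public static byte[] fromInt(int value) {
    return ByteBuffer.allocate(Integer.BYTES).putInt(value).array();
  }

  public static int toInt(byte[] bytes) {
    return ByteBuffer.wrap(bytes).getInt();
  }

  public static void main(String[] args) {
    // Round-trip a long and a string through their byte representations.
    byte[] cell = fromLong(1234567890123L);
    System.out.println("long as bytes: " + cell.length + " bytes");
    System.out.println("decoded long: " + toLong(cell));
    System.out.println("decoded string: "
        + bytesToString(fromString("columnQualifier1 value!")));
  }
}
```

Whatever encoding you pick, the reader and the writer must agree on it, since HBase itself only sees opaque bytes.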

If your client is NOT Java, then you should consider the Thrift or REST libraries.
