dfs.datanode.max.xcievers
).
Default is 256. If loading lots of data into HBase, raise this limit on your
Hadoop cluster. Also consider raising the number of datanode handlers from
the default of 3. See dfs.datanode.handler.count.
What follows presumes you have obtained a copy of HBase and are installing for the first time. If upgrading your HBase instance, see Upgrading.
Define ${HBASE_HOME} to be the location of the root of your HBase installation, e.g. /usr/local/hbase. Edit ${HBASE_HOME}/conf/hbase-env.sh. In this file you can set the heapsize for HBase, etc. At a minimum, set JAVA_HOME to point at the root of your Java installation.
If you are running a standalone operation, there should be nothing further to configure; proceed to Running and Confirming Your Installation. If you are running a distributed operation, continue reading.
Distributed mode requires an instance of the Hadoop Distributed File System (DFS) and a ZooKeeper cluster. See the Hadoop requirements and instructions for how to set up a DFS. See the ZooKeeper Getting Started Guide for information about the ZooKeeper distributed coordination service. If you do not configure a ZooKeeper cluster, HBase will manage a single-instance ZooKeeper service for you, running on the master node. This is intended for development and local testing only. It SHOULD NOT be used in a fully-distributed production operation.
A pseudo-distributed operation is simply a distributed operation run on a single host.
Once you have confirmed your DFS setup, configuring HBase for use on one host requires modification of ${HBASE_HOME}/conf/hbase-site.xml, which needs to be pointed at the running Hadoop DFS instance. Use hbase-site.xml to override the properties defined in ${HBASE_HOME}/conf/hbase-default.xml (hbase-default.xml itself should never be modified). At a minimum the hbase.rootdir property should be redefined in hbase-site.xml to point HBase at the Hadoop filesystem to use. For example, adding the property below to your hbase-site.xml says that HBase should use the /hbase directory in the HDFS whose namenode is at port 9000 on your local machine:
<configuration>
  ...
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://localhost:9000/hbase</value>
    <description>The directory shared by region servers.
    </description>
  </property>
  ...
</configuration>
Note: Let HBase create the directory. If you don't, you'll get a warning saying HBase needs a migration run because the directory is missing files expected by HBase (it'll create them if you let it).
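The override relationship between hbase-default.xml and hbase-site.xml can be sketched in a few lines of plain Java. This is not HBase's own configuration loader: the class name ConfigOverrideSketch, its helper methods, the default value given for hbase.rootdir, and the example.untouched property are all assumptions made for illustration. The sketch parses two property documents of the shape shown above, lets the site values win, and then splits the resulting hbase.rootdir URI into its parts.

```java
import java.io.ByteArrayInputStream;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Illustrative only: a stand-in for the real configuration machinery.
class ConfigOverrideSketch {

  /** Reads every <property> element into a name -> value map. */
  static Map<String, String> parse(String xml) throws Exception {
    Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
        .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    Map<String, String> props = new HashMap<>();
    NodeList nodes = doc.getElementsByTagName("property");
    for (int i = 0; i < nodes.getLength(); i++) {
      Element property = (Element) nodes.item(i);
      String name = property.getElementsByTagName("name").item(0).getTextContent().trim();
      String value = property.getElementsByTagName("value").item(0).getTextContent().trim();
      props.put(name, value);
    }
    return props;
  }

  /** Site values replace defaults of the same name; other defaults survive. */
  static Map<String, String> merge(String defaultsXml, String siteXml) throws Exception {
    Map<String, String> merged = parse(defaultsXml);
    merged.putAll(parse(siteXml));
    return merged;
  }

  public static void main(String[] args) throws Exception {
    // Hypothetical defaults, standing in for hbase-default.xml.
    String defaults = "<configuration>"
        + "<property><name>hbase.rootdir</name><value>file:///tmp/hbase</value></property>"
        + "<property><name>example.untouched</name><value>kept</value></property>"
        + "</configuration>";
    // The override from the text, standing in for hbase-site.xml.
    String site = "<configuration>"
        + "<property><name>hbase.rootdir</name>"
        + "<value>hdfs://localhost:9000/hbase</value></property>"
        + "</configuration>";

    Map<String, String> conf = merge(defaults, site);
    URI rootDir = URI.create(conf.get("hbase.rootdir"));
    System.out.println("scheme=" + rootDir.getScheme());
    System.out.println("namenode host=" + rootDir.getHost());
    System.out.println("namenode port=" + rootDir.getPort());
    System.out.println("directory=" + rootDir.getPath());
  }
}
```

Only the property named in the site document changes; every other default is left as it was, which is why hbase-default.xml itself never needs editing.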
For a fully-distributed operation, in hbase-site.xml you must also configure hbase.master to the host:port pair on which the HMaster runs (read about the HBase master, regionservers, etc). For example, adding the below to your hbase-site.xml says the master is up on port 60000 on the host example.org:
<configuration>
  ...
  <property>
    <name>hbase.master</name>
    <value>example.org:60000</value>
    <description>The host and port that the HBase master runs at.
    </description>
  </property>
  ...
</configuration>
Keep in mind that for a fully-distributed operation, you may not want your hbase.rootdir to point to localhost (maybe, as in the configuration above, you will want to use example.org). In addition to hbase-site.xml, a fully-distributed operation requires that you also modify ${HBASE_HOME}/conf/regionservers. The regionservers file lists all the hosts running HRegionServers, one host per line (this file in HBase is like the Hadoop slaves file at ${HADOOP_HOME}/conf/slaves).
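To make the one-host-per-line layout of the regionservers file concrete, here is a minimal Java sketch that reads such a listing. It is not the code HBase's start scripts use; the class RegionServersFileSketch, the readHosts helper, and the host names are hypothetical, and skipping blank lines is an assumption of this sketch rather than documented behaviour.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

// Illustrative only: reads a host list laid out one host per line.
class RegionServersFileSketch {

  /** Returns the trimmed, non-empty lines of the input in file order. */
  static List<String> readHosts(BufferedReader in) throws IOException {
    List<String> hosts = new ArrayList<>();
    String line;
    while ((line = in.readLine()) != null) {
      String host = line.trim();
      if (!host.isEmpty()) {
        hosts.add(host);
      }
    }
    return hosts;
  }

  public static void main(String[] args) throws IOException {
    // Hypothetical contents of ${HBASE_HOME}/conf/regionservers.
    String contents = "rs1.example.org\nrs2.example.org\nrs3.example.org\n";
    for (String host : readHosts(new BufferedReader(new StringReader(contents)))) {
      System.out.println("region server host: " + host);
    }
  }
}
```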
Furthermore, you should configure a distributed ZooKeeper cluster. The ZooKeeper configuration file is stored at ${HBASE_HOME}/conf/zoo.cfg. See the ZooKeeper Getting Started Guide for information about the format and options of that file. Specifically, look at the Running Replicated ZooKeeper section. In ${HBASE_HOME}/conf/hbase-env.sh, set HBASE_MANAGES_ZK=false to tell HBase not to manage its own single-instance ZooKeeper service.
Of note, if you have made HDFS client configuration on your Hadoop cluster, HBase will not see this configuration unless you do one of the following: add HADOOP_CONF_DIR to CLASSPATH in hbase-env.sh; add a copy of hadoop-site.xml to ${HBASE_HOME}/conf; or add the relevant settings to hbase-site.xml. An example of such an HDFS client configuration is dfs.replication. If, for example, you want to run with a replication factor of 5, HBase will create files with the default of 3 unless you do the above to make the configuration available to HBase.
If you are running in standalone, non-distributed mode, HBase by default uses the local filesystem.
If you are running a distributed cluster you will need to start the Hadoop DFS daemons before starting HBase and stop the daemons after HBase has shut down. Start the Hadoop DFS daemons by running ${HADOOP_HOME}/bin/start-dfs.sh and stop them by running ${HADOOP_HOME}/bin/stop-dfs.sh.
You can ensure it started properly by testing the put and get of files into the Hadoop filesystem.
HBase does not normally use the mapreduce daemons. These do not need to be started.
Start HBase with the following command:
${HBASE_HOME}/bin/start-hbase.sh
Once HBase has started, enter ${HBASE_HOME}/bin/hbase shell to obtain a shell against HBase from which you can execute commands.
Test your installation by creating, viewing, and dropping a table.
To stop HBase, exit the HBase shell and enter:
${HBASE_HOME}/bin/stop-hbase.sh
If you are running a distributed operation, be sure to wait until HBase has shut down completely before stopping the Hadoop daemons.
The default location for logs is ${HBASE_HOME}/logs.
HBase also puts up a UI listing vital attributes. By default it is deployed on the master host at port 60010 (HBase regionservers listen on port 60020 by default and put up an informational HTTP server at 60030).
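One quick way to confirm that the daemons are up is to check whether anything is accepting connections on the ports listed above. The following is a minimal sketch in plain Java, not an HBase tool: the class PortCheckSketch, the isListening helper, the use of localhost, and the two-second timeout are assumptions made for illustration. A successful connection shows only that some process is listening on the port, not that it is HBase.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Illustrative only: probes TCP ports to see whether something is listening.
class PortCheckSketch {

  /** Returns true if a TCP connection to host:port succeeds within the timeout. */
  static boolean isListening(String host, int port, int timeoutMillis) {
    try (Socket socket = new Socket()) {
      socket.connect(new InetSocketAddress(host, port), timeoutMillis);
      return true;
    } catch (IOException e) {
      // Refused, unreachable, or timed out: treat all as "not listening".
      return false;
    }
  }

  public static void main(String[] args) {
    // Default ports named in the text; the host is an assumption.
    String host = "localhost";
    int[] ports = {60010, 60020, 60030};
    for (int port : ports) {
      System.out.println(host + ":" + port + " listening? " + isListening(host, port, 2000));
    }
  }
}
```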
After installing a new HBase on top of data written by a previous HBase version, before starting your cluster, run the ${HBASE_HOME}/bin/hbase migrate migration script. It will make any adjustments to the filesystem data under hbase.rootdir necessary to run the new HBase version. It does not change your install unless you explicitly ask it to.
Once you have a running HBase, you probably want a way to hook your application up to it. If your application is in Java, then you should use the Java API. Here's an example of what a simple client might look like. This example assumes that you've created a table called "myTable" with a column family called "myColumnFamily".
import java.io.IOException;

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Scanner;
import org.apache.hadoop.hbase.io.BatchUpdate;
import org.apache.hadoop.hbase.io.Cell;
import org.apache.hadoop.hbase.io.RowResult;
import org.apache.hadoop.hbase.util.Bytes;

public class MyClient {

  public static void main(String[] args) throws IOException {
    // You need a configuration object to tell the client where to connect.
    // But don't worry, the defaults are pulled from the local config file.
    HBaseConfiguration config = new HBaseConfiguration();

    // This instantiates an HTable object that connects you to the "myTable"
    // table.
    HTable table = new HTable(config, "myTable");

    // To do any sort of update on a row, you use an instance of the BatchUpdate
    // class. A BatchUpdate takes a row and optionally a timestamp which your
    // updates will affect. If no timestamp, the server applies current time
    // to the edits.
    BatchUpdate batchUpdate = new BatchUpdate("myRow");

    // The BatchUpdate#put method takes a byte [] (or String) that designates
    // what cell you want to put a value into, and a byte array that is the
    // value you want to store. Note that if you want to store Strings, you
    // have to getBytes() from the String for HBase to store it since HBase is
    // all about byte arrays. The same goes for primitives like ints and longs
    // and user-defined classes - you must find a way to reduce it to bytes.
    // The Bytes class from the hbase util package has utility for going from
    // String to utf-8 bytes and back again and help for other base types.
    batchUpdate.put("myColumnFamily:columnQualifier1",
      Bytes.toBytes("columnQualifier1 value!"));

    // Deletes are batch operations in HBase as well.
    batchUpdate.delete("myColumnFamily:cellIWantDeleted");

    // Once you've done all the puts you want, you need to commit the results.
    // The HTable#commit method takes the BatchUpdate instance you've been
    // building and pushes the batch of changes you made into HBase.
    table.commit(batchUpdate);

    // Now, to retrieve the data we just wrote. The values that come back are
    // Cell instances. A Cell is a combination of the value as a byte array and
    // the timestamp the value was stored with. If you happen to know that the
    // value contained is a string and want an actual string, then you must
    // convert it yourself.
    Cell cell = table.get("myRow", "myColumnFamily:columnQualifier1");
    // This could throw a NullPointerException if there was no value at the cell
    // location.
    String valueStr = Bytes.toString(cell.getValue());

    // Sometimes, you won't know the row you're looking for. In this case, you
    // use a Scanner. This will give you cursor-like interface to the contents
    // of the table. We want to get back only "myColumnFamily:columnQualifier1"
    // when we iterate.
    Scanner scanner =
      table.getScanner(new String[]{"myColumnFamily:columnQualifier1"});

    // Scanners return RowResult instances. A RowResult is like the
    // row key and the columns all wrapped up in a single Object.
    // RowResult#getRow gives you the row key. RowResult also implements
    // Map, so you can get to your column results easily.

    // Now, for the actual iteration. One way is to use a while loop like so:
    RowResult rowResult = scanner.next();
    while (rowResult != null) {
      // print out the row we found and the columns we were looking for
      System.out.println("Found row: " + Bytes.toString(rowResult.getRow()) +
        " with value: " + rowResult.get(Bytes.toBytes("myColumnFamily:columnQualifier1")));
      rowResult = scanner.next();
    }

    // The other approach is to use a foreach loop. Scanners are iterable!
    // The while loop above has already consumed the first scanner, so close
    // it and open a fresh one before iterating again.
    scanner.close();
    scanner = table.getScanner(new String[]{"myColumnFamily:columnQualifier1"});
    for (RowResult result : scanner) {
      // print out the row we found and the columns we were looking for
      System.out.println("Found row: " + Bytes.toString(result.getRow()) +
        " with value: " + result.get(Bytes.toBytes("myColumnFamily:columnQualifier1")));
    }

    // Make sure you close your scanners when you are done!
    // It's probably best to put the iteration into a try/finally with the below
    // inside the finally clause.
    scanner.close();
  }
}
There are many other methods for putting data into and getting data out of HBase, but these examples should get you started. See the HTable javadoc for more methods. Additionally, there are methods for managing tables in the HBaseAdmin class.
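Since HBase stores only byte arrays, it can help to see what "reducing a value to bytes" looks like without any HBase classes involved. The sketch below uses only the standard Java library; it is not the implementation of the Bytes utility class, and the class ByteConversionSketch and its method names are hypothetical. It shows a UTF-8 round trip for a String and a fixed-width, big-endian round trip for a long.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Illustrative only: plain-Java conversions between values and byte arrays.
class ByteConversionSketch {

  /** Encodes a String as UTF-8 bytes. */
  static byte[] fromString(String s) {
    return s.getBytes(StandardCharsets.UTF_8);
  }

  /** Decodes UTF-8 bytes back into a String. */
  static String stringFrom(byte[] bytes) {
    return new String(bytes, StandardCharsets.UTF_8);
  }

  /** Encodes a long as 8 bytes (ByteBuffer's default order is big-endian). */
  static byte[] fromLong(long value) {
    return ByteBuffer.allocate(Long.BYTES).putLong(value).array();
  }

  /** Decodes 8 big-endian bytes back into a long. */
  static long longFrom(byte[] bytes) {
    return ByteBuffer.wrap(bytes).getLong();
  }

  public static void main(String[] args) {
    byte[] text = fromString("columnQualifier1 value!");
    System.out.println(stringFrom(text));

    byte[] number = fromLong(1234567890123L);
    System.out.println(number.length + " bytes -> " + longFrom(number));
  }
}
```

Whatever encoding you choose for a value, the reader must apply the matching decoding, because the stored cell carries no type information.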
If your client is NOT Java, then you should consider the Thrift or REST libraries.