dfs.datanode.max.xcievers
).
The default is 256. Raise this limit on your Hadoop cluster.
C:\cygwin
you should modify the following appropriately:
HOME=c:\cygwin\home\jim
ANT_HOME=(wherever you installed ant)
JAVA_HOME=(wherever you installed java)
PATH=C:\cygwin\bin;%JAVA_HOME%\bin;%ANT_HOME%\bin; other windows stuff
SHELL=/bin/bash
For additional information, see the Hadoop Quick Start Guide.
What follows presumes you have obtained a copy of HBase, see Releases, and are installing for the first time. If upgrading your HBase instance, see Upgrading.
Three modes are described: standalone, pseudo-distributed (where all servers run on a single host), and distributed. If you are new to HBase, start by following the standalone instructions.
Whatever your mode, define ${HBASE_HOME} to be the location of the root of your HBase installation, e.g. /usr/local/hbase. Edit ${HBASE_HOME}/conf/hbase-env.sh. In this file you can set the heapsize for HBase, etc. At a minimum, set JAVA_HOME to point at the root of your Java installation.
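As a minimal sketch of the hbase-env.sh edits described above, the lines below set the two values mentioned. The JAVA_HOME path and the heap value are placeholders chosen for illustration, not recommendations; HBASE_HEAPSIZE is taken to be the heap setting (in MB) that hbase-env.sh exposes, so check the comments in your own copy of the file.

```shell
# Sketch of additions to ${HBASE_HOME}/conf/hbase-env.sh.
# The path below is a placeholder; point it at the root of your own Java install.
export JAVA_HOME=/usr/lib/jvm/default-java

# Maximum heap for the HBase daemons, in MB (example value only).
export HBASE_HEAPSIZE=1000
```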
If you are running a standalone operation, there should be nothing further to configure; proceed to Running and Confirming Your Installation. If you are running a distributed operation, continue reading.
Distributed mode requires an instance of the Hadoop Distributed File System (DFS). See the Hadoop requirements and instructions for how to set up a DFS.
A pseudo-distributed operation is simply a distributed operation run on a single host.
Once you have confirmed your DFS setup, configuring HBase for use on one host requires modification of ${HBASE_HOME}/conf/hbase-site.xml, which needs to be pointed at the running Hadoop DFS instance. Use hbase-site.xml to override the properties defined in ${HBASE_HOME}/conf/hbase-default.xml (hbase-default.xml itself should never be modified). At a minimum the hbase.rootdir property should be redefined in hbase-site.xml to point HBase at the Hadoop filesystem to use. For example, adding the property below to your hbase-site.xml says that HBase should use the /hbase directory in the HDFS whose namenode is at port 9000 on your local machine:
<configuration>
  ...
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://localhost:9000/hbase</value>
    <description>The directory shared by region servers.
    </description>
  </property>
  ...
</configuration>
Note: Let HBase create the directory. If you don't, you'll get a warning saying HBase needs a migration run because the directory is missing files expected by HBase (it'll create them if you let it).
For running a fully-distributed operation on more than one host, the following configurations must be made in addition to those described in the pseudo-distributed operation section above. In this mode, a ZooKeeper cluster is required.
In hbase-site.xml, set hbase.cluster.distributed to 'true'.
<configuration>
  ...
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
    <description>The mode the cluster will be in. Possible values are
      false: standalone and pseudo-distributed setups with managed Zookeeper
      true: fully-distributed with unmanaged Zookeeper Quorum (see hbase-env.sh)
    </description>
  </property>
  ...
</configuration>
In fully-distributed operation, you probably want to change your hbase.rootdir from localhost to the name of the node running the HDFS namenode. In addition to hbase-site.xml changes, a fully-distributed operation requires that you modify ${HBASE_HOME}/conf/regionservers. The regionservers file lists all hosts running HRegionServers, one host per line. (This file in HBase is like the Hadoop slaves file at ${HADOOP_HOME}/conf/slaves.)
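The one-host-per-line layout of the regionservers file can be sketched as follows. The hostnames are illustrative placeholders, and the fallback directory /tmp/hbase-sketch is an assumption added only so the snippet is safe to run when HBASE_HOME is unset; on a real install the file lives under ${HBASE_HOME}/conf.

```shell
# Write a regionservers file with one host per line.
# /tmp/hbase-sketch is a hypothetical scratch location used when HBASE_HOME is unset.
CONF_DIR="${HBASE_HOME:-/tmp/hbase-sketch}/conf"
mkdir -p "$CONF_DIR"

# Placeholder hostnames; replace with the hosts that run HRegionServers.
printf '%s\n' \
  rs1.example.com \
  rs2.example.com \
  rs3.example.com > "$CONF_DIR/regionservers"

cat "$CONF_DIR/regionservers"
```

Note that running this with HBASE_HOME set overwrites any existing regionservers file.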
A distributed HBase depends on a running ZooKeeper cluster. HBase can manage a ZooKeeper cluster for you, or you can manage it on your own and point HBase to it. To toggle this option, use the HBASE_MANAGES_ZK variable in ${HBASE_HOME}/conf/hbase-env.sh. This variable, which defaults to true, tells HBase whether to start/stop the ZooKeeper quorum servers alongside the rest of the servers.
To point HBase at an existing ZooKeeper cluster, add your zoo.cfg to the CLASSPATH. HBase will see this file and use it to figure out where ZooKeeper is. Additionally set HBASE_MANAGES_ZK in ${HBASE_HOME}/conf/hbase-env.sh to false so that HBase doesn't mess with your ZooKeeper setup:
  ...
  # Tell HBase whether it should manage its own instance of Zookeeper or not.
  export HBASE_MANAGES_ZK=false
For more information about setting up a ZooKeeper cluster on your own, see the ZooKeeper Getting Started Guide. HBase currently uses ZooKeeper version 3.2.0, so any cluster set up with a 3.x.x version of ZooKeeper should work.
To have HBase manage the ZooKeeper cluster, you can use a zoo.cfg file as above, or edit the options directly in ${HBASE_HOME}/conf/hbase-site.xml. Every option from the zoo.cfg has a corresponding property in the XML configuration file named hbase.zookeeper.property.OPTION. For example, the clientPort setting in ZooKeeper can be changed by setting the hbase.zookeeper.property.clientPort property. For the full list of available properties, see ZooKeeper's zoo.cfg. For the default values used by HBase, see ${HBASE_HOME}/conf/hbase-default.xml.
At minimum, you should set the list of servers that you want ZooKeeper to run on using the hbase.zookeeper.quorum property. This property defaults to localhost, which is not suitable for a fully distributed HBase.
It is recommended to run a ZooKeeper quorum of 5 or 7 machines, and give each server around 1GB to ensure that they don't swap. It is also recommended to run the ZooKeeper servers on separate machines from the Region Servers with their own disks. If this is not easily doable for you, choose 5 of your region servers to run the ZooKeeper servers on.
As an example, to have HBase manage a ZooKeeper quorum on nodes rs{1,2,3,4,5}.example.com, bound to port 2222 (the default is 2181), use:
${HBASE_HOME}/conf/hbase-env.sh:
  ...
  # Tell HBase whether it should manage its own instance of Zookeeper or not.
  export HBASE_MANAGES_ZK=true

${HBASE_HOME}/conf/hbase-site.xml:
  <configuration>
    ...
    <property>
      <name>hbase.zookeeper.property.clientPort</name>
      <value>2222</value>
      <description>Property from ZooKeeper's config zoo.cfg.
      The port at which the clients will connect.
      </description>
    </property>
    ...
    <property>
      <name>hbase.zookeeper.quorum</name>
      <value>rs1.example.com,rs2.example.com,rs3.example.com,rs4.example.com,rs5.example.com</value>
      <description>Comma separated list of servers in the ZooKeeper Quorum.
      For example, "host1.mydomain.com,host2.mydomain.com,host3.mydomain.com".
      By default this is set to localhost for local and pseudo-distributed modes
      of operation. For a fully-distributed setup, this should be set to a full
      list of ZooKeeper quorum servers. If HBASE_MANAGES_ZK is set in hbase-env.sh
      this is the list of servers which we will start/stop ZooKeeper on.
      </description>
    </property>
    ...
  </configuration>
When HBase manages ZooKeeper, it will start/stop the ZooKeeper servers as a part of the regular start/stop scripts. If you would like to run it yourself, you can do:
${HBASE_HOME}/bin/hbase-daemons.sh {start,stop} zookeeper
Note that you can use HBase in this manner to spin up a ZooKeeper cluster, unrelated to HBase. Just make sure to set HBASE_MANAGES_ZK to false if you want it to stay up so that when HBase shuts down it doesn't take ZooKeeper with it.
Of note, if you have made HDFS client configuration changes on your Hadoop cluster, HBase will not see this configuration unless you do one of the following:
add HADOOP_CONF_DIR to CLASSPATH in hbase-env.sh,
copy hdfs-site.xml (or hadoop-site.xml) to ${HBASE_HOME}/conf, or
add the relevant settings to hbase-site.xml.
An example of such a setting is dfs.replication. If, for example, you want to run with a replication factor of 5, HBase will create files with the default of 3 unless you do the above to make the configuration available to HBase.
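The first option could look like the sketch below, placed in ${HBASE_HOME}/conf/hbase-env.sh. It assumes that the HBASE_CLASSPATH variable is how your hbase-env.sh accepts extra classpath entries, and the fallback directory for HADOOP_CONF_DIR is a hypothetical location; verify both against your own installation.

```shell
# Sketch for ${HBASE_HOME}/conf/hbase-env.sh.
# The fallback path is hypothetical; use your cluster's real Hadoop conf directory.
HADOOP_CONF_DIR="${HADOOP_CONF_DIR:-/usr/local/hadoop/conf}"

# Append the Hadoop conf directory to HBase's extra classpath entries so that
# HDFS client settings such as dfs.replication become visible to HBase.
export HBASE_CLASSPATH="${HBASE_CLASSPATH:+${HBASE_CLASSPATH}:}${HADOOP_CONF_DIR}"
```

The `${VAR:+...}` expansion keeps any entries already in HBASE_CLASSPATH and avoids a leading colon when the variable is empty.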
If you are running in standalone, non-distributed mode, HBase by default uses the local filesystem.
If you are running a distributed cluster you will need to start the Hadoop DFS daemons and ZooKeeper Quorum before starting HBase and stop the daemons after HBase has shut down.
Start the Hadoop DFS daemons by running ${HADOOP_HOME}/bin/start-dfs.sh and stop them by running ${HADOOP_HOME}/bin/stop-dfs.sh.
You can ensure it started properly by testing the put and get of files into the Hadoop filesystem.
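One way to run such a put/get check is sketched below. The function name dfs_roundtrip and the DFS path /tmp/dfs-check.txt are hypothetical choices made for this example; the snippet skips the check when no hadoop command is on the PATH.

```shell
# Round-trip a small file through the DFS and compare the copies.
# dfs_roundtrip and /tmp/dfs-check.txt are illustrative names, not part of Hadoop.
dfs_roundtrip() {
  src="$(mktemp)"
  dst="${src}.copy"
  echo "hbase dfs check" > "$src"

  hadoop fs -put "$src" /tmp/dfs-check.txt &&
    hadoop fs -get /tmp/dfs-check.txt "$dst" &&
    cmp -s "$src" "$dst"
  status=$?

  # Clean up the DFS copy and the local temporary files.
  hadoop fs -rm /tmp/dfs-check.txt >/dev/null 2>&1
  rm -f "$src" "$dst"
  return $status
}

if command -v hadoop >/dev/null 2>&1; then
  if dfs_roundtrip; then echo "DFS put/get OK"; else echo "DFS put/get FAILED"; fi
else
  echo "hadoop not on PATH; skipping DFS check"
fi
```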
HBase does not normally use the mapreduce daemons. These do not need to be started.
Start up your ZooKeeper cluster.
Start HBase with the following command:
${HBASE_HOME}/bin/start-hbase.sh
Once HBase has started, enter ${HBASE_HOME}/bin/hbase shell to obtain a shell against HBase from which you can execute commands.
Test your installation by creating, viewing, and dropping
To stop HBase, exit the HBase shell and enter:
${HBASE_HOME}/bin/stop-hbase.sh
If you are running a distributed operation, be sure to wait until HBase has shut down completely before stopping the Hadoop daemons.
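The start and stop ordering for a distributed cluster can be sketched as a pair of shell functions. The run, start_cluster and stop_cluster helpers, the DRY_RUN switch and the fallback install paths are all assumptions made for this example. By default the sketch only prints the commands; it does not itself verify that HBase has finished shutting down before DFS is stopped, so check that manually on a real cluster.

```shell
# Dry-run by default: print each command instead of executing it.
# Set DRY_RUN=0 on a real cluster.
DRY_RUN="${DRY_RUN:-1}"
HADOOP_HOME="${HADOOP_HOME:-/usr/local/hadoop}"   # hypothetical default
HBASE_HOME="${HBASE_HOME:-/usr/local/hbase}"      # hypothetical default

run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$@"
  else
    "$@"
  fi
}

start_cluster() {
  run "${HADOOP_HOME}/bin/start-dfs.sh"    # 1. Hadoop DFS first
  # 2. Start an externally managed ZooKeeper quorum here, if you run one.
  run "${HBASE_HOME}/bin/start-hbase.sh"   # 3. HBase last
}

stop_cluster() {
  run "${HBASE_HOME}/bin/stop-hbase.sh"    # 1. HBase first
  run "${HADOOP_HOME}/bin/stop-dfs.sh"     # 2. DFS only after HBase is down
}

start_cluster
stop_cluster
```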
The default location for logs is ${HBASE_HOME}/logs.
HBase also puts up a UI listing vital attributes. By default it is deployed on the master host at port 60010 (HBase regionservers listen on port 60020 by default and put up an informational HTTP server at 60030).
After installing a new HBase on top of data written by a previous HBase version, before starting your cluster, run the ${HBASE_HOME}/bin/hbase migrate migration script. It will make any adjustments to the filesystem data under hbase.rootdir necessary to run the new HBase version. It does not change your install unless you explicitly ask it to.
If your client is NOT Java, consider the Thrift or REST libraries.