HBase Operational Management
This chapter will cover operational tools and practices required of a running HBase cluster.
The subject of operations is related to several topics covered elsewhere,
but is a distinct topic in itself.
HBase Tools and Utilities
Here we list HBase tools for administration, analysis, fixup, and debugging.
HBase hbck
An fsck for your HBase install
To run hbck against your HBase cluster, run:
$ ./bin/hbase hbck
At the end of the command's output it prints OK
or INCONSISTENCY. If your cluster reports
inconsistencies, pass -details to see more detail emitted.
If inconsistencies are reported, run hbck a few times because the
inconsistency may be transient (e.g. the cluster is starting up or a region is
splitting).
Passing -fix may correct the inconsistency (this option
is an experimental feature).
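Because an inconsistency may be transient, the advice to rerun hbck a few times lends itself to a small wrapper. The following is a minimal sketch and not part of HBase: the function names hbck_verdict and check_cluster, the HBCK_CMD override, and the default of 3 attempts with a 30 second pause are all assumptions made for illustration. It classifies a run by looking only at the last non-empty line of output for the OK or INCONSISTENCY verdict.

```shell
# Classify hbck output (read from stdin) by its last non-empty line.
# Prints "ok", "inconsistent" or "unknown".
hbck_verdict() {
  awk 'NF { last = $0 }
       END {
         if (last ~ /INCONSISTEN/) print "inconsistent"
         else if (last ~ /(^|[^A-Za-z])OK([^A-Za-z]|$)/) print "ok"
         else print "unknown"
       }'
}

# Run hbck up to ATTEMPTS times, pausing PAUSE seconds between runs.
# Returns 0 as soon as one run reports ok, 1 otherwise.
# HBCK_CMD is a hypothetical override; it defaults to the command shown above.
check_cluster() {
  local attempts=${1:-3} pause=${2:-30} i=1 verdict
  while [ "$i" -le "$attempts" ]; do
    verdict=$(${HBCK_CMD:-./bin/hbase hbck} | hbck_verdict)
    echo "hbck attempt $i: $verdict"
    [ "$verdict" = ok ] && return 0
    [ "$i" -lt "$attempts" ] && sleep "$pause"
    i=$((i + 1))
  done
  return 1
}
```

Calling check_cluster 3 60 from the HBase directory would rerun hbck up to three times a minute apart; if it still fails, rerun hbck with -details by hand.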
HFile Tool
See .
WAL Tools
HLog tool
The main method on HLog offers manual split and dump facilities. Pass it WALs or the product of a split, the
content of the recovered.edits directory.
You can get a textual dump of a WAL file's content by doing the following:
$ ./bin/hbase org.apache.hadoop.hbase.regionserver.wal.HLog --dump hdfs://example.org:8020/hbase/.logs/example.org,60020,1283516293161/10.10.21.10%3A60020.1283973724012
The return code will be non-zero if there are issues with the file, so you can test the
wholesomeness of a file by redirecting STDOUT to /dev/null and testing the program's return code.
Similarly, you can force a split of a log file directory by doing:
$ ./bin/hbase org.apache.hadoop.hbase.regionserver.wal.HLog --split hdfs://example.org:8020/hbase/.logs/example.org,60020,1283516293161/
Compression Tool
See .
CopyTable
CopyTable is a utility that can copy part or all of a table, either to the same cluster or another cluster. The usage is as follows:
$ bin/hbase org.apache.hadoop.hbase.mapreduce.CopyTable [--rs.class=CLASS] [--rs.impl=IMPL] [--starttime=X] [--endtime=Y] [--new.name=NEW] [--peer.adr=ADR] tablename
Options:
 rs.class     hbase.regionserver.class of the peer cluster. Specify if different from current cluster.
 rs.impl      hbase.regionserver.impl of the peer cluster.
 starttime    Beginning of the time range. Without endtime, the range runs from starttime to forever.
 endtime      End of the time range. If omitted, the range runs from starttime to forever.
 new.name     New table's name.
 peer.adr     Address of the peer cluster given in the format hbase.zookeeper.quorum:hbase.zookeeper.client.port:zookeeper.znode.parent
 families     Comma-separated list of ColumnFamilies to copy.
Args:
 tablename    Name of table to copy.
Example of copying 'TestTable' to a cluster that uses replication for a 1 hour window:
$ bin/hbase org.apache.hadoop.hbase.mapreduce.CopyTable \
  --rs.class=org.apache.hadoop.hbase.ipc.ReplicationRegionInterface \
  --rs.impl=org.apache.hadoop.hbase.regionserver.replication.ReplicationRegionServer \
  --starttime=1265875194289 --endtime=1265878794289 \
  --peer.adr=server1,server2,server3:2181:/hbase TestTable
Export
Export is a utility that will dump the contents of a table to HDFS in a sequence file. Invoke via:
$ bin/hbase org.apache.hadoop.hbase.mapreduce.Export <tablename> <outputdir> [<versions> [<starttime> [<endtime>]]]
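The Export arguments are positional, so a time range can only be given after a versions value. The sketch below builds an Export command line for the N hours ending at a given time. It assumes timestamps are in milliseconds, as in the CopyTable example above; the function name export_window_cmd and its argument order are hypothetical helpers, not part of HBase.

```shell
# Print an Export command covering the HOURS hours that end at END_EPOCH_SECONDS.
# Usage: export_window_cmd TABLE OUTPUTDIR VERSIONS END_EPOCH_SECONDS HOURS
export_window_cmd() {
  local table=$1 outdir=$2 versions=$3 end_s=$4 hours=$5
  # Convert seconds to milliseconds, then subtract the window length.
  local end_ms=$((end_s * 1000))
  local start_ms=$((end_ms - hours * 3600 * 1000))
  echo "bin/hbase org.apache.hadoop.hbase.mapreduce.Export $table $outdir $versions $start_ms $end_ms"
}
```

For example, export_window_cmd TestTable /backup/TestTable 1 "$(date +%s)" 1 prints a command that exports one version of each cell written in the last hour; review the printed line before running it.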
Import
Import is a utility that will load data that has been exported back into HBase. Invoke via:
$ bin/hbase org.apache.hadoop.hbase.mapreduce.Import <tablename> <inputdir>
RowCounter
RowCounter is a utility that will count all the rows of a table. This is a good utility to use
as a sanity check to ensure that HBase can read all the blocks of a table if there are any concerns about metadata inconsistency.
$ bin/hbase org.apache.hadoop.hbase.mapreduce.RowCounter <tablename> [<column1> <column2>...]
Node Management
Node Decommission
You can stop an individual RegionServer by running the following
script in the HBase directory on the particular node:
$ ./bin/hbase-daemon.sh stop regionserver
The RegionServer will first close all regions and then shut itself down.
On shutdown, the RegionServer's ephemeral node in ZooKeeper will expire.
The master will notice the RegionServer gone and will treat it as
a 'crashed' server; it will reassign the regions the RegionServer was carrying.
Disable the Load Balancer before Decommissioning a node
If the load balancer runs while a node is shutting down, then
there could be contention between the Load Balancer and the
Master's recovery of the just decommissioned RegionServer.
Avoid any problems by disabling the balancer first.
See below.
A downside to the above stop of a RegionServer is that regions could be offline for
a good period of time. Regions are closed in order. If there are many regions on the server, the
first region to close may not be back online until all regions have closed and the master
has noticed that the RegionServer's znode is gone. HBase 0.90.2 added a facility for having
a node gradually shed its load and then shut itself down: the
graceful_stop.sh script. Here is its usage:
$ ./bin/graceful_stop.sh
Usage: graceful_stop.sh [--config <conf-dir>] [--restart] [--reload] [--thrift] [--rest] <hostname>
thrift If we should stop/start thrift before/after the hbase stop/start
rest If we should stop/start rest before/after the hbase stop/start
restart If we should restart after graceful stop
reload Move offloaded regions back on to the stopped server
debug Move offloaded regions back on to the stopped server
hostname Hostname of server we are to stop
To decommission a loaded RegionServer, run the following:
$ ./bin/graceful_stop.sh HOSTNAME
where HOSTNAME is the host carrying the RegionServer
you would decommission.
On HOSTNAME
The HOSTNAME passed to graceful_stop.sh
must match the hostname that HBase is using to identify RegionServers.
Check the list of RegionServers in the master UI for how HBase is
referring to servers. It's usually the hostname but can also be the FQDN.
Whatever HBase is using, this is what you should pass to the
graceful_stop.sh decommission
script. If you pass IPs, the script is not yet smart enough to make
a hostname (or FQDN) of them, so it will fail when it checks whether the server is
currently running; the graceful unloading of regions will not run.
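Since an IP address makes the script fail its running check, a thin wrapper can reject IP-looking arguments before anything is attempted. This is only a sketch: looks_like_ipv4, safe_graceful_stop and the GRACEFUL_STOP override are hypothetical names introduced here, and the check covers dotted IPv4 addresses only.

```shell
# Succeeds (returns 0) if the argument looks like a dotted IPv4 address.
looks_like_ipv4() {
  printf '%s\n' "$1" | grep -Eq '^[0-9]{1,3}(\.[0-9]{1,3}){3}$'
}

# Refuse IP addresses, otherwise hand the name to graceful_stop.sh unchanged.
# GRACEFUL_STOP is a hypothetical override for the script location.
safe_graceful_stop() {
  local host=$1
  if looks_like_ipv4 "$host"; then
    echo "refusing IP address '$host'; pass the name shown in the master UI" >&2
    return 2
  fi
  "${GRACEFUL_STOP:-./bin/graceful_stop.sh}" "$host"
}
```

The wrapper does not verify that the name matches what the master UI shows; that comparison still has to be done by the operator.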
The graceful_stop.sh script will move the regions off the
decommissioned RegionServer one at a time to minimize region churn.
It will verify that the region is deployed in the new location before it
moves the next region, and so on until the decommissioned server
is carrying zero regions. At this point, graceful_stop.sh
tells the RegionServer to stop. The master will then notice that the
RegionServer is gone, but all regions will have already been redeployed,
and because the RegionServer went down cleanly, there will be no
WALs to split.
Load Balancer
It is assumed that the Region Load Balancer is disabled while the
graceful_stop script runs (otherwise the balancer
and the decommission script will end up fighting over region deployments).
Use the shell to disable the balancer:
hbase(main):001:0> balance_switch false
true
0 row(s) in 0.3590 seconds
This turns the balancer OFF. To reenable, do:
hbase(main):001:0> balance_switch true
false
0 row(s) in 0.3590 seconds
Rolling Restart
You can also ask this script to restart a RegionServer after the shutdown
AND move its old regions back into place. The latter you might do to
retain data locality. A primitive rolling restart might be effected by
running something like the following:
$ for i in `cat conf/regionservers|sort`; do ./bin/graceful_stop.sh --restart --reload --debug $i; done &> /tmp/log.txt &
Tail the output of /tmp/log.txt to follow the script's
progress. The above restarts RegionServers only. Be sure to disable the
load balancer before doing the above. You need to update the master
separately; do it before you run the above script.
Here is a pseudo-script for how you might craft a rolling restart script:
1. Untar your release, make sure of its configuration and
then rsync it across the cluster. If this is 0.90.2, patch it
with HBASE-3744 and HBASE-3756.
2. Run hbck to ensure the cluster is consistent:
$ ./bin/hbase hbck
Effect repairs if inconsistent.
3. Restart the Master:
$ ./bin/hbase-daemon.sh stop master; ./bin/hbase-daemon.sh start master
4. Disable the region balancer:
$ echo "balance_switch false" | ./bin/hbase shell
5. Run the graceful_stop.sh script per RegionServer. For example:
$ for i in `cat conf/regionservers|sort`; do ./bin/graceful_stop.sh --restart --reload --debug $i; done &> /tmp/log.txt &
If you are running thrift or rest servers on the RegionServer, pass the --thrift or --rest options (see the usage
for the graceful_stop.sh script).
6. Restart the Master again. This will clear out the dead servers list and reenable the balancer.
7. Run hbck to ensure the cluster is consistent.
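The pseudo-script above can be turned into a shell function along the following lines. This is a sketch under assumptions rather than a supported tool: the run helper, the DRY_RUN switch and the function names are hypothetical, the untar and rsync step is left out, and the sketch does not interpret hbck's output, so the operator still has to read the verdict and repair any inconsistency before continuing.

```shell
# Run a command, or only print it when DRY_RUN=1 (hypothetical switch).
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

restart_master() {
  run ./bin/hbase-daemon.sh stop master
  run ./bin/hbase-daemon.sh start master
}

# Usage: rolling_restart [REGIONSERVERS_FILE]
# Must be called from the HBase directory.
rolling_restart() {
  local servers_file=${1:-conf/regionservers} host
  run ./bin/hbase hbck                      # check consistency first
  restart_master                            # the master is restarted before the RegionServers
  run sh -c 'echo "balance_switch false" | ./bin/hbase shell'
  for host in $(sort "$servers_file"); do
    # Stop at the first RegionServer that fails to restart cleanly.
    run ./bin/graceful_stop.sh --restart --reload --debug "$host" || return 1
  done
  restart_master                            # clears dead servers, reenables the balancer
  run ./bin/hbase hbck                      # check consistency again
}
```

Running it once with DRY_RUN=1 prints the full command sequence without touching the cluster, which is a cheap way to review the plan first.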
Metrics
Metric Setup
See Metrics for an introduction and how to enable Metrics emission.
RegionServer Metrics
hbase.regionserver.blockCacheCount
Block cache item count in memory. This is the number of blocks of storefiles (HFiles) in the cache.
hbase.regionserver.blockCacheFree
Block cache memory available (bytes).
hbase.regionserver.blockCacheHitRatio
Block cache hit ratio (0 to 100). TODO: describe impact to ratio where read requests that have cacheBlocks=false
hbase.regionserver.blockCacheSize
Block cache size in memory (bytes), i.e., memory in use by the BlockCache.
hbase.regionserver.compactionQueueSize
Size of the compaction queue. This is the number of stores in the region that have been targeted for compaction.
hbase.regionserver.fsReadLatency_avg_time
Filesystem read latency (ms). This is the average time to read from HDFS.
hbase.regionserver.fsReadLatency_num_ops
TODO
hbase.regionserver.fsSyncLatency_avg_time
Filesystem sync latency (ms).
hbase.regionserver.fsSyncLatency_num_ops
TODO
hbase.regionserver.fsWriteLatency_avg_time
Filesystem write latency (ms).
hbase.regionserver.fsWriteLatency_num_ops
TODO
hbase.regionserver.memstoreSizeMB
Sum of all the memstore sizes in this RegionServer (MB).
hbase.regionserver.regions
Number of regions served by the RegionServer.
hbase.regionserver.requests
Total number of read and write requests. Requests correspond to RegionServer RPC calls, thus a single Get will result in 1 request, but a Scan with caching set to 1000 will result in 1 request for each 'next' call (i.e., not each row). A bulk-load request will constitute 1 request per HFile.
hbase.regionserver.storeFileIndexSizeMB
Sum of all the storefile index sizes in this RegionServer (MB).
hbase.regionserver.stores
Number of stores open on the RegionServer. A store corresponds to a column family. For example, if a table (which contains the column family) has 3 regions on a RegionServer, there will be 3 stores open for that column family.
hbase.regionserver.storeFiles
Number of store files open on the RegionServer. A store may have more than one storefile (HFile).
HBase Monitoring
TODO
Cluster Replication
See Cluster Replication.
HBase Backup
There are two broad strategies for performing HBase backups: backing up with a full cluster shutdown, and backing up on a live cluster.
Each approach has pros and cons.
For additional information, see HBase Backup Options over on the Sematext Blog.
Full Shutdown Backup
Some environments can tolerate a periodic full shutdown of their HBase cluster, for example if it is being used as a back-end analytic capacity
and not serving front-end web pages. The benefit is that the NameNode/Master and RegionServers are down, so there is no chance of missing
any in-flight changes to either StoreFiles or metadata. The obvious con is that the cluster is down. The steps include:
Stop HBase
Distcp
Distcp could be used to copy the contents of the HBase directory in HDFS either to another directory on the same cluster or
to a different cluster.
Note: Distcp works in this situation because the cluster is down and there are no in-flight edits to files.
Distcp-ing of files in the HBase directory is not generally recommended on a live cluster.
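As an illustration of the stop and copy steps, the sketch below prints the commands for a full shutdown backup so they can be reviewed before being run by hand. The backup_plan function, the hbase-STAMP directory naming and the example URIs in the usage note are assumptions; stop-hbase.sh and start-hbase.sh are assumed to be run from the HBase directory, and hadoop distcp is assumed to be on the PATH.

```shell
# Print the command sequence for a full shutdown backup.
# Usage: backup_plan SRC_HBASE_DIR DEST_ROOT STAMP
backup_plan() {
  local src=$1 dest_root=$2 stamp=$3
  echo "./bin/stop-hbase.sh"                        # no in-flight edits while copying
  echo "hadoop distcp $src $dest_root/hbase-$stamp" # copy the whole HBase directory
  echo "./bin/start-hbase.sh"                       # bring the cluster back up
}
```

For example, backup_plan hdfs://example.org:8020/hbase hdfs://example.org:8020/backups 20110401 prints three lines, the second being the distcp invocation into /backups/hbase-20110401.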
Restore (if needed)
The backup of the hbase directory from HDFS is copied onto the 'real' hbase directory via distcp. The act of copying these files
creates new HDFS metadata, which is why a restore of the NameNode edits from the time of the HBase backup isn't required for this kind of
restore: it's a restore (via distcp) of a specific HDFS directory (i.e., the HBase part), not the entire HDFS file-system.
Live Cluster Backup - Replication
This approach assumes that there is a second cluster.
See the HBase page on replication for more information.
Live Cluster Backup - CopyTable
The CopyTable utility could be used either to copy data from one table to another on the
same cluster, or to copy data to another table on another cluster.
Since the cluster is up, there is a risk that edits could be missed in the copy process.
Live Cluster Backup - Export
The Export approach dumps the content of a table to HDFS on the same cluster. To restore the data, the
Import utility would be used.
Since the cluster is up, there is a risk that edits could be missed in the export process.
Capacity Planning
Storage
A common question for HBase administrators is estimating how much storage will be required for an HBase cluster.
There are several aspects to consider, the most important of which is what data will be loaded into the cluster. Start
with a solid understanding of how HBase handles data internally (KeyValue).
KeyValue
HBase storage will be dominated by KeyValues. See and for
how HBase stores data internally.
It is critical to understand that there is a KeyValue instance for every attribute stored in a row, and that the
rowkey length, ColumnFamily name length and attribute lengths will drive the size of the database more than any other
factor.
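To make the effect of those lengths concrete, the following sketch estimates the serialized size of a single KeyValue. It assumes the classic KeyValue byte layout: a 4-byte key length, a 4-byte value length, a 2-byte row length, the row, a 1-byte family length, the family, the qualifier, an 8-byte timestamp, a 1-byte key type, and the value, which gives 20 bytes of fixed overhead per cell. The function name kv_bytes is hypothetical, and the result ignores compression and block or index overhead.

```shell
# Estimate the serialized size in bytes of one KeyValue.
# Usage: kv_bytes ROW_LEN FAMILY_LEN QUALIFIER_LEN VALUE_LEN
kv_bytes() {
  local row=$1 family=$2 qualifier=$3 value=$4
  # 4 (key length) + 4 (value length) + 2 (row length) + 1 (family length)
  # + 8 (timestamp) + 1 (key type) = 20 bytes of fixed overhead.
  local fixed=20
  echo $((fixed + row + family + qualifier + value))
}
```

With a 10-byte rowkey, a 2-byte family name, an 8-byte qualifier and a 100-byte value, kv_bytes 10 2 8 100 gives 140 bytes, of which 40 are key material and overhead repeated for every attribute in the row.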
StoreFiles and Blocks
KeyValue instances are aggregated into blocks, and the blocksize is configurable on a per-ColumnFamily basis.
Blocks are aggregated into StoreFiles. See .
HDFS Block Replication
Because HBase runs on top of HDFS, factor in HDFS block replication into storage calculations.
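As a rough worked example of factoring in replication, the sketch below multiplies a cell count by an average per-cell size and by the HDFS replication factor. It assumes the HDFS default replication of 3 when none is given; the function name replicated_bytes is hypothetical, and the figure is an uncompressed upper-level estimate that leaves out WALs, indexes and compaction headroom.

```shell
# Estimate raw HDFS bytes consumed by a table.
# Usage: replicated_bytes CELL_COUNT BYTES_PER_CELL [REPLICATION]
replicated_bytes() {
  local cells=$1 per_cell=$2 replication=${3:-3}
  echo $((cells * per_cell * replication))
}
```

For instance, one million cells of 140 bytes each at the default replication come to replicated_bytes 1000000 140, which is 420000000 bytes of raw HDFS capacity.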
Regions
Another common question for HBase administrators is determining the right number of regions per
RegionServer. This affects both storage and hardware planning. See .