HRegionServer is the RegionServer implementation. It is responsible for serving and managing regions.
In a distributed cluster, a RegionServer runs on a Section 9.9.2, “DataNode”.
The methods exposed by HRegionInterface include both data-oriented and region-maintenance operations:
For example, when the HBaseAdmin method majorCompact is invoked on a table, the client actually iterates through all regions of the specified table and requests a major compaction of each region directly.
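As a rough illustration of what that client-side call looks like (the table name and configuration below are placeholders, and exact signatures vary slightly between HBase versions):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HBaseAdmin;

Configuration conf = HBaseConfiguration.create();
HBaseAdmin admin = new HBaseAdmin(conf);
// Requests a major compaction of every region of 'myTable'. The call only
// issues the requests; the compactions run asynchronously on the RegionServers.
admin.majorCompact("myTable");
admin.close();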
The RegionServer runs a variety of background threads:
Coprocessors were added in 0.92. A thorough Blog Overview of CoProcessors has been posted. Documentation will eventually move into this reference guide, but the blog is the most current information available at this time.
The Block Cache is an LRU cache that contains three levels of block priority to allow for scan-resistance and in-memory ColumnFamilies:
For more information, see the LruBlockCache source.
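As a hedged sketch of how the in-memory priority is used from the client side (table and family names are made up; the calls shown are the standard HColumnDescriptor schema setters):

import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;

HTableDescriptor table = new HTableDescriptor("myTable");

HColumnDescriptor hot = new HColumnDescriptor("meta");
hot.setInMemory(true);             // blocks go into the in-memory priority of the LRU cache

HColumnDescriptor cold = new HColumnDescriptor("raw");
cold.setBlockCacheEnabled(false);  // reads of this family will not populate the block cache

table.addFamily(hot);
table.addFamily(cold);
// the descriptor would then be passed to HBaseAdmin.createTable(table)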
Block caching is enabled by default for all the user tables, which means that any read operation will load the LRU cache. This might be good for a large number of use cases, but further tuning is usually required in order to achieve better performance. An important concept is the working set size, or WSS, which is: "the amount of memory needed to compute the answer to a problem". For a website, this would be the data that's needed to answer the queries over a short amount of time.
The way to calculate how much memory is available in HBase for caching is:
number of region servers * heap size * hfile.block.cache.size * 0.85
The default value for the block cache is 0.25, which represents 25% of the available heap. The last value (85%) is the default acceptable loading factor in the LRU cache after which eviction is started. The reason it is included in this equation is that it would be unrealistic to expect to use 100% of the available memory, since doing so would make the process block from the point where it loads new blocks. Here are some examples:
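One such example (the cluster size and heap below are made-up numbers, used only to illustrate the arithmetic): 20 region servers with an 8 GB heap each and the default hfile.block.cache.size of 0.25 give roughly

20 * 8 GB * 0.25 * 0.85 = 34 GB

of block cache available across the cluster.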
Your data isn't the only resident of the block cache; here are other occupants that you may have to take into account:
Currently the recommended way to measure HFile index and bloom filter sizes is to look at the RegionServer web UI and check out the relevant metrics. For keys, sampling can be done by using the HFile command-line tool and looking at the average key size metric.
It's generally bad to use block caching when the WSS doesn't fit in memory. This is the case when you have, for example, 40GB available across all your region servers' block caches but you need to process 1TB of data. One of the reasons is that the churn generated by the evictions will trigger unnecessary extra garbage collections. Here are two use cases:
Each RegionServer adds updates (Puts, Deletes) to its write-ahead log (WAL) first, and then to the Section 9.7.5.1, “MemStore” for the affected Section 9.7.5, “Store”. This ensures that HBase has durable writes. Without WAL, there is the possibility of data loss in the case of a RegionServer failure before each MemStore is flushed and new StoreFiles are written. HLog is the HBase WAL implementation, and there is one HLog instance per RegionServer.
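The sketch below shows this trade-off from the client side (table, family and row names are placeholders; setWriteToWAL is the pre-0.96 client API for opting a write out of the WAL):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

Configuration conf = HBaseConfiguration.create();
HTable table = new HTable(conf, "myTable");

// Default path: the edit is appended to the RegionServer's HLog (the WAL)
// before being applied to the MemStore, so it survives a RegionServer crash.
Put durable = new Put(Bytes.toBytes("row1"));
durable.add(Bytes.toBytes("cf"), Bytes.toBytes("q"), Bytes.toBytes("value"));
table.put(durable);

// Opting out of the WAL trades durability for speed: if the RegionServer
// dies before the MemStore is flushed, this edit is lost.
Put risky = new Put(Bytes.toBytes("row2"));
risky.setWriteToWAL(false);
risky.add(Bytes.toBytes("cf"), Bytes.toBytes("q"), Bytes.toBytes("value"));
table.put(risky);

table.close();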
The WAL is in HDFS in /hbase/.logs/ with subdirectories per RegionServer.
For more general information about the concept of write ahead logs, see the Wikipedia Write-Ahead Log article.
When a RegionServer crashes, it will lose its ephemeral lease in ZooKeeper...TODO
When hbase.hlog.split.skip.errors is set to true, any error encountered while splitting will be logged, the problematic WAL will be moved into the .corrupt directory under the hbase rootdir, and processing will continue. If set to false, the default, the exception will be propagated and the split logged as failed.[22]
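A minimal sketch of the property involved (illustration only; in practice the flag belongs in hbase-site.xml on the servers that perform the splitting):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

Configuration conf = HBaseConfiguration.create();
// Defaults to false: an error splitting a WAL fails the whole split.
boolean skipErrors = conf.getBoolean("hbase.hlog.split.skip.errors", false);
// Setting it to true makes splitting log the error, archive the bad WAL
// under .corrupt, and keep going.
conf.setBoolean("hbase.hlog.split.skip.errors", true);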
If we get an EOF while splitting logs, we proceed with the split even when hbase.hlog.split.skip.errors == false. An EOF while reading the last log in the set of files to split is near-guaranteed, since the RegionServer likely crashed mid-write of a record. But we will continue even if we get an EOF reading a file other than the last one in the set.[23]
[22] See HBASE-2958 When hbase.hlog.split.skip.errors is set to false, we fail the split but thats it. We need to do more than just fail split if this flag is set.
[23] For background, see HBASE-2643 Figure how to deal with eof splitting logs