In his presentation, Avoiding
Full GCs with MemStore-Local Allocation Buffers, Todd Lipcon
describes two cases of stop-the-world garbage collections common in
HBase, especially during loading: CMS failure modes and old generation
heap fragmentation. To address the first, start the CMS
collector earlier than the default by adding
-XX:CMSInitiatingOccupancyFraction to the JVM options and setting it
below the default. Start at 60 or 70 percent (the lower you set the
threshold, the more GC is done and the more CPU is used). To address the
second issue, fragmentation, Todd added an experimental facility,
MemStore-Local Allocation Buffers (MSLAB), that must be explicitly
enabled in Apache HBase 0.90.x (it is on by default in Apache HBase
0.92.x). To enable it, set hbase.hregion.memstore.mslab.enabled to true
in your Configuration. See the cited slides for background and
detail[24].
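As a minimal sketch of the CMS setting described above, the fragment below appends the flag to the JVM options in hbase-env.sh. It assumes HBASE_OPTS is the variable your installation uses for JVM options and that the CMS collector is selected with -XX:+UseConcMarkSweepGC; the value 70 is only the suggested starting point, not a tuned recommendation.

```shell
# hbase-env.sh (sketch): start CMS once the old generation is 70% full.
# Assumption: HBASE_OPTS carries the JVM options for your HBase daemons.
# -XX:+UseConcMarkSweepGC selects the CMS collector; 70 is a starting value to tune.
export HBASE_OPTS="${HBASE_OPTS:-} -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70"
```

Lowering the value further makes CMS start earlier, at the cost of more frequent collections and more CPU.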
Be aware that when enabled, each MemStore instance will occupy at least
one MSLAB instance of memory. If you have thousands of regions, or many
regions each with many column families, this MSLAB allocation
may account for a good portion of your heap and, in
an extreme case, cause an OutOfMemoryError (OOME). In this case, disable
MSLAB, lower the amount of memory it uses, or host fewer regions per server.
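To get a feel for when this becomes a problem, the following rough sketch estimates the minimum heap held by MSLAB on one region server. The helper name mslab_floor_mb is hypothetical, and the calculation assumes one MemStore per column family per region, one allocation buffer per MemStore, and a buffer size of 2 MB; verify the actual size for your release before relying on the number.

```shell
# Rough lower bound, in MB, on heap held by MSLAB on one region server.
# Assumptions: one MemStore per column family per region, one buffer per
# MemStore, and a 2 MB buffer unless a third argument says otherwise.
mslab_floor_mb() {
  regions=$1
  families=$2
  chunk_mb=${3:-2}
  echo $(( regions * families * chunk_mb ))
}

mslab_floor_mb 2000 3   # prints 12000
```

Under these assumptions, 2000 regions with 3 column families each would hold about 12 GB in allocation buffers before any data is counted, which is the kind of case where disabling MSLAB or hosting fewer regions per server makes sense.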
For more information about GC logs, see Section 12.2.3, “JVM Garbage Collection Logs”.
[24] The latest JVMs handle fragmentation better, so make sure you are running a recent release. Read down in the message, Identifying concurrent mode failures caused by fragmentation.