14.9. Capacity Planning

14.9.1. Storage

A common question for HBase administrators is estimating how much storage an HBase cluster will require. There are several aspects to consider, the most important of which is what data will be loaded into the cluster. Start with a solid understanding of how HBase handles data internally (KeyValue).

14.9.1.1. KeyValue

HBase storage will be dominated by KeyValues. See Section 9.7.5.4, “KeyValue” and Section 6.3.2, “Try to minimize row and column sizes” for how HBase stores data internally.

It is critical to understand that there is a KeyValue instance for every attribute stored in a row. Because the rowkey and ColumnFamily name are repeated in every one of those KeyValues, the rowkey length, ColumnFamily name length, and attribute lengths drive the size of the database more than any other factor.
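The arithmetic below is a rough sketch of why short rowkeys and ColumnFamily names matter. It assumes the basic KeyValue layout described in Section 9.7.5.4 (key length, value length, row length, row, family length, family, qualifier, timestamp, key type, value) with no tags, and it ignores block, index, and compression effects. The class and method names are hypothetical, chosen for illustration only.

```java
import java.nio.charset.StandardCharsets;

public class KeyValueSizeEstimate {
    // Fixed per-KeyValue overhead, assuming the basic layout without tags:
    // keylength (4) + valuelength (4) + rowlength (2) + familylength (1)
    // + timestamp (8) + key type (1) = 20 bytes.
    static final int FIXED_OVERHEAD = 4 + 4 + 2 + 1 + 8 + 1;

    // Approximate serialized size of a single KeyValue in bytes.
    static long keyValueSize(String row, String family, String qualifier, int valueLength) {
        return FIXED_OVERHEAD
                + row.getBytes(StandardCharsets.UTF_8).length
                + family.getBytes(StandardCharsets.UTF_8).length
                + qualifier.getBytes(StandardCharsets.UTF_8).length
                + valueLength;
    }

    // Every attribute in a row is its own KeyValue, so the rowkey and
    // family name are paid for once per attribute, not once per row.
    static long rowSize(String row, String family, String[] qualifiers, int valueLength) {
        long total = 0;
        for (String q : qualifiers) {
            total += keyValueSize(row, family, q, valueLength);
        }
        return total;
    }

    public static void main(String[] args) {
        String[] qualifiers = {"q1", "q2", "q3"};
        System.out.println(keyValueSize("row1", "cf", "q1", 10));
        System.out.println(rowSize("row1", "cf", qualifiers, 10));
        // Same data with a long family name: 10 extra bytes per KeyValue.
        System.out.println(rowSize("row1", "columnfamily", qualifiers, 10));
    }
}
```

With a 10-byte value, each KeyValue here carries 28 bytes of key and overhead, so renaming the family from `cf` to `columnfamily` adds 30 bytes to a three-attribute row without storing any additional data.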

14.9.1.2. StoreFiles and Blocks

KeyValue instances are aggregated into blocks, and the blocksize is configurable on a per-ColumnFamily basis. Blocks are aggregated into StoreFiles. See Section 9.7, “Regions”.
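As a rough illustration of how the per-ColumnFamily blocksize affects the number of blocks (and therefore block-index entries) in a StoreFile, the sketch below uses simple ceiling division. It assumes blocks fill to about the configured blocksize; in practice a block is closed only after it reaches the threshold, so real blocks can run slightly larger. The 64 KB value is an example (it matches the usual HBase default), and the names are hypothetical.

```java
public class BlockCountEstimate {
    // Approximate number of blocks needed to hold kvBytes of KeyValue data.
    static long estimateBlocks(long kvBytes, int blocksizeBytes) {
        if (blocksizeBytes <= 0) {
            throw new IllegalArgumentException("blocksize must be positive");
        }
        return (kvBytes + blocksizeBytes - 1) / blocksizeBytes; // ceiling division
    }

    public static void main(String[] args) {
        int blocksize = 64 * 1024;  // example per-ColumnFamily blocksize
        long kvBytes = 1_000_000L;  // example amount of KeyValue data
        System.out.println(estimateBlocks(kvBytes, blocksize));
    }
}
```

A smaller blocksize yields more blocks and a larger block index for the same data; a larger one yields fewer, bigger blocks.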

14.9.1.3. HDFS Block Replication

Because HBase runs on top of HDFS, factor HDFS block replication into storage calculations.
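A minimal sketch of the replication adjustment, assuming every StoreFile block is stored `dfs.replication` times (3 by default in HDFS) and ignoring non-HBase data, temporary space, and free-space headroom. The 500 GB figure is an arbitrary example, and the names are hypothetical.

```java
public class ReplicatedStorageEstimate {
    // Raw disk consumed across the cluster: each HDFS block is stored
    // `replication` times, so the logical StoreFile size is multiplied.
    static long rawBytes(long storeFileBytes, int replication) {
        return storeFileBytes * replication;
    }

    public static void main(String[] args) {
        long gb = 1024L * 1024 * 1024;
        long storeFileBytes = 500L * gb; // example: 500 GB of StoreFiles
        int replication = 3;             // dfs.replication (HDFS default is 3)
        long raw = rawBytes(storeFileBytes, replication);
        System.out.println(raw / gb + " GB raw");
    }
}
```

Running `main` prints `1500 GB raw`: 500 GB of StoreFiles needs roughly three times that in raw DataNode disk.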

14.9.2. Regions

Another common question for HBase administrators is determining the right number of regions per RegionServer. This affects both storage and hardware planning. See Section 11.4.1, “Number of Regions”.
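One rough way to connect storage estimates to region counts is to divide the (pre-replication) StoreFile size by the size at which regions split, `hbase.hregion.max.filesize`. This sketch assumes a simple size-based split at that threshold and treats every region as full, so it gives a lower bound on the region count; after splits, regions typically sit between about half and all of the threshold. The figures and names are hypothetical examples; see Section 11.4.1 for guidance on acceptable counts.

```java
public class RegionCountEstimate {
    // Minimum number of regions needed if each region holds up to maxRegionBytes.
    static long regions(long storeFileBytes, long maxRegionBytes) {
        return (storeFileBytes + maxRegionBytes - 1) / maxRegionBytes; // ceiling division
    }

    // Regions per RegionServer, assuming an even distribution across servers.
    static long regionsPerServer(long totalRegions, int regionServers) {
        return (totalRegions + regionServers - 1) / regionServers;
    }

    public static void main(String[] args) {
        long gb = 1024L * 1024 * 1024;
        long storeFileBytes = 500L * gb; // example data size, before HDFS replication
        long maxRegionBytes = 10L * gb;  // example hbase.hregion.max.filesize value
        long total = regions(storeFileBytes, maxRegionBytes);
        System.out.println(total + " regions, " + regionsPerServer(total, 5) + " per RegionServer");
    }
}
```

Running `main` prints `50 regions, 10 per RegionServer`. Replication is deliberately left out, because region size is measured in StoreFile bytes rather than raw replicated disk.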
