Chapter 11. Apache HBase (TM) Performance Tuning

Table of Contents

11.1. Operating System
11.1.1. Memory
11.1.2. 64-bit
11.1.3. Swapping
11.2. Network
11.2.1. Single Switch
11.2.2. Multiple Switches
11.2.3. Multiple Racks
11.2.4. Network Interfaces
11.3. Java
11.3.1. The Garbage Collector and Apache HBase
11.4. HBase Configurations
11.4.1. Number of Regions
11.4.2. Managing Compactions
11.4.3. hbase.regionserver.handler.count
11.4.4. hfile.block.cache.size
11.4.5. hbase.regionserver.global.memstore.upperLimit
11.4.6. hbase.regionserver.global.memstore.lowerLimit
11.4.7. hbase.hstore.blockingStoreFiles
11.4.8. hbase.hregion.memstore.block.multiplier
11.4.9. hbase.regionserver.checksum.verify
11.5. ZooKeeper
11.6. Schema Design
11.6.1. Number of Column Families
11.6.2. Key and Attribute Lengths
11.6.3. Table RegionSize
11.6.4. Bloom Filters
11.6.5. ColumnFamily BlockSize
11.6.6. In-Memory ColumnFamilies
11.6.7. Compression
11.7. Writing to HBase
11.7.1. Batch Loading
11.7.2. Table Creation: Pre-Creating Regions
11.7.3. Table Creation: Deferred Log Flush
11.7.4. HBase Client: AutoFlush
11.7.5. HBase Client: Turn off WAL on Puts
11.7.6. HBase Client: Group Puts by RegionServer
11.7.7. MapReduce: Skip The Reducer
11.7.8. Anti-Pattern: One Hot Region
11.8. Reading from HBase
11.8.1. Scan Caching
11.8.2. Scan Attribute Selection
11.8.3. Avoid scan seeks
11.8.4. MapReduce - Input Splits
11.8.5. Close ResultScanners
11.8.6. Block Cache
11.8.7. Optimal Loading of Row Keys
11.8.8. Concurrency: Monitor Data Spread
11.8.9. Bloom Filters
11.9. Deleting from HBase
11.9.1. Using HBase Tables as Queues
11.9.2. Delete RPC Behavior
11.10. HDFS
11.10.1. Current Issues With Low-Latency Reads
11.10.2. Leveraging local data
11.10.3. Performance Comparisons of HBase vs. HDFS
11.11. Amazon EC2
11.12. Case Studies

11.1. Operating System

11.1.1. Memory

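RAM, RAM, RAM. Don't starve HBase: give each RegionServer enough memory for its Java heap, which holds the block cache and the MemStores.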

11.1.2. 64-bit

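Use a 64-bit platform and a 64-bit JVM. A 32-bit JVM cannot address more than 4 GB, which caps the heap far below what a busy RegionServer needs.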
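One way to confirm which JVM a host is actually running is to inspect its system properties. The following is a minimal sketch, not a tool shipped with HBase. It assumes a HotSpot-style JVM that sets the sun.arch.data.model property ("32" or "64"). When that property is missing, it falls back to a rough heuristic on os.arch (for example amd64 or aarch64). That heuristic can misclassify some architectures, so treat a warning from it as a prompt to check by hand.

```java
/**
 * Sketch: report whether the running JVM appears to be 64-bit.
 * Assumption: a HotSpot-style JVM exposing "sun.arch.data.model";
 * otherwise a heuristic on "os.arch" is used (not exhaustive).
 */
public class JvmBitnessCheck {

    /** Decides bitness from the two property values (either may be null). */
    static boolean is64Bit(String dataModel, String osArch) {
        if (dataModel != null) {
            // HotSpot reports "64" for a 64-bit VM and "32" for a 32-bit VM.
            return dataModel.equals("64");
        }
        // Fallback heuristic: common 64-bit arch names contain "64".
        return osArch != null && osArch.contains("64");
    }

    public static void main(String[] args) {
        String dataModel = System.getProperty("sun.arch.data.model");
        String osArch = System.getProperty("os.arch");
        System.out.println("sun.arch.data.model = " + dataModel);
        System.out.println("os.arch = " + osArch);
        if (is64Bit(dataModel, osArch)) {
            System.out.println("OK: JVM appears to be 64-bit");
        } else {
            System.out.println("WARNING: JVM appears to be 32-bit; use a 64-bit JVM for HBase");
        }
    }
}
```

Run it with the same java binary that starts the RegionServer, since a host can have both 32-bit and 64-bit JVMs installed.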

11.1.3. Swapping

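Watch out for swapping. If the operating system swaps out part of the RegionServer's heap, garbage collection pauses and request latency suffer badly. Set the kernel's swappiness (vm.swappiness on Linux) to 0.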
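On Linux the current value is exposed at /proc/sys/vm/swappiness. It is usually changed with "sysctl -w vm.swappiness=0" and made persistent in /etc/sysctl.conf. The sketch below is a hypothetical helper, not part of HBase. It assumes a Linux host and only reads the value, reporting whether it matches the guideline above. It does not modify the setting.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/**
 * Sketch: check vm.swappiness on a RegionServer host.
 * Assumption: Linux, with the value readable at /proc/sys/vm/swappiness.
 * Read-only; changing the value requires root (e.g. via sysctl).
 */
public class SwappinessCheck {

    static final Path SWAPPINESS = Paths.get("/proc/sys/vm/swappiness");

    /** Parses the file contents (e.g. "60\n") into an integer. */
    static int parseSwappiness(String contents) {
        return Integer.parseInt(contents.trim());
    }

    /** The guideline in this section is swappiness = 0. */
    static boolean meetsGuideline(int swappiness) {
        return swappiness == 0;
    }

    public static void main(String[] args) {
        try {
            String contents = new String(Files.readAllBytes(SWAPPINESS), StandardCharsets.UTF_8);
            int value = parseSwappiness(contents);
            if (meetsGuideline(value)) {
                System.out.println("OK: vm.swappiness = 0");
            } else {
                System.out.println("WARNING: vm.swappiness = " + value + "; set it to 0");
            }
        } catch (IOException e) {
            System.out.println("Could not read " + SWAPPINESS + " (not a Linux host?): " + e.getMessage());
        }
    }
}
```

Because it only reads a file, the check is safe to run on every node, for example from a deployment script, before starting HBase.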
