The Apache HBase™ Reference Guide

	HBase-0.92.x	HBase-0.94.x	HBase-0.96
Hadoop-0.20.205	S	X	X
Hadoop-0.22.x	S	X	X
Hadoop-1.0.x	S	S	S
Hadoop-1.1.x	NT	S	S
Hadoop-0.23.x	X	S	NT
Hadoop-2.x	X	S	S
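
The compatibility matrix above can also be kept as a small lookup table, for example in a pre-deployment check. The following Python sketch transcribes the table as-is. The function names `support_status` and `supported_hadoop_versions` are hypothetical, and the reading of the codes (S = supported, X = not supported, NT = not tested) is an assumption, since this section does not define them.

```python
# Support matrix transcribed from the table above.
# Assumed meaning of the codes: S = supported, X = not supported, NT = not tested.
SUPPORT = {
    "Hadoop-0.20.205": {"HBase-0.92.x": "S", "HBase-0.94.x": "X", "HBase-0.96": "X"},
    "Hadoop-0.22.x": {"HBase-0.92.x": "S", "HBase-0.94.x": "X", "HBase-0.96": "X"},
    "Hadoop-1.0.x": {"HBase-0.92.x": "S", "HBase-0.94.x": "S", "HBase-0.96": "S"},
    "Hadoop-1.1.x": {"HBase-0.92.x": "NT", "HBase-0.94.x": "S", "HBase-0.96": "S"},
    "Hadoop-0.23.x": {"HBase-0.92.x": "X", "HBase-0.94.x": "S", "HBase-0.96": "NT"},
    "Hadoop-2.x": {"HBase-0.92.x": "X", "HBase-0.94.x": "S", "HBase-0.96": "S"},
}


def support_status(hadoop, hbase):
    """Return the raw code (S, X or NT) for one Hadoop/HBase pairing."""
    return SUPPORT[hadoop][hbase]


def supported_hadoop_versions(hbase):
    """List the Hadoop lines marked S for the given HBase line, in table order."""
    return [hadoop for hadoop, row in SUPPORT.items() if row[hbase] == "S"]
```

For instance, `supported_hadoop_versions("HBase-0.96")` lists only the rows marked S in the last column of the table.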

Row Key	Time Stamp	ColumnFamily `contents`	ColumnFamily `anchor`
"com.cnn.www"	t9		`anchor:cnnsi.com` = "CNN"
"com.cnn.www"	t8		`anchor:my.look.ca` = "CNN.com"
"com.cnn.www"	t6	`contents:html` = "<html>..."
"com.cnn.www"	t5	`contents:html` = "<html>..."
"com.cnn.www"	t3	`contents:html` = "<html>..."

Row Key	Time Stamp	ColumnFamily `anchor`
"com.cnn.www"	t9	`anchor:cnnsi.com` = "CNN"
"com.cnn.www"	t8	`anchor:my.look.ca` = "CNN.com"

Row Key	Time Stamp	ColumnFamily `contents`
"com.cnn.www"	t6	`contents:html` = "<html>..."
"com.cnn.www"	t5	`contents:html` = "<html>..."
"com.cnn.www"	t3	`contents:html` = "<html>..."

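The three tables above show the same data twice: once as a single conceptual row, and once split by column family. A minimal Python sketch of that relationship follows. It models the table as nested dictionaries and is not the HBase client API; the names `TABLE`, `get` and `physical_view` are hypothetical, the timestamps t3 to t9 are represented as the integers 3 to 9, and the lookup rule (exact timestamp match, or the most recent version when no timestamp is given) is an assumption about the intended semantics.

```python
# Conceptual view: row key -> column ("family:qualifier") -> {timestamp: value}.
# Timestamps t3..t9 from the tables are modeled as the integers 3..9.
TABLE = {
    "com.cnn.www": {
        "anchor:cnnsi.com": {9: "CNN"},
        "anchor:my.look.ca": {8: "CNN.com"},
        "contents:html": {6: "<html>...", 5: "<html>...", 3: "<html>..."},
    }
}


def get(row, column, timestamp=None):
    """Return one cell value, or None if no such cell exists.

    Without a timestamp the most recent version is returned; with a
    timestamp only an exact match is returned (assumed semantics).
    """
    versions = TABLE.get(row, {}).get(column, {})
    if not versions:
        return None
    if timestamp is None:
        return versions[max(versions)]
    return versions.get(timestamp)


def physical_view(family):
    """List the cells of one column family as (row, timestamp, column, value),
    newest first within each row, mirroring the per-family tables."""
    cells = [
        (row, ts, column, value)
        for row, columns in TABLE.items()
        for column, versions in columns.items()
        if column.split(":", 1)[0] == family
        for ts, value in versions.items()
    ]
    return sorted(cells, key=lambda cell: (cell[0], -cell[1]))
```

Empty cells in the conceptual view take no space in this model: `physical_view("anchor")` contains only the two anchor cells, and `get("com.cnn.www", "contents:html", 8)` finds nothing because no `contents:html` version was written at t8.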
A.1. General
	When should I use HBase?
	See Section 9.1, “Overview”, in the Architecture chapter.
	Are there other HBase FAQs?
	See the HBase Wiki FAQ on the project wiki.
	Does HBase support SQL?
	Not really. SQL-ish support for HBase via Hive is in development; however, Hive is based on MapReduce, which is not generally suitable for low-latency requests. See Chapter 5, Data Model, for examples of using the HBase client.
	How can I find examples of NoSQL/HBase?
	See the link to the BigTable paper in Appendix F, Other Information About HBase, as well as the other papers.
	What is the history of HBase?
	See Appendix G, HBase History.
A.2. Architecture
	How does HBase handle Region-RegionServer assignment and locality?
	See Section 9.7, “Regions”.
A.3. Configuration
	How can I get started with my first cluster?
	See Section 1.2, “Quick Start”.
	Where can I learn about the rest of the configuration options?
	See Chapter 2, Apache HBase (TM) Configuration.
A.4. Schema Design / Data Access
	How should I design my schema in HBase?
	See Chapter 5, Data Model, and Chapter 6, HBase and Schema Design.
	How can I store (fill in the blank) in HBase?
	See Section 6.5, “Supported Datatypes”.
	How can I handle secondary indexes in HBase?
	See Section 6.9, “Secondary Indexes and Alternate Query Paths”.
	Can I change a table's rowkeys?
	This is a very common question. You can't. See Section 6.3.5, “Immutability of Rowkeys”.
	What APIs does HBase support?
	See Chapter 5, Data Model; Section 9.3, “Client”; and Section 10.1, “Non-Java Languages Talking to the JVM”.
A.5. MapReduce
	How can I use MapReduce with HBase?
	See Chapter 7, HBase and MapReduce.
A.6. Performance and Troubleshooting
	How can I improve HBase cluster performance?
	See Chapter 11, Apache HBase (TM) Performance Tuning.
	How can I troubleshoot my HBase cluster?
	See Chapter 12, Troubleshooting and Debugging Apache HBase (TM).
A.7. Amazon EC2
	I am running HBase on Amazon EC2 and...
	EC2 issues are a special case. See Section 12.12, “Amazon EC2”, in the Troubleshooting chapter and Section 11.11, “Amazon EC2”, in the Performance chapter.
A.8. Operations
	How do I manage my HBase cluster?
	See Chapter 14, Apache HBase (TM) Operational Management.
	How do I back up my HBase cluster?
	See Section 14.7, “HBase Backup”.
A.9. HBase in Action
	Where can I find interesting videos and presentations on HBase?
	See Appendix F, Other Information About HBase.

hfile.LASTKEY	The last key of the file (byte array)
hfile.AVG_KEY_LEN	The average key length in the file (int)
hfile.AVG_VALUE_LEN	The average value length in the file (int)

Version 1	Version 2
File info offset (long)
Data index offset (long)	loadOnOpenOffset (long) The offset of the section that we need to load when opening the file.
Number of data index entries (int)
metaIndexOffset (long) This field is not being used by the version 1 reader, so we removed it from version 2.	uncompressedDataIndexSize (long) The total uncompressed size of the whole data block index, including root-level, intermediate-level, and leaf-level blocks.
Number of meta index entries (int)
Total uncompressed bytes (long)
numEntries (int)	numEntries (long)
Compression codec: 0 = LZO, 1 = GZ, 2 = NONE (int)
	The number of levels in the data block index (int)
	firstDataBlockOffset (long) The offset of the first data block. Used when scanning.
	lastDataBlockEnd (long) The offset of the first byte after the last key/value data block. We don't need to go beyond this offset when scanning.
Version: 1 (int)	Version: 2 (int)
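
To make the field widths in the Version 2 column concrete, the following Python sketch packs and unpacks a record with those fields using the `struct` module. It is an illustration only, not the actual on-disk trailer format: the field order, the big-endian fixed-width encoding, the snake_case field names, and the assumption that single-cell rows of the table apply to both versions are all guesses made for the example, and any bytes the real trailer may contain beyond the listed fields are ignored.

```python
import struct
from collections import namedtuple

# (field name, struct code): "q" = 8-byte long, "i" = 4-byte int.
# Order and names are assumptions derived from the table, not the real layout.
V2_FIELDS = [
    ("file_info_offset", "q"),
    ("load_on_open_offset", "q"),
    ("data_index_count", "i"),
    ("uncompressed_data_index_size", "q"),
    ("meta_index_count", "i"),
    ("total_uncompressed_bytes", "q"),
    ("num_entries", "q"),
    ("compression_codec", "i"),
    ("num_data_index_levels", "i"),
    ("first_data_block_offset", "q"),
    ("last_data_block_end", "q"),
    ("version", "i"),
]

# ">" selects big-endian byte order with no padding between fields.
FORMAT = ">" + "".join(code for _, code in V2_FIELDS)
TrailerV2 = namedtuple("TrailerV2", [name for name, _ in V2_FIELDS])

# Codec numbering as given in the table.
CODECS = {0: "LZO", 1: "GZ", 2: "NONE"}


def pack_trailer(trailer):
    """Serialize a TrailerV2 tuple into fixed-width bytes."""
    return struct.pack(FORMAT, *trailer)


def unpack_trailer(data):
    """Parse bytes produced by pack_trailer back into a TrailerV2 tuple."""
    return TrailerV2(*struct.unpack(FORMAT, data))
```

Under these assumptions the record holds seven longs and five ints. Note that `numEntries` is an int in Version 1 but a long in Version 2, so a reader has to check the version field before interpreting the rest.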

Preface

Heads-up

Chapter 1. Getting Started

1.1. Introduction

1.2. Quick Start

Loopback IP

1.2.1. Download and unpack the latest stable release.

1.2.2. Start HBase

Is java installed?

1.2.3. Shell Exercises

1.2.4. Stopping HBase

1.2.5. Where to go next

Chapter 2. Apache HBase (TM) Configuration

2.1. Basic Prerequisites

2.1.1. Java

2.1.2. Operating System

2.1.2.1. ssh

2.1.2.2. DNS

2.1.2.3. Loopback IP

2.1.2.4. NTP

2.1.2.5. ulimit and nproc

2.1.2.5.1. ulimit on Ubuntu

2.1.2.6. Windows

2.1.3. Hadoop

2.1.3.1. Apache HBase 0.92 and 0.94

2.1.3.2. Apache HBase 0.96

2.1.3.3. Hadoop versions 0.20.x - 1.x

2.1.3.4. Apache HBase on Secure Hadoop

2.1.3.5. dfs.datanode.max.xcievers

2.2. HBase run modes: Standalone and Distributed

2.2.1. Standalone HBase

2.2.2. Distributed

2.2.2.1. Pseudo-distributed

2.2.2.1.1. Pseudo-distributed Configuration File

2.2.2.1.2. Pseudo-distributed Extras

2.2.2.1.2.1. Startup

2.2.2.1.2.2. Stop

2.2.2.2. Fully-distributed

2.2.2.2.1. regionservers

2.2.2.2.2. ZooKeeper and HBase

2.2.2.2.3. HDFS Client Configuration

2.2.3. Running and Confirming Your Installation

2.3. Configuration Files

2.3.1. hbase-site.xml and hbase-default.xml

2.3.2. hbase-env.sh

2.3.3. log4j.properties

2.3.4. Client configuration and dependencies connecting to an HBase cluster

2.3.4.1. Java client configuration

2.4. Example Configurations

2.4.1. Basic Distributed HBase Install

2.4.1.1. hbase-site.xml

2.4.1.2. regionservers

2.4.1.3. hbase-env.sh

2.5. The Important Configurations

2.5.1. Required Configurations

2.5.1.1. Big Cluster Configurations

2.5.2. Recommended Configurations

2.5.2.1. ZooKeeper Configuration

2.5.2.1.1. zookeeper.session.timeout

2.5.2.1.2. Number of ZooKeeper Instances

2.5.2.2. HDFS Configurations

2.5.2.2.1. dfs.datanode.failed.volumes.tolerated

2.5.2.3. hbase.regionserver.handler.count

2.5.2.4. Configuration for large memory machines

2.5.2.5. Compression

2.5.2.6. Bigger Regions

2.5.2.6.1. How many regions per RegionServer?

2.5.2.7. Managed Splitting

2.5.2.8. Managed Compactions

2.5.2.9. Speculative Execution

2.5.3. Other Configurations

2.5.3.1. Balancer

2.5.3.2. Disabling Blockcache

2.5.3.3. Nagle's or the small package problem

Chapter 3. Upgrading

3.1. Upgrading from 0.94.x to 0.96.x

The Singularity

3.2. Upgrading from 0.92.x to 0.94.x

4.2.1. `irbrc`