Chapter 5. Data Model

Table of Contents

5.1. Conceptual View
5.2. Physical View
5.3. Table
5.4. Row
5.5. Column Family
5.6. Cells
5.7. Data Model Operations
5.7.1. Get
5.7.2. Put
5.7.3. Scans
5.7.4. Delete
5.8. Versions
5.8.1. Versions and HBase Operations
5.8.2. Current Limitations
5.9. Sort Order
5.10. Column Metadata
5.11. Joins
5.12. ACID

In short, applications store data in an HBase table. Tables are made of rows and columns. Every column in HBase belongs to a particular column family. Table cells -- the intersection of row and column coordinates -- are versioned. A cell’s content is an uninterpreted array of bytes.

Table row keys are also byte arrays, so almost anything can serve as a row key, from strings to binary representations of longs or even serialized data structures. Rows in HBase tables are sorted by row key. The sort is byte-ordered. All table accesses are via the table row key -- its primary key.
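
Because the sort is byte-ordered, the way a key is encoded determines where its row lands. The following is a minimal sketch in plain Python rather than the HBase client API; it assumes that comparing Python bytes objects (unsigned, lexicographic) is a fair stand-in for the row key ordering described above. The ids list is made up for illustration.

```python
import struct

# Hypothetical numeric identifiers that we want to use as row keys.
ids = [1, 2, 10, 100]

# Encoded as decimal strings, the keys sort lexicographically, not numerically.
string_keys = sorted(str(i).encode("utf-8") for i in ids)

# Encoded as fixed-width 8-byte big-endian unsigned integers,
# byte order and numeric order agree.
binary_keys = sorted(struct.pack(">Q", i) for i in ids)
decoded = [struct.unpack(">Q", k)[0] for k in binary_keys]

print(string_keys)  # [b'1', b'10', b'100', b'2']
print(decoded)      # [1, 2, 10, 100]
```

If rows need to come back in numeric order, a fixed-width binary encoding (or zero-padded strings) is one way to get that from a byte-ordered sort.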

5.1. Conceptual View

The following example is a slightly modified form of the one on page 2 of the BigTable paper. There is a table called webtable that contains two column families named contents and anchor. In this example, anchor contains two columns (anchor:cnnsi.com, anchor:my.look.ca) and contents contains one column (contents:html).

Column Names

By convention, a column name is made of its column family prefix and a qualifier. For example, the column contents:html belongs to the column family contents. The colon character (:) delimits the column family from the column family qualifier.
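
The naming convention can be sketched as a small helper. This is plain Python, not part of any HBase API; split_column is a hypothetical name, and the sketch assumes the family name itself contains no colon, so the first colon is taken as the delimiter.

```python
from typing import Tuple


def split_column(name: bytes) -> Tuple[bytes, bytes]:
    """Split a 'family:qualifier' column name at the first colon."""
    family, sep, qualifier = name.partition(b":")
    if not sep:
        raise ValueError("column name has no ':' delimiter")
    return family, qualifier


print(split_column(b"contents:html"))      # (b'contents', b'html')
print(split_column(b"anchor:my.look.ca"))  # (b'anchor', b'my.look.ca')
```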

Table 5.1. Table webtable

Row Key       | Time Stamp | ColumnFamily contents      | ColumnFamily anchor
"com.cnn.www" | t9         |                            | anchor:cnnsi.com = "CNN"
"com.cnn.www" | t8         |                            | anchor:my.look.ca = "CNN.com"
"com.cnn.www" | t6         | contents:html = "<html>..." |
"com.cnn.www" | t5         | contents:html = "<html>..." |
"com.cnn.www" | t3         | contents:html = "<html>..." |
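
One way to picture this conceptual view is as a nested map: row key, then column family, then qualifier, then timestamp, then value. The sketch below models Table 5.1 with ordinary Python dictionaries. It is an illustration only, not how HBase stores data or what its client API looks like. The integers 3 to 9 stand in for the timestamps t3 to t9, and latest is a hypothetical helper.

```python
# Table 5.1 as nested dictionaries:
# row key -> column family -> qualifier -> timestamp -> value
webtable = {
    b"com.cnn.www": {
        b"contents": {
            b"html": {6: b"<html>...", 5: b"<html>...", 3: b"<html>..."},
        },
        b"anchor": {
            b"cnnsi.com": {9: b"CNN"},
            b"my.look.ca": {8: b"CNN.com"},
        },
    },
}


def latest(table, row, family, qualifier):
    """Return (timestamp, value) for the newest version of one cell."""
    versions = table[row][family][qualifier]
    newest = max(versions)
    return newest, versions[newest]


print(latest(webtable, b"com.cnn.www", b"contents", b"html"))
# (6, b'<html>...')
print(latest(webtable, b"com.cnn.www", b"anchor", b"cnnsi.com"))
# (9, b'CNN')
```

The empty cells in the table correspond to keys that are simply absent from the map.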

