Although at a conceptual level tables may be viewed as a sparse set of rows,
physically they are stored on a per-column family basis. New columns
(i.e., columnfamily:column) can be added to any column family without
pre-announcing them.
Table 5.2. ColumnFamily anchor
Row Key | Time Stamp | Column Family anchor |
---|---|---|
"com.cnn.www" | t9 | anchor:cnnsi.com = "CNN" |
"com.cnn.www" | t8 | anchor:my.look.ca = "CNN.com" |
Table 5.3. ColumnFamily contents
Row Key | Time Stamp | ColumnFamily "contents:" |
---|---|---|
"com.cnn.www" | t6 | contents:html = "<html>..." |
"com.cnn.www" | t5 | contents:html = "<html>..." |
"com.cnn.www" | t3 | contents:html = "<html>..." |
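The per-family layout in the two tables above can be sketched as a small in-memory model. This is an illustration only and not how HBase implements storage: the names store and put are hypothetical, the symbolic timestamps t3 through t9 are represented as the integers 3 through 9, and the column anchor:example.org is invented to show a write to a column that was never declared.

```python
from collections import defaultdict

# Hypothetical in-memory model: one separate map per column family.
# Each family maps (row key, column qualifier) to {timestamp: value}.
store = defaultdict(lambda: defaultdict(dict))


def put(row, column, timestamp, value):
    """Store one cell; `column` is written as 'family:qualifier'."""
    family, qualifier = column.split(":", 1)
    store[family][(row, qualifier)][timestamp] = value


# Cells from Table 5.2 (family "anchor").
put("com.cnn.www", "anchor:cnnsi.com", 9, "CNN")
put("com.cnn.www", "anchor:my.look.ca", 8, "CNN.com")

# Cells from Table 5.3 (family "contents").
for ts in (6, 5, 3):
    put("com.cnn.www", "contents:html", ts, "<html>...")

# A new qualifier is written to an existing family without declaring it
# first. This column is invented for the example.
put("com.cnn.www", "anchor:example.org", 10, "Example")

print(sorted(store))                             # the two families
print(sorted(q for (_, q) in store["anchor"]))   # qualifiers in "anchor"
```

Only cells that were written occupy space in this model; nothing is allocated for the combinations of row, column, and timestamp that have no value.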
It is important to note in the tables above that the empty cells shown in the
conceptual view are not stored, since a column-oriented storage format has no
need to store them. Thus a request for the value of the contents:html column
at time stamp t8 would return no value. Similarly, a request for an
anchor:my.look.ca value at time stamp t9 would return no value. However, if no
timestamp is supplied, the most recent value for a particular column would be
returned; it would also be the first one found, since timestamps are stored in
descending order. Thus a request for the values of all columns in the row
com.cnn.www with no timestamp specified would return the value of
contents:html from time stamp t6, the value of anchor:cnnsi.com from time
stamp t9, and the value of anchor:my.look.ca from time stamp t8.
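The lookup behaviour described above can be sketched with a small model of the row com.cnn.www. It is a simplified illustration and does not use the HBase client API: the names cells, get, and get_row are hypothetical, timestamps t3 through t9 are represented as the integers 3 through 9, and a request with a timestamp is assumed to match that exact timestamp only, as in the examples in the text.

```python
# Stored cells for row "com.cnn.www", keyed by column. Each list is kept
# in descending timestamp order, so the newest version comes first.
cells = {
    "anchor:cnnsi.com": [(9, "CNN")],
    "anchor:my.look.ca": [(8, "CNN.com")],
    "contents:html": [(6, "<html>..."), (5, "<html>..."), (3, "<html>...")],
}


def get(column, timestamp=None):
    """Return (timestamp, value) for a column, or None if no cell is stored."""
    versions = cells.get(column, [])
    if timestamp is None:
        # No timestamp supplied: the first entry is the most recent one.
        return versions[0] if versions else None
    for ts, value in versions:
        if ts == timestamp:
            return (ts, value)
    # Empty cells are never stored, so there is nothing to return.
    return None


def get_row():
    """Return the most recent version of every column in the row."""
    return {column: get(column) for column in cells}


print(get("contents:html", 8))      # None
print(get("anchor:my.look.ca", 9))  # None
print(get_row())
```

Because each version list is sorted newest first, the request without a timestamp never has to scan past the first entry.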
For more information about the internals of how Apache HBase stores data, see Section 9.7, “Regions”.