Log Message: |
[jira] [HBASE-4218] HFile data block encoding framework and delta encoding
implementation (Jacek Midgal, Mikhail Bautin)
Summary:
Adding a framework that allows us to "encode" keys in an HFile data block. We
support two modes of encoding: (1) both on disk and in cache, and (2) in cache
only. This is distinct from the compression that is already done in HBase,
e.g. GZ or LZO. When data block encoding is enabled, we store blocks in cache
in an uncompressed but encoded form. This lets us fit more blocks in cache
and reduces the number of disk reads.
The most common example of data block encoding is delta encoding: because HFile
keys are sorted, consecutive keys often share a long common prefix, so we store
only the delta between each key and the key immediately before it.
Initial encoding algorithms implemented are DIFF, FAST_DIFF, and PREFIX.
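To illustrate the idea only, here is a minimal sketch of prefix-style delta
encoding over sorted string keys. It is not the code in this patch: the class
PrefixDeltaSketch, its Entry type, and the use of Strings instead of raw key
bytes are all assumptions made for the example, and the actual DIFF, FAST_DIFF,
and PREFIX encoders use their own binary layouts.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration; not part of the HBase code base.
public class PrefixDeltaSketch {

    /** One encoded key: length of the prefix shared with the previous key, plus the remainder. */
    public static final class Entry {
        public final int commonPrefixLength;
        public final String suffix;

        public Entry(int commonPrefixLength, String suffix) {
            this.commonPrefixLength = commonPrefixLength;
            this.suffix = suffix;
        }
    }

    /** Encodes each key as (shared prefix length, suffix) relative to the key before it. */
    public static List<Entry> encode(List<String> sortedKeys) {
        List<Entry> out = new ArrayList<>();
        String prev = ""; // the first key has no predecessor, so it is stored in full
        for (String key : sortedKeys) {
            int max = Math.min(prev.length(), key.length());
            int common = 0;
            while (common < max && prev.charAt(common) == key.charAt(common)) {
                common++;
            }
            out.add(new Entry(common, key.substring(common)));
            prev = key;
        }
        return out;
    }

    /** Rebuilds the original keys by reusing the prefix of the previously decoded key. */
    public static List<String> decode(List<Entry> entries) {
        List<String> keys = new ArrayList<>();
        String prev = "";
        for (Entry e : entries) {
            String key = prev.substring(0, e.commonPrefixLength) + e.suffix;
            keys.add(key);
            prev = key;
        }
        return keys;
    }
}
```

Because each entry depends on the key before it, decoding has to proceed
sequentially from the start of the encoded run; that is the trade-off for the
smaller in-cache footprint.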
This is based on the delta encoding patch developed by Jacek Midgal during his
2011 summer internship at Facebook. The original patch is available here:
https://reviews.apache.org/r/2308/diff/.
Test Plan: Unit tests. Distributed load test on a five-node cluster.
Reviewers: JIRA, tedyu, stack, nspiegelberg, Kannan
Reviewed By: Kannan
CC: tedyu, todd, mbautin, stack, Kannan, mcorgan, gqchen
Differential Revision: https://reviews.facebook.net/D447
|