This module provides a mailbox implementation for persisting mailboxes (messages and subscriptions) in an HBase cluster.
It only supports the Basic capability.
This document provides an overview of the design and implementation of Mailbox HBase.
The current implementation stores Messages, Mailboxes and Subscriptions in their own tables.
Mailboxes are identified using a unique UUID.
The IMAP RFC states that mailboxes should keep message UIDs unique and in ascending order. Mailbox HBase uses incrementColumnValue in the HBaseUidProvider implementation to achieve this.
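The idea of an atomically incremented per-mailbox counter can be sketched without an HBase cluster. The class below is an in-memory stand-in, not the real HBaseUidProvider: the name InMemoryUidProvider and its methods are hypothetical, and a ConcurrentHashMap of AtomicLong counters plays the role that the atomic column increment plays in HBase.

```java
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical in-memory stand-in for a per-mailbox UID counter.
// In Mailbox HBase the counter lives in HBase and is advanced atomically;
// here an AtomicLong per mailbox UUID simulates that behaviour.
final class InMemoryUidProvider {
    private final ConcurrentHashMap<UUID, AtomicLong> lastUids = new ConcurrentHashMap<>();

    /** Atomically advances the counter of the given mailbox and returns the new UID. */
    long nextUid(UUID mailboxId) {
        return lastUids.computeIfAbsent(mailboxId, id -> new AtomicLong(0)).incrementAndGet();
    }

    /** Returns the last UID handed out for the mailbox, or 0 if none was issued yet. */
    long lastUid(UUID mailboxId) {
        AtomicLong counter = lastUids.get(mailboxId);
        return counter == null ? 0L : counter.get();
    }
}
```

Because each increment is atomic, two concurrent deliveries to the same mailbox can never receive the same UID, and UIDs only ever grow, which is the property the IMAP RFC asks for.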
Message bodies (and, more importantly, big attachments) sent to many users are stored many times. There is no space sharing yet.
Message data and message meta-data (flags and properties) are stored in different column families so the column family optimization options can apply. Keep in mind that message data does not change, while meta-data does change.
For the mailbox implementation to work, you have to provide it with a link to your HBase cluster. Putting hbase-site.xml on the class path should be enough. Mailbox HBase will pick it up and read all the configuration parameters from it.
This is an overview of the most important classes in the implementation.
HBaseMailboxManager extends the StoreMailboxManager class. It has a simple implementation that just overrides the doCreateMailbox method to return an HBaseMailbox implementation and the createMessageManager method to return an HBaseMessageManager implementation. Other than that, it relies on the default StoreMailboxManager implementation.
HBaseMessageManager extends StoreMessageManager and provides an implementation for the getPermanentFlags method.
Message bodies can have varying sizes. Some have attachments of up to 25 MB, some even greater. There are practical limits to the size of an HBase column (see http://hbase.apache.org/book.html#supported.datatypes). To address this issue, the implementation splits the message into smaller chunks and saves each chunk into a separate column. The columns have increasing integer names starting with 1, and there can be at most Long.MAX_VALUE chunks.
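The splitting scheme described above can be illustrated with a minimal sketch. It is not the module's actual code: the class MessageChunker and its split method are hypothetical, and a Map keyed by the column name stands in for the columns of an HBase row.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper that mimics the chunking scheme: a message body is cut
// into pieces of at most chunkSize bytes, and each piece is stored under an
// increasing integer column name starting at 1.
final class MessageChunker {
    /** Splits body into chunks keyed by their 1-based column name, in order. */
    static Map<Long, byte[]> split(byte[] body, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        Map<Long, byte[]> columns = new LinkedHashMap<>();
        long column = 1;
        int offset = 0;
        while (offset < body.length) {
            // The last chunk may be shorter than chunkSize.
            int length = Math.min(chunkSize, body.length - offset);
            columns.put(column++, Arrays.copyOfRange(body, offset, offset + length));
            offset += length;
        }
        return columns;
    }
}
```

With a 10-byte body and a chunk size of 4, this yields three columns named 1, 2 and 3 holding 4, 4 and 2 bytes. Reassembling the message is a matter of concatenating the columns in ascending order.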
The magic happens in ChunkInputStream and ChunkOutputStream, which extend InputStream and OutputStream from the java.io package.
On the read side, data is retrieved using an HBase Get operation and stored in an internal byte array.
On the write side, data is stored using an HBase Put operation and is split into chunks of a configurable size (chunkSize).
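A rough sketch of how such a pair of streams could work is shown here. It is an illustration under assumptions, not the module's ChunkInputStream and ChunkOutputStream: the class names SketchChunkOutputStream and SketchChunkInputStream are hypothetical, and a Map stands in for the HBase row, so a map put replaces the HBase Put and a map get replaces the HBase Get.

```java
import java.io.InputStream;
import java.io.OutputStream;
import java.util.Arrays;
import java.util.Map;

// Hypothetical writer: buffers bytes and stores every full buffer as one chunk
// under the next column name (1, 2, 3, ...). The map stands in for an HBase row.
final class SketchChunkOutputStream extends OutputStream {
    private final Map<Long, byte[]> row;
    private final byte[] buffer;
    private int count = 0;
    private long nextColumn = 1;

    SketchChunkOutputStream(Map<Long, byte[]> row, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        this.row = row;
        this.buffer = new byte[chunkSize];
    }

    @Override
    public void write(int b) {
        buffer[count++] = (byte) b;
        if (count == buffer.length) {
            storeChunk();
        }
    }

    // Stores the buffered bytes as one chunk; a real implementation would issue a Put here.
    private void storeChunk() {
        if (count == 0) {
            return;
        }
        row.put(nextColumn++, Arrays.copyOf(buffer, count));
        count = 0;
    }

    @Override
    public void close() {
        storeChunk(); // write the final, possibly shorter, chunk
    }
}

// Hypothetical reader: loads one chunk at a time into an internal byte array
// and serves bytes from it; a real implementation would issue a Get per chunk.
final class SketchChunkInputStream extends InputStream {
    private final Map<Long, byte[]> row;
    private long nextColumn = 1;
    private byte[] current = new byte[0];
    private int position = 0;

    SketchChunkInputStream(Map<Long, byte[]> row) {
        this.row = row;
    }

    @Override
    public int read() {
        while (position == current.length) {
            byte[] next = row.get(nextColumn);
            if (next == null) {
                return -1; // no more chunks: end of stream
            }
            nextColumn++;
            current = next;
            position = 0;
        }
        return current[position++] & 0xFF;
    }
}
```

Writing 11 bytes with a chunk size of 4 and then closing the stream leaves three chunks of 4, 4 and 3 bytes in the map. Reading them back through the input stream returns the original bytes in order.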
Things could be more efficient if HBase had streaming support.