Apache James Mailbox - Mailbox HBase

Mailbox HBase Responsibility

This module provides a mailbox implementation that persists mailboxes (messages and subscriptions) in an HBase cluster.

It only supports the Basic capability.

Overview

This section gives an overview of the design and implementation of Mailbox HBase.

Tables

The current implementation stores Messages, Mailboxes and Subscriptions in their own tables.

There are:

JAMES_MAILBOXES - for storing mailboxes.
JAMES_MESSAGES - for storing messages.
JAMES_SUBSCRIPTIONS - for storing user subscriptions.

Mailbox UID generation

Each mailbox is identified by a UUID.

Message UID generation

The IMAP RFC states that mailboxes should keep message UIDs unique and in ascending order. Mailbox HBase uses incrementColumnValue in the HBaseUidProvider implementation to achieve this.
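
To illustrate the property this relies on, here is a minimal in-memory sketch. It does not talk to HBase: a per-mailbox AtomicLong stands in for the counter cell that incrementColumnValue would increment atomically. The class name UidProviderSketch and the method names nextUid and lastUid are assumptions made for this example, not the actual HBaseUidProvider code.

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical stand-in for a UID provider; not the real HBaseUidProvider.
final class UidProviderSketch {

    // One counter per mailbox. In HBase this would be a counter cell
    // incremented atomically on the server side.
    private final Map<UUID, AtomicLong> counters = new ConcurrentHashMap<>();

    /** Atomically increments the mailbox counter and returns the new value. */
    long nextUid(UUID mailboxId) {
        return counters
                .computeIfAbsent(mailboxId, id -> new AtomicLong())
                .incrementAndGet();
    }

    /** Returns the last UID handed out for the mailbox, or 0 if none yet. */
    long lastUid(UUID mailboxId) {
        AtomicLong counter = counters.get(mailboxId);
        return counter == null ? 0L : counter.get();
    }
}
```

Because every call increments and reads in one atomic step, two concurrent appends to the same mailbox can never receive the same UID, and UIDs only grow.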

HBase row keys

HBase uses keys to access values. The current design uses the following row key structure:

JAMES_MAILBOXES: row key is mailbox UUID
JAMES_MESSAGES: row key is a compound key built by concatenating the mailbox UUID and the message UID (in reverse order). This way messages are grouped by mailbox and sorted in descending order (most recent first).
JAMES_SUBSCRIPTIONS: row key is user name.
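
The compound message key can be sketched as follows. This is an illustration under stated assumptions, not the module's actual code: it assumes the UUID is written as 16 big-endian bytes and that "reverse order" means storing Long.MAX_VALUE minus the message UID in the next 8 bytes. The names RowKeySketch, messageRowKey and compareKeys are hypothetical.

```java
import java.nio.ByteBuffer;
import java.util.UUID;

// Hypothetical helper; the byte layout below is an assumption.
final class RowKeySketch {

    /** 16 bytes of mailbox UUID followed by 8 bytes of the reversed message UID. */
    static byte[] messageRowKey(UUID mailboxId, long messageUid) {
        return ByteBuffer.allocate(24)
                .putLong(mailboxId.getMostSignificantBits())
                .putLong(mailboxId.getLeastSignificantBits())
                // Larger UIDs produce smaller values, so they sort first.
                .putLong(Long.MAX_VALUE - messageUid)
                .array();
    }

    /** Unsigned lexicographic byte comparison, the order HBase applies to row keys. */
    static int compareKeys(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xff) - (b[i] & 0xff);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length;
    }

    public static void main(String[] args) {
        UUID mailbox = UUID.randomUUID();
        byte[] older = messageRowKey(mailbox, 41);
        byte[] newer = messageRowKey(mailbox, 42);
        // A negative result means the newer message sorts before the older one.
        System.out.println(compareKeys(newer, older) < 0);
    }
}
```

With this layout all rows of one mailbox share a 16-byte prefix, so a scan starting at that prefix returns the mailbox's messages from the highest UID down.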

Misc

Message bodies (more importantly, big attachments) sent to many users are stored many times. There is no space sharing yet.

Message data and message meta-data (flags and properties) are stored in different column families so the column family optimization options can apply. Keep in mind that message data does not change, while meta-data does change.

Installation

For the mailbox implementation to work, you have to provide it with a link to your HBase cluster. Putting hbase-site.xml on the classpath should be enough. Mailbox HBase will pick it up and read all the configuration parameters from it.

Mailbox HBase Classes

This is an overview of the most important classes in the implementation.

HBaseMailboxManager

HBaseMailboxManager extends the StoreMailboxManager class. Its implementation is simple: it overrides the doCreateMailbox method to return an HBaseMailbox implementation and the createMessageManager method to return an HBaseMessageManager implementation. Other than that, it relies on the default StoreMailboxManager implementation.

HBaseMessageManager

HBaseMessageManager extends StoreMessageManager and provides an implementation for the getPermanentFlags method.

Chunked Streams

Message bodies can have varying sizes. Some have attachments of up to 25 MB, some even greater. There are practical limits to the size of an HBase column (see http://hbase.apache.org/book.html#supported.datatypes). To address this issue, the implementation splits the message into smaller chunks and saves each chunk into a separate column. The columns have increasing integer names starting with 1 and there can be at most Long.MAX_VALUE chunks.

The work is done in ChunkInputStream and ChunkOutputStream, which extend InputStream and OutputStream from the java.io package.
Data is retrieved using the HBase Get operation and stored in an internal byte array. Data is stored using the HBase Put operation, with the stream split into chunks of a configurable size (chunkSize). Things could be more efficient if HBase had streaming support.
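
The splitting scheme can be sketched without HBase. In the example below a sorted in-memory map stands in for the columns of one row, keyed by the chunk number starting at 1. The class ChunkSketch and its split and join methods are hypothetical names for this illustration; the real ChunkInputStream and ChunkOutputStream work on streams and issue Get and Put operations instead.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration of chunked storage; the map replaces HBase columns.
final class ChunkSketch {

    /** Splits data into chunks of at most chunkSize bytes, named 1, 2, 3, ... */
    static Map<Long, byte[]> split(byte[] data, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        Map<Long, byte[]> columns = new TreeMap<>();
        long name = 1;
        int from = 0;
        while (from < data.length) {
            // The last chunk may be shorter than chunkSize.
            int to = from + Math.min(chunkSize, data.length - from);
            columns.put(name++, Arrays.copyOfRange(data, from, to));
            from = to;
        }
        return columns;
    }

    /** Reads chunks 1, 2, 3, ... until one is missing and concatenates them. */
    static byte[] join(Map<Long, byte[]> columns) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (long name = 1; columns.containsKey(name); name++) {
            byte[] chunk = columns.get(name);
            out.write(chunk, 0, chunk.length);
        }
        return out.toByteArray();
    }
}
```

A smaller chunkSize keeps each stored value well below the practical column size limit at the cost of more columns per message.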

HBaseMessage

Extends AbstractMessage and represents a message in the message store. What is important to remember is that the current implementation retrieves just the message meta-data from HBase and uses ChunkInputStream to load the message body only when needed.
