Apache Jackrabbit : Overlay Blob Store

Overlay Blob Store

NOTE: The current status of this component is a proposed feature.

Overview

The overlay blob store is a multi-source blob store: a logical blob store consisting of at least two delegate blob stores. It presents the union of the data in all of its delegate blob stores to its users as a single logical "view" of the stored data.

Example:

Delegate Blob Store A

  • FileA
  • FileC
  • FileE

Delegate Blob Store B

  • FileB
  • FileE
  • FileF
  • FileG

Overlay Blob Store View

  • FileA
  • FileB
  • FileC
  • FileE
  • FileF
  • FileG
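The view in this example is a plain set union of the delegate contents. A minimal sketch follows, assuming each delegate can be modeled as a list of blob names; `OverlayViewSketch` and `unionView` are hypothetical names for illustration and are not part of Oak.

```java
import java.util.Arrays;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical helper, not part of Oak: computes the logical view an overlay
// presents as the union of the blob names held by its delegates.
class OverlayViewSketch {
    static SortedSet<String> unionView(List<List<String>> delegates) {
        SortedSet<String> view = new TreeSet<>();
        for (List<String> delegate : delegates) {
            view.addAll(delegate); // a name held by several delegates appears once
        }
        return view;
    }

    public static void main(String[] args) {
        List<String> storeA = Arrays.asList("FileA", "FileC", "FileE");
        List<String> storeB = Arrays.asList("FileB", "FileE", "FileF", "FileG");
        // Prints [FileA, FileB, FileC, FileE, FileF, FileG]
        System.out.println(unionView(Arrays.asList(storeA, storeB)));
    }
}
```

FileE is held by both delegates but appears only once in the view.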

The delegates for the overlay blob store are specified in configuration. Part of the configuration must include an indication of the priority of each delegate blob store. Reads and writes will be attempted in the order specified by the priority.
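The proposal does not define a configuration format, so the following is only a sketch of how a priority could determine the attempt order. `DelegateConfig`, `DelegateOrdering`, and the convention that a lower number is tried first are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical configuration entry: a delegate name plus a numeric priority.
// Assumption (not from the proposal): a lower number is tried earlier.
class DelegateConfig {
    final String name;
    final int priority;

    DelegateConfig(String name, int priority) {
        this.name = name;
        this.priority = priority;
    }
}

class DelegateOrdering {
    // Returns delegate names in the order reads and writes would be attempted.
    static List<String> attemptOrder(List<DelegateConfig> configs) {
        List<DelegateConfig> sorted = new ArrayList<>(configs); // leave the input untouched
        sorted.sort(Comparator.comparingInt((DelegateConfig c) -> c.priority));
        List<String> names = new ArrayList<>();
        for (DelegateConfig c : sorted) {
            names.add(c.name);
        }
        return names;
    }

    public static void main(String[] args) {
        List<DelegateConfig> configs = Arrays.asList(
                new DelegateConfig("S3DataStore", 2),
                new DelegateConfig("FileDataStore", 1));
        System.out.println(attemptOrder(configs));
    }
}
```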

Reads

The overlay blob store fulfills a read request by issuing the read to each delegate in priority order. Once a delegate satisfies the read, its result is returned as the result of the overlay blob store read and no further delegates are consulted for that request. A blob may exist in more than one delegate (FileE in the example above); in that case the read is served by the highest-priority delegate that can satisfy it.
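The read path described above can be sketched as a loop over the delegates that stops at the first hit. This is a simplified illustration, not Oak's blob store API: `InMemoryReadDelegate` and `OverlayReadSketch` are hypothetical classes, and blob content is modeled as a string.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical, simplified delegate: an in-memory map from blob id to content.
class InMemoryReadDelegate {
    final String name;
    private final Map<String, String> blobs = new HashMap<>();
    int readAttempts = 0; // lets a caller observe which delegates were consulted

    InMemoryReadDelegate(String name, String... blobIds) {
        this.name = name;
        for (String id : blobIds) {
            blobs.put(id, id + " from " + name);
        }
    }

    Optional<String> read(String blobId) {
        readAttempts++;
        return Optional.ofNullable(blobs.get(blobId));
    }
}

class OverlayReadSketch {
    private final List<InMemoryReadDelegate> delegatesInPriorityOrder;

    OverlayReadSketch(List<InMemoryReadDelegate> delegatesInPriorityOrder) {
        this.delegatesInPriorityOrder = delegatesInPriorityOrder;
    }

    // Tries each delegate in priority order and returns the first successful read.
    Optional<String> read(String blobId) {
        for (InMemoryReadDelegate delegate : delegatesInPriorityOrder) {
            Optional<String> result = delegate.read(blobId);
            if (result.isPresent()) {
                return result; // later delegates are not consulted
            }
        }
        return Optional.empty(); // no delegate holds the blob
    }

    public static void main(String[] args) {
        InMemoryReadDelegate a = new InMemoryReadDelegate("A", "FileA", "FileC", "FileE");
        InMemoryReadDelegate b = new InMemoryReadDelegate("B", "FileB", "FileE", "FileF", "FileG");
        OverlayReadSketch overlay = new OverlayReadSketch(Arrays.asList(a, b));
        System.out.println(overlay.read("FileE").orElse("not found"));
        System.out.println(overlay.read("FileB").orElse("not found"));
    }
}
```

With delegate A ahead of delegate B, a read of FileE is served by A and B is never asked, while a read of FileB misses in A and falls through to B.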

Writes

The overlay blob store fulfills write requests by attempting to write to each delegate in priority order. Once a write is successfully satisfied by a delegate, the result of the delegate write is returned as the result of the overlay blob store write and no subsequent writes are attempted for that request.
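A write can be sketched the same way: try each delegate in priority order and return the first successful result. The sketch below assumes a failed write surfaces as an unchecked exception, which the proposal does not specify; `InMemoryWriteDelegate` and `OverlayWriteSketch` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical, simplified delegate that either accepts or rejects every write.
class InMemoryWriteDelegate {
    final String name;
    private final boolean acceptsWrites;
    final List<String> stored = new ArrayList<>();

    InMemoryWriteDelegate(String name, boolean acceptsWrites) {
        this.name = name;
        this.acceptsWrites = acceptsWrites;
    }

    // Returns an identifier for the stored blob, or throws if the write fails.
    String write(String blobId) {
        if (!acceptsWrites) {
            throw new IllegalStateException(name + " rejected the write");
        }
        stored.add(blobId);
        return name + ":" + blobId;
    }
}

class OverlayWriteSketch {
    private final List<InMemoryWriteDelegate> delegatesInPriorityOrder;

    OverlayWriteSketch(List<InMemoryWriteDelegate> delegatesInPriorityOrder) {
        this.delegatesInPriorityOrder = delegatesInPriorityOrder;
    }

    // Tries each delegate in priority order; the first successful write wins.
    String write(String blobId) {
        RuntimeException lastFailure = null;
        for (InMemoryWriteDelegate delegate : delegatesInPriorityOrder) {
            try {
                return delegate.write(blobId); // later delegates are skipped
            } catch (RuntimeException e) {
                lastFailure = e; // move on to the next delegate
            }
        }
        throw new IllegalStateException("no delegate accepted the write", lastFailure);
    }

    public static void main(String[] args) {
        InMemoryWriteDelegate first = new InMemoryWriteDelegate("first", false);
        InMemoryWriteDelegate second = new InMemoryWriteDelegate("second", true);
        OverlayWriteSketch overlay = new OverlayWriteSketch(Arrays.asList(first, second));
        System.out.println(overlay.write("FileH"));
    }
}
```

Each blob ends up in exactly one delegate per write request; the sketch throws only when every delegate has failed.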

Read-Only Delegates

The overlay blob store supports the notion of a read-only delegate blob store. One or more of the delegate blob stores can be configured in read-only mode, meaning that it can be used to satisfy read requests but not write requests. An example use case for this scenario is where two content repositories are used, one for a production environment and one for a staging environment. The staging repository can be configured with an overlay blob store that accesses the production storage location in read-only mode, so tests can execute in staging using production data without modifying production data or the production store.

Reads issued to a read-only delegate would be processed as normal. Writes issued to a read-only delegate would fail, causing the overlay blob store to move on to the next delegate to attempt to fulfill the write request.

Note that configuring all delegates of an overlay blob store as read-only would make the blob store useless for storing blobs and thus should not be an allowed condition: at least one delegate blob store must not be a read-only delegate.
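The read-only behavior and the "at least one writable delegate" rule can be sketched together. The class names, the `readOnly` flag, and the choice of exceptions below are assumptions for illustration; the proposal does not prescribe an API.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical delegate with a read-only flag; blob content is modeled as a string.
class FlaggedDelegate {
    final String name;
    final boolean readOnly;
    final Map<String, String> blobs = new HashMap<>();

    FlaggedDelegate(String name, boolean readOnly) {
        this.name = name;
        this.readOnly = readOnly;
    }

    Optional<String> read(String blobId) {
        return Optional.ofNullable(blobs.get(blobId));
    }

    void write(String blobId, String content) {
        if (readOnly) {
            throw new UnsupportedOperationException(name + " is read-only");
        }
        blobs.put(blobId, content);
    }
}

class ReadOnlyAwareOverlay {
    private final List<FlaggedDelegate> delegatesInPriorityOrder;

    ReadOnlyAwareOverlay(List<FlaggedDelegate> delegatesInPriorityOrder) {
        boolean anyWritable = false;
        for (FlaggedDelegate d : delegatesInPriorityOrder) {
            if (!d.readOnly) {
                anyWritable = true;
            }
        }
        if (!anyWritable) {
            // Reject a configuration in which every delegate is read-only.
            throw new IllegalArgumentException("at least one delegate must be writable");
        }
        this.delegatesInPriorityOrder = delegatesInPriorityOrder;
    }

    Optional<String> read(String blobId) {
        for (FlaggedDelegate d : delegatesInPriorityOrder) {
            Optional<String> result = d.read(blobId);
            if (result.isPresent()) {
                return result;
            }
        }
        return Optional.empty();
    }

    // Returns the name of the delegate that accepted the write.
    String write(String blobId, String content) {
        for (FlaggedDelegate d : delegatesInPriorityOrder) {
            try {
                d.write(blobId, content);
                return d.name;
            } catch (UnsupportedOperationException e) {
                // Read-only delegate: move on to the next one.
            }
        }
        throw new IllegalStateException("no writable delegate; constructor check should prevent this");
    }

    public static void main(String[] args) {
        FlaggedDelegate production = new FlaggedDelegate("production", true);
        production.blobs.put("FileA", "production data"); // pre-existing production blob
        FlaggedDelegate staging = new FlaggedDelegate("staging", false);
        ReadOnlyAwareOverlay overlay = new ReadOnlyAwareOverlay(Arrays.asList(production, staging));
        System.out.println(overlay.write("FileX", "test data"));
        System.out.println(overlay.read("FileA").orElse("not found"));
    }
}
```

In this staging-style setup the write to the read-only production delegate fails and lands in the staging delegate, while production data stays readable and unmodified.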

Curation

Curation is the process of evaluating the blobs in a blob store to determine if that blob store is still the correct location for blobs to reside. In the case of the overlay blob store, a reason to curate data may be to gradually move data from one level of storage to a more cost-effective level of storage in a different container or location.

Curation is not in the scope of the overlay blob store; however, it may be prudent to add common curators to the same package in Oak in future efforts.

Use Cases

Hierarchical Blob Store

The overlay blob store directly addresses JCR Binary Usecase UC14 to store data in one of a number of blob stores based on a hierarchy.

In the example below, blobs are initially stored in the FileDataStore and are moved to S3DataStore once they are 30 days old or older. They can be read from either location. Note that moving blobs from one data store to the other falls under curation, which is outside the scope of the overlay blob store.

+-------+
|       |  <30 Days Old  +---------------+
|       +----------------> FileDataStore |
|       |                +---------------+
|  Oak  |
|       |
|       |  >=30 Days Old  +-------------+
|       +-----------------> S3DataStore |
|       |                 +-------------+
+-------+

Staging Environment

The overlay blob store can be used to address a production/staging deployment use case, where one Oak repository is the production repository and another is the staging repository. The production repository accesses a single blob store. The staging repository uses an overlay blob store to access a staging blob store as well as the production blob store in read-only mode. Thus staging can serve blobs out of either blob store but can only modify blobs on the staging blob store.

+-----------------+        +-----------------+
| Production Env  |        |   Staging Env   |
| +-------------+ |        | +-------------+ |
| |     Oak     | |    +-----+     Oak     | |
| +------+------+ |    |   | +------+------+ |
|        |        |  Read- |        |        |
|        |        |  Only  |        |        |
| +------V------+ |    |   | +------V------+ |
| | S3DataStore <------+   | | S3DataStore | |
| +-------------+ |        | +-------------+ |
|                 |        |                 |
+-----------------+        +-----------------+

S3DataStore Clustering

The overlay blob store could be used to address JCR Binary Usecase UC9, where two Oak nodes in a cluster may both have a record of a blob in the node store but one node may temporarily not be able to access the blob in the case of async upload. This could be addressed by using an overlay blob store where the first-level blob store is a FileDataStore on an NFS mount and the second-level blob store is an S3DataStore without a cache. The overlay blob store on each node will look for any asset in both the FileDataStore and the S3DataStore, thus avoiding a split-brain scenario.

+-----------------------------+
| Node 1                      |
| +-----+                     |
| |     |                     |
| |     +-------------------------------+
| | Oak |                     |         |
| |     |   +---------------+ |         |
| |     +-->+ FileDataStore | |  +------V------+
| |     |   +-------^-------+ |  | S3DataStore |
| +-----+           |         |  +------+------+
+-------------------|---------+         |
                    |            +------V------+
                   NFS           |  S3 Bucket  | 
                    |            +------^------+
+-------------------|---------+         |
| Node 2            |         |  +------+------+
| +-----+           |         |  | S3DataStore |
| |     |   +-------V-------+ |  +------^------+
| |     +---> FileDataStore | |         |
| | Oak |   +---------------+ |         |
| |     |                     |         |
| |     +-------------------------------+
| |     |                     |
| +-----+                     |
+-----------------------------+