Apache Accumulo Documentation : Locality Groups

Accumulo supports locality groups similar to those described in the Bigtable paper. Locality groups allow vertical partitioning of data by column family. This allows users to configure their tables so that scans over a subset of column families read only the groups that hold those families and are therefore much faster. The Accumulo locality group model has the following features.

Changing the locality group configuration for a table has no immediate effect on existing data. All minor and major compactions that occur after the change organize data into the new locality group structure. As data is written into the table, it triggers minor and major compactions, so over time all data ends up organized according to the new locality groups. If all data must be reorganized immediately, force a full major compaction of the table with the compact command in the shell.

There are two ways to manipulate locality groups: via the shell or through the Java API. From the shell, use the getgroups and setgroups commands. Through the API, TableOperations provides the methods setLocalityGroups() and getLocalityGroups().
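As a rough illustration of preparing the input for setLocalityGroups(), the sketch below builds a map from group name to column families and rejects any column family listed under two groups. It assumes that a column family may belong to at most one locality group. LocalityGroupPlan, its build() method, and the group and family names are hypothetical and not part of Accumulo. The sketch uses plain strings so that it runs without Accumulo on the classpath, whereas the real API takes a Map<String, Set<Text>> using Hadoop's Text type.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper (not part of Accumulo): builds and checks a locality
// group assignment before it is handed to TableOperations.
final class LocalityGroupPlan {

    // Returns group name -> column families. Throws if a column family is
    // listed under more than one group (assumed to be invalid).
    static Map<String, Set<String>> build(Map<String, List<String>> requested) {
        Map<String, String> owner = new HashMap<>(); // family -> group that claimed it
        Map<String, Set<String>> groups = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> entry : requested.entrySet()) {
            String group = entry.getKey();
            Set<String> families = new TreeSet<>();
            for (String family : entry.getValue()) {
                String previous = owner.putIfAbsent(family, group);
                if (previous != null && !previous.equals(group)) {
                    throw new IllegalArgumentException("Column family '" + family
                            + "' is in both '" + previous + "' and '" + group + "'");
                }
                families.add(family);
            }
            groups.put(group, families);
        }
        return groups;
    }

    public static void main(String[] args) {
        Map<String, List<String>> requested = new LinkedHashMap<>();
        requested.put("metadata", Arrays.asList("domain", "link"));
        requested.put("content", Arrays.asList("body"));
        System.out.println(build(requested));
        // With Accumulo available, each family would be wrapped in
        // org.apache.hadoop.io.Text and the resulting map passed to
        // TableOperations.setLocalityGroups(tableName, groups).
    }
}
```

Checking for overlap up front gives a clearer error message than waiting for the table operation to fail, but the check is only as good as the stated assumption.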

To limit scans to a set of locality groups, fetch only the column families that belong to those groups by calling the fetchColumnFamily() method on Scanner or BatchScanner. From the shell, use scan with the -c option.
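The relationship between fetched column families and the locality groups a scan has to read can be sketched with a small in-memory model. This is not Accumulo code: ScanGroupSelector, groupsToRead(), and the "<default>" label are hypothetical names. The model assumes that column families not assigned to any group live together in a default group, and that fetching no families means reading everything.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy model (not Accumulo code): decides which locality groups a scan must
// read for a given set of fetched column families.
final class ScanGroupSelector {

    // Hypothetical label for the group holding all unassigned families.
    static final String DEFAULT_GROUP = "<default>";

    static Set<String> groupsToRead(Map<String, Set<String>> groups, Set<String> fetched) {
        Set<String> result = new TreeSet<>();
        if (fetched.isEmpty()) {
            // No restriction on column families: every group is read.
            result.addAll(groups.keySet());
            result.add(DEFAULT_GROUP);
            return result;
        }
        for (String family : fetched) {
            String owner = DEFAULT_GROUP;
            for (Map.Entry<String, Set<String>> entry : groups.entrySet()) {
                if (entry.getValue().contains(family)) {
                    owner = entry.getKey();
                    break;
                }
            }
            result.add(owner);
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> groups = new LinkedHashMap<>();
        groups.put("metadata", new HashSet<>(Arrays.asList("domain", "link")));
        groups.put("content", new HashSet<>(Arrays.asList("body")));

        // Corresponds to calling fetchColumnFamily() once for "domain" on a
        // real Scanner: only the "metadata" group needs to be read.
        System.out.println(groupsToRead(groups, new HashSet<>(Arrays.asList("domain"))));
    }
}
```

In this model, a scan that fetches only families from one group skips the other groups entirely, which is where the speedup comes from.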