Apache Accumulo Configuration Management

All accumulo properties have a default value in the source code. Properties can also be set in accumulo-site.xml and in zookeeper on a per-table or system-wide basis. If a property is set in more than one location, accumulo uses the value from the location with the highest precedence. The order of precedence is described below (from highest to lowest):

Location | Description

Zookeeper table properties
Table properties are applied to the entire cluster when set in zookeeper using the accumulo API or shell. While table properties take precedence over system properties, both override properties set in accumulo-site.xml.

Table properties consist of all properties with the table.* prefix. They are configured on a per-table basis using the following shell command:
    config -t TABLE -s PROPERTY=VALUE

Zookeeper system properties
System properties are applied to the entire cluster when set in zookeeper using the accumulo API or shell. System properties consist of all properties with a 'yes' in the 'Zookeeper Mutable' column in the tables below. They are set with the following shell command:
    config -s PROPERTY=VALUE
If a table.* property is set using this method, the value will apply to all tables except those configured on a per-table basis (which have higher precedence).

While most system properties take effect immediately, some require a restart of the process, as indicated in the 'Zookeeper Mutable' column.

accumulo-site.xml
Accumulo processes (master, tserver, etc.) read their local accumulo-site.xml on startup. Therefore, changes made to accumulo-site.xml must be rsynced across the cluster, and processes must be restarted to apply the changes.

Certain properties (indicated by a 'no' in 'Zookeeper Mutable') cannot be set in zookeeper and can only be set in this file. The accumulo-site.xml file also allows you to configure tablet servers with different settings.

Default
All properties have a default value in the source code. This value has the lowest precedence and is overridden if set in accumulo-site.xml or zookeeper.

While the default value is usually optimal, there are cases where a change can increase query and ingest performance.
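The precedence order above can be illustrated with a short Python sketch. This is not how accumulo implements the lookup; the `resolve` function, the scope names, and the `layers` dictionary are hypothetical and only model "the highest-precedence location that sets the property wins". The sample values are illustrative.

```python
# Scopes ordered from highest to lowest precedence, as described above.
# The scope names are labels chosen for this sketch.
PRECEDENCE = ("table", "system", "site", "default")


def resolve(prop, layers):
    """Return (scope, value) from the highest-precedence scope that sets prop."""
    for scope in PRECEDENCE:
        values = layers.get(scope, {})
        if prop in values:
            return scope, values[prop]
    raise KeyError(prop)


# Hypothetical settings for one property in every location.
layers = {
    "default": {"table.compaction.major.ratio": "1.3"},
    "site": {"table.compaction.major.ratio": "1.4"},
    "system": {"table.compaction.major.ratio": "1.5"},
    "table": {"table.compaction.major.ratio": "1.6"},
}

print(resolve("table.compaction.major.ratio", layers))  # ('table', '1.6')
```

Removing the "table" entry from `layers` makes the system value win, and so on down to the default.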

The 'config' command in the shell allows you to view the current system configuration. You can also use the '-t' option to view a table's configuration as below:

    $ ./bin/accumulo shell -u root
    Enter current password for 'root'@'ac14': ******

    Shell - Apache Accumulo Interactive Shell
    - 
    - version: 1.4.2
    - instance name: ac14
    - instance id: 4f48fa03-f692-43ce-ae03-94c9ea8b7181
    - 
    - type 'help' for a list of available commands
    - 
    root@ac14> config -t foo
    ---------+---------------------------------------------+------------------------------------------------------
    SCOPE    | NAME                                        | VALUE
    ---------+---------------------------------------------+------------------------------------------------------
    default  | table.balancer ............................ | org.apache.accumulo.server.master.balancer.DefaultLoadBalancer
    default  | table.bloom.enabled ....................... | false
    default  | table.bloom.error.rate .................... | 0.5%
    default  | table.bloom.hash.type ..................... | murmur
    default  | table.bloom.key.functor ................... | org.apache.accumulo.core.file.keyfunctor.RowFunctor
    default  | table.bloom.load.threshold ................ | 1
    default  | table.bloom.size .......................... | 1048576
    default  | table.cache.block.enable .................. | false
    default  | table.cache.index.enable .................. | false
    default  | table.compaction.major.everything.at ...... | 19700101000000GMT
    default  | table.compaction.major.everything.idle .... | 1h
    default  | table.compaction.major.ratio .............. | 1.3
    site     |    @override .............................. | 1.4
    system   |    @override .............................. | 1.5
    table    |    @override .............................. | 1.6
    default  | table.compaction.minor.idle ............... | 5m
    default  | table.compaction.minor.logs.threshold ..... | 3
    default  | table.failures.ignore ..................... | false
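To read this output programmatically, a small Python sketch such as the following could be used. It assumes the layout shown above (three fields separated by '|', with '@override' rows belonging to the property named on the preceding row) and it assumes that overrides are listed in increasing precedence, so the last row for a property is the effective one. The `effective_values` function and the `SAMPLE` text are hypothetical.

```python
SAMPLE = """\
---------+---------------------------------------------+-------
SCOPE    | NAME                                        | VALUE
---------+---------------------------------------------+-------
default  | table.bloom.enabled ....................... | false
default  | table.compaction.major.ratio .............. | 1.3
site     |    @override .............................. | 1.4
system   |    @override .............................. | 1.5
table    |    @override .............................. | 1.6
"""


def effective_values(lines):
    """Map each property name to the (scope, value) of its last listed row."""
    result = {}
    current = None
    for line in lines:
        parts = [part.strip() for part in line.split("|", 2)]
        if len(parts) != 3 or parts[0] == "SCOPE" or not parts[1]:
            continue  # separator line, header row, or blank line
        scope, name_field, value = parts
        name = name_field.split()[0]  # drop the dotted padding after the name
        if name != "@override":
            current = name
        if current is None:
            continue  # an override row with no property before it
        result[current] = (scope, value)
    return result


for name, (scope, value) in effective_values(SAMPLE.splitlines()).items():
    print(name, scope, value)
```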
  

Configuration Properties


instance.*
Properties in this category must be consistent throughout a cloud. This is enforced and servers won't be able to communicate if these differ.
Property | Type | Zookeeper Mutable | Default Value | Description
instance.dfs.dir | absolute path | no | /accumulo | HDFS directory in which accumulo instance will run. Do not change after accumulo is initialized.
instance.dfs.uri | uri | no | | The url accumulo should use to connect to DFS. If this is empty, accumulo will obtain this information from the hadoop configuration.
instance.secret | string | no | DEFAULT | A secret unique to a given instance that all servers must know in order to communicate with one another. Change it before initialization. To change it later use ./bin/accumulo accumulo.server.util.ChangeSecret [oldpasswd] [newpasswd], and then update conf/accumulo-site.xml everywhere.
instance.zookeeper.host | host list | no | localhost:2181 | Comma separated list of zookeeper servers
instance.zookeeper.timeout | duration | no | 30s | Zookeeper session timeout; max value when represented as milliseconds should be no larger than 2147483647
general.*
Properties in this category affect the behavior of accumulo overall, but do not have to be consistent throughout a cloud.
Property | Type | Zookeeper Mutable | Default Value | Description
general.classpaths | string | no | $ACCUMULO_HOME/conf, $ACCUMULO_HOME/lib/[^.].$ACCUMULO_VERSION.jar, $ACCUMULO_HOME/lib/[^.].*.jar, $ZOOKEEPER_HOME/zookeeper[^.].*.jar, $HADOOP_HOME/[^.].*.jar, $HADOOP_HOME/conf, $HADOOP_HOME/lib/[^.].*.jar, | A list of all of the places to look for a class. Order does matter, as it will look for the jar starting in the first location to the last. Please note, hadoop conf and hadoop lib directories NEED to be here, along with accumulo lib and zookeeper directory. Supports full regex on filename alone.
general.dynamic.classpaths | string | no | $ACCUMULO_HOME/lib/ext/[^.].*.jar | A list of all of the places where changes in jars or classes will force a reload of the classloader.
general.kerberos.keytab | path | no | | Path to the kerberos keytab to use. Leave blank if not using kerberized hdfs
general.kerberos.principal | string | no | | Name of the kerberos principal to use. _HOST will automatically be replaced by the machine's hostname in the hostname portion of the principal. Leave blank if not using kerberized hdfs
general.rpc.timeout | duration | no | 120s | Time to wait on I/O for simple, short RPC calls
master.*
Properties in this category affect the behavior of the master server
Property | Type | Zookeeper Mutable | Default Value | Description
master.bulk.retries | count | yes | 3 | The number of attempts to bulk-load a file before giving up.
master.bulk.threadpool.size | count | yes | 5 | The number of threads to use when coordinating a bulk-import.
master.logger.balancer | java class | yes | org.apache.accumulo.server.master.balancer.SimpleLoggerBalancer | The balancer class that accumulo will use to make logger assignment decisions.
master.port.client | port | yes but requires restart of the master | 9999 | The port used for handling client connections on the master
master.recovery.max.age | duration | yes | 60m | Recovery files older than this age will be removed.
master.recovery.pool | string | yes | recovery | Priority queue to use for log recovery map/reduce jobs.
master.recovery.queue | string | yes | default | Priority queue to use for log recovery map/reduce jobs.
master.recovery.reducers | count | yes | 10 | Number of reducers to use to sort recovery logs (per log)
master.recovery.sort.mapreduce | boolean | yes | false | If true, use map/reduce to sort write-ahead logs during recovery
master.recovery.time.max | duration | yes | 30m | The maximum time to attempt recovery before giving up
master.server.threadcheck.time | duration | yes | 1s | The time between adjustments of the server thread pool.
master.server.threads.minimum | count | yes | 2 | The minimum number of threads to use to handle incoming requests.
master.tablet.balancer | java class | yes | org.apache.accumulo.server.master.balancer.TableLoadBalancer | The balancer class that accumulo will use to make tablet assignment and migration decisions.
tserver.*
Properties in this category affect the behavior of the tablet servers
Property | Type | Zookeeper Mutable | Default Value | Description
tserver.bloom.load.concurrent.max | count | yes | 4 | The number of concurrent threads that will load bloom filters in the background. Setting this to zero will make bloom filters load in the foreground.
tserver.bulk.assign.threads | count | yes | 1 | The master delegates bulk file processing and assignment to tablet servers. After the bulk file has been processed, the tablet server will assign the file to the appropriate tablets on all servers. This property controls the number of threads used to communicate to the other servers.
tserver.bulk.process.threads | count | yes | 1 | The master will task a tablet server with pre-processing a bulk file prior to assigning it to the appropriate tablet servers. This configuration value controls the number of threads used to process the files.
tserver.bulk.retry.max | count | yes | 3 | The number of times the tablet server will attempt to assign a file to a tablet as it migrates and splits.
tserver.cache.data.size | memory | yes | 100M | Specifies the size of the cache for file data blocks.
tserver.cache.index.size | memory | yes | 512M | Specifies the size of the cache for file indices.
tserver.client.timeout | duration | yes | 3s | Time to wait for clients to continue scans before closing a session.
tserver.compaction.major.concurrent.max | count | yes but requires restart of the tserver | 3 | The maximum number of concurrent major compactions for a tablet server
tserver.compaction.major.delay | duration | yes | 30s | Time a tablet server will sleep between checking which tablets need compaction.
tserver.compaction.major.thread.files.open.max | count | yes but requires restart of the tserver | 10 | Max number of files a major compaction thread can open at once.
tserver.compaction.minor.concurrent.max | count | yes | 4 | The maximum number of concurrent minor compactions for a tablet server
tserver.default.blocksize | memory | yes | 1M | Specifies a default blocksize for the tserver caches
tserver.dir.memdump | path | yes | /tmp | A long running scan could possibly hold memory that has been minor compacted. To prevent this, the in memory map is dumped to a local file and the scan is switched to that local file. We can not switch to the minor compacted file because it may have been modified by iterators. The file dumped to the local dir is an exact copy of what was in memory.
tserver.files.open.idle | duration | yes | 1m | Tablet servers leave previously used map files open for future queries. This setting determines how much time an unused map file should be kept open until it is closed.
tserver.hold.time.max | duration | yes | 5m | The maximum time for a tablet server to be in the "memory full" state. If the tablet server cannot write out memory in this much time, it will assume there is some failure local to its node, and quit. A value of zero is equivalent to forever.
tserver.logger.count | count | yes but requires restart of the tserver | 2 | The number of loggers that each tablet server should use.
tserver.logger.strategy | string | yes | org.apache.accumulo.server.tabletserver.log.RoundRobinLoggerStrategy | The classname used to decide which loggers to use.
tserver.logger.timeout | duration | yes | 30s | The time to wait for a logger to respond to a write-ahead request
tserver.memory.lock | boolean | yes | false | The tablet server must communicate with zookeeper frequently to maintain its locks. If the tablet server's memory is swapped out the java garbage collector can stop all processing for long periods. Change this property to true and the tablet server will attempt to lock all of its memory to RAM, which may reduce delays during java garbage collection. You will have to modify the system limit for "max locked memory". This feature is only available when running on Linux. Alternatively you may also want to set /proc/sys/vm/swappiness to zero (again, this is Linux-specific).
tserver.memory.manager | java class | yes | org.apache.accumulo.server.tabletserver.LargestFirstMemoryManager | An implementation of MemoryManager that accumulo will use.
tserver.memory.maps.max | memory | yes | 1G | Maximum amount of memory that can be used to buffer data written to a tablet server. There are two other properties that can effectively limit memory usage: table.compaction.minor.logs.threshold and tserver.walog.max.size. Ensure that table.compaction.minor.logs.threshold * tserver.walog.max.size >= this property.
tserver.memory.maps.native.enabled | boolean | yes but requires restart of the tserver | true | An in-memory data store for accumulo implemented in c++ that increases the amount of data accumulo can hold in memory and avoids Java GC pauses.
tserver.metadata.readahead.concurrent.max | count | yes | 8 | The maximum number of concurrent metadata read ahead that will execute.
tserver.migrations.concurrent.max | count | yes | 1 | The maximum number of concurrent tablet migrations for a tablet server
tserver.monitor.fs | boolean | yes | true | When enabled the tserver will monitor file systems and kill itself when one switches from rw to ro. This is usually an indication that Linux has detected a bad disk.
tserver.mutation.queue.max | memory | yes | 256K | The amount of memory to use to store write-ahead-log mutations-per-session before flushing them.
tserver.port.client | port | yes but requires restart of the tserver | 9997 | The port used for handling client connections on the tablet servers
tserver.port.search | boolean | yes | false | If the ports above are in use, search higher ports until one is available
tserver.readahead.concurrent.max | count | yes | 16 | The maximum number of concurrent read ahead that will execute. This effectively limits the number of long running scans that can run concurrently per tserver.
tserver.scan.files.open.max | count | yes but requires restart of the tserver | 100 | Maximum total map files that all tablets in a tablet server can open for scans.
tserver.server.threadcheck.time | duration | yes | 1s | The time between adjustments of the server thread pool.
tserver.server.threads.minimum | count | yes | 2 | The minimum number of threads to use to handle incoming requests.
tserver.session.idle.max | duration | yes | 1m | Maximum idle time for a session
tserver.tablet.split.midpoint.files.max | count | yes | 30 | To find a tablet's split points, all index files are opened. This setting determines how many index files can be opened at once. When there are more index files than this setting multiple passes must be made, which is slower. However opening too many files at once can cause problems.
tserver.walog.max.size | memory | yes | 1G | The maximum size for each write-ahead log. See comment for property tserver.memory.maps.max
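The description of tserver.memory.maps.max asks that table.compaction.minor.logs.threshold * tserver.walog.max.size be at least as large as tserver.memory.maps.max. A minimal Python sketch of that check follows. The helper names are hypothetical, and the 1024-based unit multipliers are an assumption of this sketch rather than something stated in this document.

```python
# Assumed multipliers for the memory units B, K, M and G (1024-based).
UNITS = {"B": 1, "K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}


def to_bytes(value):
    """Convert a memory string such as '1G' or '512M' to bytes (no unit means bytes)."""
    unit = value[-1]
    if unit in UNITS:
        return int(value[:-1]) * UNITS[unit]
    return int(value)


def walogs_cover_memory(logs_threshold, walog_max_size, maps_max):
    """True if logs_threshold * walog_max_size >= maps_max."""
    return logs_threshold * to_bytes(walog_max_size) >= to_bytes(maps_max)


# Default values from the tables in this document: 3, 1G and 1G.
print(walogs_cover_memory(3, "1G", "1G"))
# A hypothetical smaller write-ahead log size that violates the rule.
print(walogs_cover_memory(3, "256M", "1G"))
```

With the defaults the check passes (3G against 1G); with a 256M log size it fails, since 768M is less than 1G.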
logger.*
Properties in this category affect the behavior of the write-ahead logger servers
Property | Type | Zookeeper Mutable | Default Value | Description
logger.archive | boolean | yes | false | Determines if logs are archived in hdfs
logger.archive.replication | count | yes | 0 | Determines the replication factor for walogs archived in hdfs, set to zero to use default
logger.copy.threadpool.size | count | yes | 2 | Size of the thread pool used to copy files from the local log area to HDFS
logger.dir.walog | path | yes | walogs | The directory used to store write-ahead logs on the local filesystem. It is possible to specify a comma-separated list of directories.
logger.monitor.fs | boolean | yes | true | When enabled the logger will monitor file systems and kill itself when one switches from rw to ro. This is usually an indication that Linux has detected a bad disk.
logger.port.client | port | yes but requires restart of the logger | 11224 | The port used for write-ahead logger services
logger.port.search | boolean | yes | false | If the port above is in use, search higher ports until one is available
logger.recovery.file.replication | count | yes | 2 | When a logger puts a WALOG into HDFS, it will use this as the replication factor.
logger.server.threadcheck.time | duration | yes | 1s | The time between adjustments of the server thread pool.
logger.server.threads.minimum | count | yes | 2 | The minimum number of threads to use to handle incoming requests.
logger.sort.buffer.size | memory | yes | 200M | The amount of memory to use when sorting logs during recovery. Only used when *not* sorting logs with map/reduce.
gc.*
Properties in this category affect the behavior of the accumulo garbage collector.
Property | Type | Zookeeper Mutable | Default Value | Description
gc.cycle.delay | duration | yes | 5m | Time between garbage collection cycles. In each cycle, old files no longer in use are removed from the filesystem.
gc.cycle.start | duration | yes | 30s | Time to wait before attempting to garbage collect any old files.
gc.port.client | port | yes but requires restart of the gc | 50091 | The listening port for the garbage collector's monitor service
gc.threads.delete | count | yes | 16 | The number of threads used to delete files
monitor.*
Properties in this category affect the behavior of the monitor web server.
Property | Type | Zookeeper Mutable | Default Value | Description
monitor.banner.background | string | yes | #304065 | The background color of the banner text displayed on the monitor page.
monitor.banner.color | string | yes | #c4c4c4 | The color of the banner text displayed on the monitor page.
monitor.banner.text | string | yes | | The banner text displayed on the monitor page.
monitor.port.client | port | no | 50095 | The listening port for the monitor's http service
monitor.port.log4j | port | no | 4560 | The listening port for the monitor's log4j logging collection.
trace.*
Properties in this category affect the behavior of distributed tracing.
Property | Type | Zookeeper Mutable | Default Value | Description
trace.password | string | no | secret | The password for the user used to store distributed traces
trace.port.client | port | no | 12234 | The listening port for the trace server
trace.table | string | no | trace | The name of the table to store distributed traces
trace.user | string | no | root | The name of the user to store distributed traces
table.*
Properties in this category affect tablet server treatment of tablets, but can be configured on a per-table basis. Setting these properties in the site file will override the default globally for all tables and not any specific table. However, both the default and the global setting can be overridden per table using the table operations API or in the shell, which sets the overridden value in zookeeper. Restarting accumulo tablet servers after setting these properties in the site file will cause the global setting to take effect. However, you must use the API or the shell to change properties in zookeeper that are set on a table.
Property | Type | Zookeeper Mutable | Default Value | Description
table.balancer | string | yes | org.apache.accumulo.server.master.balancer.DefaultLoadBalancer | This property can be set to allow the LoadBalanceByTable load balancer to change the called Load Balancer for this table
table.bloom.enabled | boolean | yes | false | Use bloom filters on this table.
table.bloom.error.rate | fraction/percentage | yes | 0.5% | Bloom filter error rate.
table.bloom.hash.type | string | yes | murmur | The bloom filter hash type
table.bloom.key.functor | java class | yes | org.apache.accumulo.core.file.keyfunctor.RowFunctor | A function that can transform the key prior to insertion and check of bloom filter. org.apache.accumulo.core.file.keyfunctor.RowFunctor, org.apache.accumulo.core.file.keyfunctor.ColumnFamilyFunctor, and org.apache.accumulo.core.file.keyfunctor.ColumnQualifierFunctor are allowable values. One can extend any of the above mentioned classes to perform specialized parsing of the key.
table.bloom.load.threshold | count | yes | 1 | This number of seeks that would actually use a bloom filter must occur before a map file's bloom filter is loaded. Set this to zero to initiate loading of bloom filters when a map file is opened.
table.bloom.size | count | yes | 1048576 | Bloom filter size, as number of keys.
table.cache.block.enable | boolean | yes | false | Determines whether file block cache is enabled.
table.cache.index.enable | boolean | yes | true | Determines whether index cache is enabled.
table.compaction.major.everything.idle | duration | yes | 1h | After a tablet has been idle (no mutations) for this time period it may have all of its map files compacted into one. There is no guarantee an idle tablet will be compacted. Compactions of idle tablets are only started when regular compactions are not running. Idle compactions only take place for tablets that have one or more map files.
table.compaction.major.ratio | fraction/percentage | yes | 3 | Minimum ratio of total input size to maximum input file size for running a major compaction. When adjusting this property you may want to also adjust table.file.max. The goal is to avoid the situation where only merging minor compactions occur.
table.compaction.minor.idle | duration | yes | 5m | After a tablet has been idle (no mutations) for this time period it may have its in-memory map flushed to disk in a minor compaction. There is no guarantee an idle tablet will be compacted.
table.compaction.minor.logs.threshold | count | yes | 3 | When there are more than this many write-ahead logs against a tablet, it will be minor compacted. See comment for property tserver.memory.maps.max
table.failures.ignore | boolean | yes | false | If you want queries for your table to hang or fail when data is missing from the system, then set this to false. When this is set to true, missing data will be reported but queries will still run, possibly returning a subset of the data.
table.file.blocksize | memory | yes | 0B | Overrides the hadoop dfs.block.size setting so that map files have better query performance. The maximum value for this is 2147483647
table.file.compress.blocksize | memory | yes | 100K | Overrides the hadoop io.seqfile.compress.blocksize setting so that map files have better query performance. The maximum value for this is 2147483647
table.file.compress.blocksize.index | memory | yes | 128K | Determines how large index blocks can be in files that support multilevel indexes. The maximum value for this is 2147483647
table.file.compress.type | string | yes | gz | One of gz,lzo,none
table.file.max | count | yes | 15 | Determines the max # of files each tablet in a table can have. When adjusting this property you may want to consider adjusting table.compaction.major.ratio also. Setting this property to 0 will make it default to tserver.scan.files.open.max-1; this will prevent a tablet from having more files than can be opened. Setting this property low may throttle ingest and increase query performance.
table.file.replication | count | yes | 0 | Determines how many replicas to keep of a table's map files in HDFS. When this value is less than or equal to 0, HDFS defaults are used.
table.file.type | string | yes | rf | Change the type of file a table writes
table.formatter | string | yes | org.apache.accumulo.core.util.format.DefaultFormatter | The Formatter class to apply on results in the shell
table.groups.enabled | string | yes | | A comma separated list of locality group names to enable for this table.
table.scan.max.memory | memory | yes | 1M | The maximum amount of memory that will be used to cache results of a client query/scan. Once this limit is reached, the buffered data is sent to the client.
table.security.scan.visibility.default | string | yes | | The security label that will be assumed at scan time if an entry does not have a visibility set. Note: An empty security label is displayed as []. The scan results will show an empty visibility even if the visibility from this setting is applied to the entry. CAUTION: If a particular key has an empty security label AND its table's default visibility is also empty, access will ALWAYS be granted for users with permission to that table. Additionally, if this field is changed, all existing data with an empty visibility label will be interpreted with the new label on the next scan.
table.split.threshold | memory | yes | 1G | When combined size of files exceeds this amount a tablet is split.
table.walog.enabled | boolean | yes | true | Use the write-ahead log to prevent the loss of data.
table.constraint.*
Properties in this category are per-table properties that add constraints to a table. These properties start with the category prefix, followed by a number, and their values correspond to a fully qualified Java class that implements the Constraint interface.
For example, table.constraint.1 = org.apache.accumulo.core.constraints.MyCustomConstraint and table.constraint.2 = my.package.constraints.MySecondConstraint
table.iterator.*
Properties in this category specify iterators that are applied at various stages (scopes) of interaction with a table. These properties start with the category prefix, followed by a scope (minc, majc, scan, etc.), followed by a period, followed by a name, as in table.iterator.scan.vers, or table.iterator.scan.custom. The value of each property is a number indicating the order in which the iterator is applied, followed by a comma and a class name, as in table.iterator.scan.vers = 10,org.apache.accumulo.core.iterators.VersioningIterator
These iterators can take options if additional properties are set that look like this property, but are suffixed with a period, followed by 'opt' followed by another period, and a property name.
For example, table.iterator.minc.vers.opt.maxVersions = 3
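The naming scheme for iterator properties can be sketched in Python as follows. The `iterator_properties` helper is hypothetical; it only assembles property names and values in the form described above and prints them as shell commands using the `config -t TABLE -s PROPERTY=VALUE` syntax, with TABLE left as a placeholder.

```python
def iterator_properties(scope, name, priority, class_name, options=None):
    """Build the table.iterator.* properties for one iterator in one scope."""
    base = f"table.iterator.{scope}.{name}"
    props = {base: f"{priority},{class_name}"}
    # Each option becomes <base>.opt.<option name> = <option value>.
    for key, value in (options or {}).items():
        props[f"{base}.opt.{key}"] = str(value)
    return props


props = iterator_properties(
    "minc",
    "vers",
    10,
    "org.apache.accumulo.core.iterators.VersioningIterator",
    {"maxVersions": 3},
)
for key, value in props.items():
    print(f"config -t TABLE -s {key}={value}")
```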
table.group.*
Properties in this category are per-table properties that define locality groups in a table. These properties start with the category prefix, followed by a name, followed by a period, and followed by a property for that group.
For example, table.group.group1=x,y,z sets the column families for a group called group1. Once configured, group1 can be enabled by adding it to the list of groups in the table.groups.enabled property.
Additional group options may be specified for a named group by setting table.group.<name>.opt.<key>=<value>.
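As a rough illustration, the two steps (defining each group, then enabling it) could be generated like this in Python. The `locality_group_properties` helper is hypothetical and only builds the property names and values described above.

```python
def locality_group_properties(groups):
    """Build table.group.* properties and the matching table.groups.enabled list.

    groups maps a locality group name to its list of column families.
    """
    props = {
        f"table.group.{name}": ",".join(families)
        for name, families in groups.items()
    }
    # A group only takes effect once it is listed in table.groups.enabled.
    props["table.groups.enabled"] = ",".join(groups)
    return props


for key, value in locality_group_properties({"group1": ["x", "y", "z"]}).items():
    print(f"{key}={value}")
```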

Property Type Descriptions

Property Type | Description

duration

A non-negative integer optionally followed by a unit of time (whitespace disallowed), as in 30s.
If no unit of time is specified, seconds are assumed. Valid units are 'ms', 's', 'm', 'h', 'd' for milliseconds, seconds, minutes, hours, and days.
Examples of valid durations are '600', '30s', '45m', '30000ms', '3d', and '1h'.
Examples of invalid durations are '1w', '1h30m', '1s 200ms', 'ms', '', and 'a'.
Unless otherwise stated, the max value for the duration represented in milliseconds is 9223372036854775807
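A minimal Python sketch of a parser for this format is shown below. It is not accumulo's own parser; the function name is hypothetical. Because '3d' is listed among the valid examples, the sketch accepts a 'd' (days) suffix in addition to 'ms', 's', 'm', and 'h'.

```python
import re

# Digits followed by an optional unit, with no whitespace allowed.
_DURATION = re.compile(r"(\d+)(ms|s|m|h|d)?", re.ASCII)
_MILLIS = {"ms": 1, "s": 1000, "m": 60_000, "h": 3_600_000, "d": 86_400_000}


def duration_to_millis(text):
    """Convert a duration string to milliseconds; seconds are assumed without a unit."""
    match = _DURATION.fullmatch(text)
    if match is None:
        raise ValueError(f"invalid duration: {text!r}")
    number, unit = match.groups()
    return int(number) * _MILLIS[unit or "s"]


for value in ["600", "30s", "45m", "30000ms", "3d", "1h"]:
    print(value, duration_to_millis(value))
```

Each of the invalid examples listed above ('1w', '1h30m', '1s 200ms', 'ms', '', 'a') raises ValueError in this sketch.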

date/time

A date/time string in the format: YYYYMMDDhhmmssTTT where TTT is the 3 character time zone
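For illustration, a value in this format (such as 19700101000000GMT) could be split into its timestamp and time zone parts with a Python sketch like the one below. The function name is hypothetical, and the sketch returns the time zone as a plain string rather than resolving it, since the set of accepted zone abbreviations is not specified here.

```python
from datetime import datetime


def parse_datetime(text):
    """Split a YYYYMMDDhhmmssTTT string into (naive datetime, zone abbreviation)."""
    if len(text) != 17:
        raise ValueError(f"expected 17 characters, got {len(text)}")
    zone = text[14:]
    if not (zone.isascii() and zone.isalpha()):
        raise ValueError(f"invalid time zone: {zone!r}")
    stamp = datetime.strptime(text[:14], "%Y%m%d%H%M%S")
    return stamp, zone


print(parse_datetime("19700101000000GMT"))
```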

memory

A positive integer optionally followed by a unit of memory (whitespace disallowed), as in 2G.
If no unit is specified, bytes are assumed. Valid units are 'B', 'K', 'M', 'G', for bytes, kilobytes, megabytes, and gigabytes.
Examples of valid memories are '1024', '20B', '100K', '1500M', '2G'.
Examples of invalid memories are '1M500K', '1M 2K', '1MB', '1.5G', '1,024K', '', and 'a'.
Unless otherwise stated, the max value for the memory represented in bytes is 9223372036854775807
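The valid and invalid examples above can be checked with a short Python sketch. This is not accumulo's parser; the function name is hypothetical, and the 1024-based multipliers for 'K', 'M', and 'G' are an assumption of the sketch.

```python
import re

# Digits followed by at most one unit letter, with no whitespace allowed.
_MEMORY = re.compile(r"(\d+)([BKMG])?", re.ASCII)
_MULTIPLIER = {"B": 1, "K": 1 << 10, "M": 1 << 20, "G": 1 << 30}  # assumed


def memory_to_bytes(text):
    """Convert a memory string to bytes; bytes are assumed without a unit."""
    match = _MEMORY.fullmatch(text)
    if match is None:
        raise ValueError(f"invalid memory value: {text!r}")
    number, unit = match.groups()
    return int(number) * _MULTIPLIER[unit or "B"]


for value in ["1024", "20B", "100K", "1500M", "2G"]:
    print(value, memory_to_bytes(value))
```

The invalid examples ('1M500K', '1M 2K', '1MB', '1.5G', '1,024K', '', 'a') all raise ValueError here.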

host list

A comma-separated list of hostnames or ip addresses, with optional port numbers.
Examples of valid host lists are 'localhost:2000,www.example.com,10.10.1.1:500' and 'localhost'.
Examples of invalid host lists are '', ':1000', and 'localhost:80000'
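A simple Python check that agrees with these examples might look like the following. It is only a sketch with a hypothetical function name: it requires a non-empty host in every entry and, when a port is given, a number from 1 to 65535. It does not validate the characters of the hostname itself.

```python
def is_valid_host_list(text):
    """Return True if text is a comma-separated list of host or host:port entries."""
    if not text:
        return False
    for entry in text.split(","):
        host, separator, port = entry.partition(":")
        if not host:
            return False
        if separator:
            # A colon must be followed by a port number in the range 1-65535.
            if not (port.isascii() and port.isdigit()):
                return False
            if not 1 <= int(port) <= 65535:
                return False
    return True


for value in ["localhost:2000,www.example.com,10.10.1.1:500", "localhost",
              "", ":1000", "localhost:80000"]:
    print(repr(value), is_valid_host_list(value))
```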

port

A positive integer in the range 1024-65535, not already in use or specified elsewhere in the configuration

count

A non-negative integer in the range of 0-2147483647

fraction/percentage

A floating point number that represents either a fraction or, if suffixed with the '%' character, a percentage.
Examples of valid fractions/percentages are '10', '1000%', '0.05', '5%', '0.2%', '0.0005'.
Examples of invalid fractions/percentages are '', '10 percent', 'Hulk Hogan'
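The two notations can be normalized to a plain fraction with a sketch like this one. The function name is hypothetical and the regular expression is an assumption that is consistent with the examples above, not a copy of accumulo's own validation.

```python
import re

# A decimal number with an optional trailing '%'.
_FRACTION = re.compile(r"(\d+(?:\.\d+)?|\.\d+)(%)?", re.ASCII)


def to_fraction(text):
    """Return the value as a fraction; a '%' suffix divides the number by 100."""
    match = _FRACTION.fullmatch(text)
    if match is None:
        raise ValueError(f"invalid fraction/percentage: {text!r}")
    number, percent = match.groups()
    value = float(number)
    return value / 100.0 if percent else value


for value in ["10", "1000%", "0.05", "5%", "0.2%", "0.0005"]:
    print(value, to_fraction(value))
```

Under this reading, '5%' and '0.05' describe the same value, and the invalid examples ('', '10 percent', 'Hulk Hogan') raise ValueError.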

path

A string that represents a filesystem path, which can be either relative or absolute to some directory. The filesystem depends on the property.

absolute path

An absolute filesystem path. The filesystem depends on the property. This is the same as path, but enforces that its root is explicitly specified.

java class

A fully qualified java class name representing a class on the classpath.
An example is 'java.lang.String', rather than 'String'

string

An arbitrary string of characters whose format is unspecified and interpreted based on the context of the property to which it applies.

boolean

Has a value of either 'true' or 'false'

uri

A valid URI