Overview

The sys.options table in Drill contains information about boot and system options listed in the following tables. To tune performance, you adjust some of the options to suit your application. Configure the options using the ALTER SESSION or ALTER SYSTEM command.

Boot Options

Name Default Comments
drill.exec.buffer.impl "org.apache.drill.exec.work.batch.UnlimitedRawBatchBuffer"
drill.exec.buffer.size 6 Available memory in terms of record batches to hold data downstream of an operation.Increase this value to increase query speed.
drill.exec.compile.debug TRUE
drill.exec.http.enabled TRUE
drill.exec.operator.packages "org.apache.drill.exec.physical.config"
drill.exec.sort.external.batch.size 4000
drill.exec.sort.external.spill.directories "/tmp/drill/spill" Determines which directory to use for spooling
drill.exec.sort.external.spill.group.size 100
drill.exec.storage.file.text.batch.size 4000
drill.exec.storage.packages "org.apache.drill.exec.store" "org.apache.drill.exec.store.mock" # This file tells Drill to consider this module when class path scanning. # This file can also include any supplementary configuration information. # This file is in HOCON format see https://github.com/typesafehub/config/blob/master/HOCON.md for more information.
drill.exec.sys.store.provider.class "org.apache.drill.exec.store.sys.zk.ZkPStoreProvider" The Pstore (Persistent Configuration Storage) provider to use. The Pstore holds configuration and profile data.
drill.exec.zk.connect "localhost:2181" The ZooKeeper quorum that Drill uses to connect to data sources. Configure on each Drillbit node.
drill.exec.zk.refresh 500
file.separator "/"
java.specification.version 1.7
java.vm.name "Java HotSpot(TM) 64-Bit Server VM"
java.vm.specification.version 1.7
log.path "/log/sqlline.log"
sun.boot.library.path /Library/Java/JavaVirtualMachines/jdk1.7.0_71.jdk/Contents/Home/jre/lib
sun.java.command "sqlline.SqlLine -d org.apache.drill.jdbc.Driver --maxWidth=10000 -u jdbc:drill:zk=local"
sun.os.patch.level unknown
user ""

System Options

Name Default Comments
drill.exec.functions.cast_empty_string_to_null FALSE
drill.exec.storage.file.partition.column.label dir Accepts a string input.
exec.errors.verbose FALSE Toggles verbose output of executable error messages
exec.java_compiler DEFAULT Switches between DEFAULT, JDK, and JANINO mode for the current session. Uses Janino by default for generated source code of less than exec.java_compiler_janino_maxsize; otherwise, switches to the JDK compiler.
exec.java_compiler_debug TRUE Toggles the output of debug-level compiler error messages in runtime generated code.
exec.java_compiler_janino_maxsize 262144 See the exec.java_compiler option comment. Accepts inputs of type LONG.
exec.max_hash_table_size 1073741824 Ending size for hash tables. Range: 0 - 1073741824
exec.min_hash_table_size 65536 Starting size for hash tables. Increase according to available memory to improve performance. Range: 0 - 1073741824
exec.queue.enable FALSE
exec.queue.large 10 Range: 0-1000
exec.queue.small 100 Range: 0-1001
exec.queue.threshold 30000000 Range: 0-9223372036854775807
exec.queue.timeout_millis 300000 Range: 0-9223372036854775807
planner.add_producer_consumer FALSE Increase prefetching of data from disk. Disable for in-memory reads.
planner.affinity_factor 1.2 Accepts inputs of type DOUBLE.
planner.broadcast_factor 1
planner.broadcast_threshold 10000000 Threshold in number of rows that triggers a broadcast join for a query if the right side of the join contains fewer rows than the threshold. Avoids broadcasting too many rows to join. Range: 0-2147483647
planner.disable_exchanges FALSE Toggles the state of hashing to a random exchange.
planner.enable_broadcast_join TRUE Changes the state of aggregation and join operators. Do not disable.
planner.enable_demux_exchange FALSE Toggles the state of hashing to a demulitplexed exchange.
planner.enable_hash_single_key TRUE
planner.enable_hashagg TRUE Enable hash aggregation; otherwise, Drill does a sort-based aggregation. Does not write to disk. Enable is recommended.
planner.enable_hashjoin TRUE Enable the memory hungry hash join. Does not write to disk.
planner.enable_hashjoin_swap
planner.enable_mergejoin TRUE Sort-based operation. Writes to disk.
planner.enable_multiphase_agg TRUE
planner.enable_mux_exchange TRUE Toggles the state of hashing to a multiplexed exchange.
planner.enable_streamagg TRUE Sort-based operation. Writes to disk.
planner.identifier_max_length 1024
planner.join.hash_join_swap_margin_factor 10
planner.join.row_count_estimate_factor 1
planner.memory.average_field_width 8
planner.memory.enable_memory_estimation FALSE
planner.memory.hash_agg_table_factor 1.1
planner.memory.hash_join_table_factor 1.1
planner.memory.max_query_memory_per_node 2147483648
planner.memory.non_blocking_operators_memory 64 Range: 0-2048
planner.partitioner_sender_max_threads 8
planner.partitioner_sender_set_threads -1
planner.partitioner_sender_threads_factor 1
planner.producer_consumer_queue_size 10 How much data to prefetch from disk (in record batches) out of band of query execution
planner.slice_target 100000 The number of records manipulated within a fragment before Drill parallelizes operations.
planner.width.max_per_node 3 The maximum degree of distribution of a query across cores and cluster nodes.
planner.width.max_per_query 1000 Same as planner but applies to the query as executed by the entire cluster.
store.format parquet Output format for data written to tables with the CREATE TABLE AS (CTAS) command. Allowed values are parquet, json, or text. Allowed values: 0, -1, 1000000
store.json.all_text_mode FALSE Drill reads all data from the JSON files as VARCHAR. Prevents schema change errors.
store.mongo.all_text_mode FALSE Similar to store.json.all_text_mode for MongoDB.
store.parquet.block-size 536870912 Sets the size of a Parquet row group to the number of bytes less than or equal to the block size of MFS, HDFS, or the file system.
store.parquet.compression snappy Compression type for storing Parquet output. Allowed values: snappy, gzip, none
store.parquet.enable_dictionary_encoding FALSE
store.parquet.use_new_reader FALSE
window.enable FALSE

Memory Allocation

You can configure the amount of direct memory allocated to a Drillbit for query processing. The default limit is 8G, but Drill prefers 16G or more depending on the workload. The total amount of direct memory that a Drillbit allocates to query operations cannot exceed the limit set.

Drill mainly uses Java direct memory and performs well when executing operations in memory instead of storing the operations on disk. Drill does not write to disk unless absolutely necessary, unlike MapReduce where everything is written to disk during each phase of a job.

The JVM’s heap memory does not limit the amount of direct memory available in a Drillbit. The on-heap memory for Drill is only about 4-8G, which should suffice because Drill avoids having data sit in heap memory.

Modifying Drillbit Memory

You can modify memory for each Drillbit node in your cluster. To modify the memory for a Drillbit, edit the XX:MaxDirectMemorySize parameter in the Drillbit startup script located in <drill_installation_directory>/conf/drill- env.sh.

Note: If this parameter is not set, the limit depends on the amount of available system memory.

After you edit <drill_installation_directory>/conf/drill-env.sh, restart the Drillbit on the node.