Overview - Apache Drill

The sys.options table in Drill contains information about boot and system options listed in the following tables. To tune performance, you adjust some of the options to suit your application. Configure the options using the ALTER SESSION or ALTER SYSTEM command.

Boot Options

Name	Default	Comments
drill.exec.buffer.impl	"org.apache.drill.exec.work.batch.UnlimitedRawBatchBuffer"
drill.exec.buffer.size	6	Available memory in terms of record batches to hold data downstream of an operation.Increase this value to increase query speed.
drill.exec.compile.debug	TRUE
drill.exec.http.enabled	TRUE
drill.exec.operator.packages	"org.apache.drill.exec.physical.config"
drill.exec.sort.external.batch.size	4000
drill.exec.sort.external.spill.directories	"/tmp/drill/spill"	Determines which directory to use for spooling
drill.exec.sort.external.spill.group.size	100
drill.exec.storage.file.text.batch.size	4000
drill.exec.storage.packages	"org.apache.drill.exec.store" "org.apache.drill.exec.store.mock"	# This file tells Drill to consider this module when class path scanning. # This file can also include any supplementary configuration information. # This file is in HOCON format see https://github.com/typesafehub/config/blob/master/HOCON.md for more information.
drill.exec.sys.store.provider.class	"org.apache.drill.exec.store.sys.zk.ZkPStoreProvider"	The Pstore (Persistent Configuration Storage) provider to use. The Pstore holds configuration and profile data.
drill.exec.zk.connect	"localhost:2181"	The ZooKeeper quorum that Drill uses to connect to data sources. Configure on each Drillbit node.
drill.exec.zk.refresh	500
file.separator	"/"
java.specification.version	1.7
java.vm.name	"Java HotSpot(TM) 64-Bit Server VM"
java.vm.specification.version	1.7
log.path	"/log/sqlline.log"
sun.boot.library.path	/Library/Java/JavaVirtualMachines/jdk1.7.0_71.jdk/Contents/Home/jre/lib
sun.java.command	"sqlline.SqlLine -d org.apache.drill.jdbc.Driver --maxWidth=10000 -u jdbc:drill:zk=local"
sun.os.patch.level	unknown
user	""

System Options

Name	Default	Comments
drill.exec.functions.cast_empty_string_to_null	FALSE
drill.exec.storage.file.partition.column.label	dir	Accepts a string input.
exec.errors.verbose	FALSE	Toggles verbose output of executable error messages
exec.java_compiler	DEFAULT	Switches between DEFAULT, JDK, and JANINO mode for the current session. Uses Janino by default for generated source code of less than exec.java_compiler_janino_maxsize; otherwise, switches to the JDK compiler.
exec.java_compiler_debug	TRUE	Toggles the output of debug-level compiler error messages in runtime generated code.
exec.java_compiler_janino_maxsize	262144	See the exec.java_compiler option comment. Accepts inputs of type LONG.
exec.max_hash_table_size	1073741824	Ending size for hash tables. Range: 0 - 1073741824
exec.min_hash_table_size	65536	Starting size for hash tables. Increase according to available memory to improve performance. Range: 0 - 1073741824
exec.queue.enable	FALSE
exec.queue.large	10	Range: 0-1000
exec.queue.small	100	Range: 0-1001
exec.queue.threshold	30000000	Range: 0-9223372036854775807
exec.queue.timeout_millis	300000	Range: 0-9223372036854775807
planner.add_producer_consumer	FALSE	Increase prefetching of data from disk. Disable for in-memory reads.
planner.affinity_factor	1.2	Accepts inputs of type DOUBLE.
planner.broadcast_factor	1
planner.broadcast_threshold	10000000	Threshold in number of rows that triggers a broadcast join for a query if the right side of the join contains fewer rows than the threshold. Avoids broadcasting too many rows to join. Range: 0-2147483647
planner.disable_exchanges	FALSE	Toggles the state of hashing to a random exchange.
planner.enable_broadcast_join	TRUE	Changes the state of aggregation and join operators. Do not disable.
planner.enable_demux_exchange	FALSE	Toggles the state of hashing to a demulitplexed exchange.
planner.enable_hash_single_key	TRUE
planner.enable_hashagg	TRUE	Enable hash aggregation; otherwise, Drill does a sort-based aggregation. Does not write to disk. Enable is recommended.
planner.enable_hashjoin	TRUE	Enable the memory hungry hash join. Does not write to disk.
planner.enable_hashjoin_swap
planner.enable_mergejoin	TRUE	Sort-based operation. Writes to disk.
planner.enable_multiphase_agg	TRUE
planner.enable_mux_exchange	TRUE	Toggles the state of hashing to a multiplexed exchange.
planner.enable_streamagg	TRUE	Sort-based operation. Writes to disk.
planner.identifier_max_length	1024
planner.join.hash_join_swap_margin_factor	10
planner.join.row_count_estimate_factor	1
planner.memory.average_field_width	8
planner.memory.enable_memory_estimation	FALSE
planner.memory.hash_agg_table_factor	1.1
planner.memory.hash_join_table_factor	1.1
planner.memory.max_query_memory_per_node	2147483648
planner.memory.non_blocking_operators_memory	64	Range: 0-2048
planner.partitioner_sender_max_threads	8
planner.partitioner_sender_set_threads	-1
planner.partitioner_sender_threads_factor	1
planner.producer_consumer_queue_size	10	How much data to prefetch from disk (in record batches) out of band of query execution
planner.slice_target	100000	The number of records manipulated within a fragment before Drill parallelizes operations.
planner.width.max_per_node	3	The maximum degree of distribution of a query across cores and cluster nodes.
planner.width.max_per_query	1000	Same as planner but applies to the query as executed by the entire cluster.
store.format	parquet	Output format for data written to tables with the CREATE TABLE AS (CTAS) command. Allowed values are parquet, json, or text. Allowed values: 0, -1, 1000000
store.json.all_text_mode	FALSE	Drill reads all data from the JSON files as VARCHAR. Prevents schema change errors.
store.mongo.all_text_mode	FALSE	Similar to store.json.all_text_mode for MongoDB.
store.parquet.block-size	536870912	Sets the size of a Parquet row group to the number of bytes less than or equal to the block size of MFS, HDFS, or the file system.
store.parquet.compression	snappy	Compression type for storing Parquet output. Allowed values: snappy, gzip, none
store.parquet.enable_dictionary_encoding	FALSE
store.parquet.use_new_reader	FALSE
window.enable	FALSE

Memory Allocation

You can configure the amount of direct memory allocated to a Drillbit for query processing. The default limit is 8G, but Drill prefers 16G or more depending on the workload. The total amount of direct memory that a Drillbit allocates to query operations cannot exceed the limit set.

Drill mainly uses Java direct memory and performs well when executing operations in memory instead of storing the operations on disk. Drill does not write to disk unless absolutely necessary, unlike MapReduce where everything is written to disk during each phase of a job.

The JVM’s heap memory does not limit the amount of direct memory available in a Drillbit. The on-heap memory for Drill is only about 4-8G, which should suffice because Drill avoids having data sit in heap memory.

Modifying Drillbit Memory

You can modify memory for each Drillbit node in your cluster. To modify the memory for a Drillbit, edit the XX:MaxDirectMemorySize parameter in the Drillbit startup script located in <drill_installation_directory>/conf/drill- env.sh.

Note: If this parameter is not set, the limit depends on the amount of available system memory.

After you edit <drill_installation_directory>/conf/drill-env.sh, restart the Drillbit on the node.