The sys.options table in Drill contains information about the boot and system options listed in the following tables. To tune performance, you can adjust some of these options to suit your application. Configure the options using the ALTER SESSION or ALTER SYSTEM command.
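
As a rough sketch of how these commands are typically used, the statements below inspect one option and then change two others. The option names come from the tables on this page; the new values (`json` and `4`) are illustrative assumptions only, and the columns returned by sys.options can differ between Drill versions.

```sql
-- Look up the current setting of a single option.
SELECT * FROM sys.options WHERE name = 'planner.width.max_per_node';

-- Change an option for the current session only.
ALTER SESSION SET `store.format` = 'json';

-- Change an option at the system level, so that it applies beyond the current session.
ALTER SYSTEM SET `planner.width.max_per_node` = 4;
```

Option names are enclosed in backticks because they contain dots.
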
Name | Default | Comments |
---|---|---|
drill.exec.buffer.impl | "org.apache.drill.exec.work.batch.UnlimitedRawBatchBuffer" | |
drill.exec.buffer.size | 6 | Available memory in terms of record batches to hold data downstream of an operation. Increase this value to increase query speed. |
drill.exec.compile.debug | TRUE | |
drill.exec.http.enabled | TRUE | |
drill.exec.operator.packages | "org.apache.drill.exec.physical.config" | |
drill.exec.sort.external.batch.size | 4000 | |
drill.exec.sort.external.spill.directories | "/tmp/drill/spill" | Determines which directory to use for spooling |
drill.exec.sort.external.spill.group.size | 100 | |
drill.exec.storage.file.text.batch.size | 4000 | |
drill.exec.storage.packages | "org.apache.drill.exec.store" "org.apache.drill.exec.store.mock" | # This file tells Drill to consider this module when class path scanning. # This file can also include any supplementary configuration information. # This file is in HOCON format see https://github.com/typesafehub/config/blob/master/HOCON.md for more information. |
drill.exec.sys.store.provider.class | "org.apache.drill.exec.store.sys.zk.ZkPStoreProvider" | The Pstore (Persistent Configuration Storage) provider to use. The Pstore holds configuration and profile data. |
drill.exec.zk.connect | "localhost:2181" | The ZooKeeper quorum that Drill uses to connect to data sources. Configure on each Drillbit node. |
drill.exec.zk.refresh | 500 | |
file.separator | "/" | |
java.specification.version | 1.7 | |
java.vm.name | "Java HotSpot(TM) 64-Bit Server VM" | |
java.vm.specification.version | 1.7 | |
log.path | "/log/sqlline.log" | |
sun.boot.library.path | /Library/Java/JavaVirtualMachines/jdk1.7.0_71.jdk/Contents/Home/jre/lib | |
sun.java.command | "sqlline.SqlLine -d org.apache.drill.jdbc.Driver --maxWidth=10000 -u jdbc:drill:zk=local" | |
sun.os.patch.level | unknown | |
user | "" | |

Name | Default | Comments |
---|---|---|
drill.exec.functions.cast_empty_string_to_null | FALSE | |
drill.exec.storage.file.partition.column.label | dir | Accepts a string input. |
exec.errors.verbose | FALSE | Toggles verbose output of executable error messages |
exec.java_compiler | DEFAULT | Switches between DEFAULT, JDK, and JANINO mode for the current session. Uses Janino by default for generated source code of less than exec.java_compiler_janino_maxsize; otherwise, switches to the JDK compiler. |
exec.java_compiler_debug | TRUE | Toggles the output of debug-level compiler error messages in runtime generated code. |
exec.java_compiler_janino_maxsize | 262144 | See the exec.java_compiler option comment. Accepts inputs of type LONG. |
exec.max_hash_table_size | 1073741824 | Ending size for hash tables. Range: 0 - 1073741824 |
exec.min_hash_table_size | 65536 | Starting size for hash tables. Increase according to available memory to improve performance. Range: 0 - 1073741824 |
exec.queue.enable | FALSE | |
exec.queue.large | 10 | Range: 0-1000 |
exec.queue.small | 100 | Range: 0-1001 |
exec.queue.threshold | 30000000 | Range: 0-9223372036854775807 |
exec.queue.timeout_millis | 300000 | Range: 0-9223372036854775807 |
planner.add_producer_consumer | FALSE | Increase prefetching of data from disk. Disable for in-memory reads. |
planner.affinity_factor | 1.2 | Accepts inputs of type DOUBLE. |
planner.broadcast_factor | 1 | |
planner.broadcast_threshold | 10000000 | Threshold in number of rows that triggers a broadcast join for a query if the right side of the join contains fewer rows than the threshold. Avoids broadcasting too many rows to join. Range: 0-2147483647 |
planner.disable_exchanges | FALSE | Toggles the state of hashing to a random exchange. |
planner.enable_broadcast_join | TRUE | Changes the state of aggregation and join operators. Do not disable. |
planner.enable_demux_exchange | FALSE | Toggles the state of hashing to a demultiplexed exchange. |
planner.enable_hash_single_key | TRUE | |
planner.enable_hashagg | TRUE | Enable hash aggregation; otherwise, Drill does a sort-based aggregation. Does not write to disk. Enable is recommended. |
planner.enable_hashjoin | TRUE | Enable the memory hungry hash join. Does not write to disk. |
planner.enable_hashjoin_swap | | |
planner.enable_mergejoin | TRUE | Sort-based operation. Writes to disk. |
planner.enable_multiphase_agg | TRUE | |
planner.enable_mux_exchange | TRUE | Toggles the state of hashing to a multiplexed exchange. |
planner.enable_streamagg | TRUE | Sort-based operation. Writes to disk. |
planner.identifier_max_length | 1024 | |
planner.join.hash_join_swap_margin_factor | 10 | |
planner.join.row_count_estimate_factor | 1 | |
planner.memory.average_field_width | 8 | |
planner.memory.enable_memory_estimation | FALSE | |
planner.memory.hash_agg_table_factor | 1.1 | |
planner.memory.hash_join_table_factor | 1.1 | |
planner.memory.max_query_memory_per_node | 2147483648 | |
planner.memory.non_blocking_operators_memory | 64 | Range: 0-2048 |
planner.partitioner_sender_max_threads | 8 | |
planner.partitioner_sender_set_threads | -1 | |
planner.partitioner_sender_threads_factor | 1 | |
planner.producer_consumer_queue_size | 10 | How much data to prefetch from disk (in record batches) out of band of query execution |
planner.slice_target | 100000 | The number of records manipulated within a fragment before Drill parallelizes operations. |
planner.width.max_per_node | 3 | The maximum degree of distribution of a query across cores and cluster nodes. |
planner.width.max_per_query | 1000 | Same as planner.width.max_per_node, but applies to the query as executed by the entire cluster. |
store.format | parquet | Output format for data written to tables with the CREATE TABLE AS (CTAS) command. Allowed values: parquet, json, text |
store.json.all_text_mode | FALSE | Drill reads all data from the JSON files as VARCHAR. Prevents schema change errors. |
store.mongo.all_text_mode | FALSE | Similar to store.json.all_text_mode for MongoDB. |
store.parquet.block-size | 536870912 | Sets the size of a Parquet row group to the number of bytes less than or equal to the block size of MFS, HDFS, or the file system. |
store.parquet.compression | snappy | Compression type for storing Parquet output. Allowed values: snappy, gzip, none |
store.parquet.enable_dictionary_encoding | FALSE | |
store.parquet.use_new_reader | FALSE | |
window.enable | FALSE | |

You can configure the amount of direct memory allocated to a Drillbit for query processing. The default limit is 8G, but Drill prefers 16G or more depending on the workload. The total amount of direct memory that a Drillbit allocates to query operations cannot exceed the limit set.
Drill mainly uses Java direct memory and performs well when executing operations in memory instead of storing the operations on disk. Drill does not write to disk unless absolutely necessary, unlike MapReduce where everything is written to disk during each phase of a job.
The JVM’s heap memory does not limit the amount of direct memory available in a Drillbit. The on-heap memory for Drill is only about 4-8G, which should suffice because Drill avoids having data sit in heap memory.
You can modify memory for each Drillbit node in your cluster. To modify the memory for a Drillbit, edit the -XX:MaxDirectMemorySize parameter in the Drillbit startup script located in <drill_installation_directory>/conf/drill-env.sh.
Note: If this parameter is not set, the limit depends on the amount of available system memory.
After you edit <drill_installation_directory>/conf/drill-env.sh, restart the Drillbit on the node.
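
The edit described above might look like the following excerpt of drill-env.sh. This is a minimal sketch: the variable names `DRILL_MAX_DIRECT_MEMORY`, `DRILL_MAX_HEAP`, and `DRILL_JAVA_OPTS` are assumptions modeled on the script shipped with some Drill releases, and the sizes are illustrative, so check your own copy of the file before changing it.

```shell
# Illustrative excerpt of conf/drill-env.sh; variable names and sizes are assumptions.
DRILL_MAX_DIRECT_MEMORY="16G"   # upper limit on direct memory for query processing
DRILL_MAX_HEAP="4G"             # on-heap memory for the Drillbit JVM

# Pass both limits to the JVM that runs the Drillbit.
export DRILL_JAVA_OPTS="-Xmx$DRILL_MAX_HEAP -XX:MaxDirectMemorySize=$DRILL_MAX_DIRECT_MEMORY"
```

Keeping the sizes in separate variables makes it easier to apply different values on nodes with different amounts of memory.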