Configuring Resources for a Shared Drillbit

To manage a cluster in which multiple users share a Drillbit, configure Drill query queuing and parallelization in addition to memory, which is described in the previous section.

Configuring Drill Query Queuing

Query queuing is turned off by default. To enable and manage it, set options in the sys.options table. There are two types of queues: large and small. You set the maximum number of queries that each queue allows with the following options:

  • exec.queue.large
  • exec.queue.small
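As a minimal sketch, the statements below turn queuing on and list the queue-related options. They assume the standard ALTER SYSTEM syntax and that the on/off switch is named exec.queue.enable, which this section does not list; confirm both against your Drill release.

```sql
-- Turn query queuing on (it is off by default).
-- Assumption: the enabling option is exec.queue.enable.
ALTER SYSTEM SET `exec.queue.enable` = true;

-- List the queue-related options and their current values.
SELECT * FROM sys.options WHERE name LIKE 'exec.queue%';
```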

Example Configuration

Suppose you configure the queue reserved for large queries to hold a maximum of 5 queries and the queue reserved for small queries to hold a maximum of 20. Users start to run queries, and Drill receives the following query requests in this order:

  • Query A (blue): 1 billion records, Drill estimates 10 million rows will be processed
  • Query B (red): 2 billion records, Drill estimates 20 million rows will be processed
  • Query C: 1 billion records
  • Query D: 100 records

The exec.queue.threshold option defaults to 30 million, measured in the estimated number of rows that a query will process. Queries A and B are queued in the large queue. Their combined estimate of rows to be processed reaches the 30 million threshold, filling the queue to capacity. The query C request arrives and goes on the wait list, and then query D arrives. Because query D is small, it is queued immediately in the small queue, as shown in the following diagram:

[Figure: Drill queuing]

The Drill queuing configuration in this example tends to give many users running small queries a rapid response. Users running a large query might experience some delay until an earlier-received large query returns, freeing space in the large queue to process queries that are waiting.
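The example configuration could be expressed roughly as follows. This is a sketch under the assumption that queuing has already been enabled and that the options are set system-wide with ALTER SYSTEM; the values are the ones used in the example above.

```sql
-- Large-query queue: at most 5 queries.
ALTER SYSTEM SET `exec.queue.large` = 5;

-- Small-query queue: at most 20 queries.
ALTER SYSTEM SET `exec.queue.small` = 20;

-- Threshold used in the example: 30 million estimated rows.
-- This matches the stated default, so setting it is only needed
-- if the value was changed earlier.
ALTER SYSTEM SET `exec.queue.threshold` = 30000000;
```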

Controlling Parallelization

By default, Drill parallelizes operations when the number of records manipulated within a fragment reaches 100,000. When parallelization of operations is high, the cluster operates as fast as possible, which is fine for a single user. In a multi-tenant cluster where users contend for resources, however, you need to reduce parallelization to levels based on user needs.
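To check the record count at which Drill starts to parallelize, you can query sys.options. The sketch below assumes that this value is held in an option named planner.slice_target, a name that does not appear in this section; verify it on your cluster before relying on it.

```sql
-- Show the current per-fragment record threshold for parallelization.
-- Assumption: the option is named planner.slice_target.
SELECT * FROM sys.options WHERE name = 'planner.slice_target';
```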

Parallelization Configuration Procedure

To configure parallelization, configure the following options in the sys.options table:

  • planner.width.max_per_node
    The maximum degree of distribution of a query across cores and cluster nodes.
  • planner.width.max_per_query
    Same as planner.width.max_per_node, but applies to the query as executed by the entire cluster.

Configure planner.width.max_per_node to achieve fine-grained, absolute control over parallelization.
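A sketch of lowering parallelization on a shared cluster follows. The numeric values are hypothetical and chosen only for illustration. The option names are written with underscores (max_per_node, max_per_query), as they are listed in sys.options in Drill releases; treat that spelling as an assumption and confirm it with the SELECT statement at the end.

```sql
-- Cap the degree of parallelization on each node (hypothetical value).
ALTER SYSTEM SET `planner.width.max_per_node` = 4;

-- Cap the degree of parallelization for a query across the whole cluster
-- (hypothetical value).
ALTER SYSTEM SET `planner.width.max_per_query` = 40;

-- The same option can be narrowed for the current session only.
ALTER SESSION SET `planner.width.max_per_node` = 2;

-- Confirm the option names and the values now in effect.
SELECT * FROM sys.options WHERE name LIKE 'planner.width%';
```

Setting the options with ALTER SYSTEM affects all users of the cluster, while ALTER SESSION limits the change to the current connection, which can be useful for testing a lower value before applying it broadly.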

Data Isolation

Tenants can share data on a cluster using Drill views and impersonation. For details, see the Drill impersonation documentation.