Storage Plugin Configuration Introduction

When you add or update a storage plugin instance on one node in a Drill cluster, Drill broadcasts the change to all of the other nodes so that every node has an identical storage plugin configuration. You do not need to restart any of the Drillbits when you add or update a storage plugin instance.

Use the Drill Web UI to add or update a storage plugin. Launch a web browser, go to http://<IP address of the sandbox>:8047, and then open the Storage tab.

To create and configure a new storage plugin:

  1. Enter a storage name in New Storage Plugin. Each storage plugin registered with Drill must have a distinct name. Names are case-sensitive.
  2. Click Create.
  3. In Configuration, configure attributes of the storage plugin, if applicable, using JSON formatting. The Storage Plugin Attributes table in the next section describes attributes typically reconfigured by users.
  4. Click Create.

Click Update to reconfigure an existing, enabled storage plugin.
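
As a rough illustration of the JSON entered in the Configuration step, the following Python sketch builds a small file-based configuration and prints it as JSON text. The attribute names come from the Storage Plugin Attributes table; the workspace name logs, the /tmp location, and the single csv format are illustrative choices, not defaults shipped with Drill.

```python
import json

# Hypothetical configuration for a file-based storage plugin.
# Attribute names follow the Storage Plugin Attributes table;
# the values are examples only.
plugin_config = {
    "type": "file",
    "enabled": True,
    "connection": "file:///",
    "workspaces": {
        "logs": {
            "location": "/tmp",
            "writable": False,
            "defaultInputFormat": None,
        }
    },
    "formats": {
        "csv": {"type": "text", "extensions": ["csv"], "delimiter": ","}
    },
}

# JSON text that could be pasted into the Configuration box.
config_json = json.dumps(plugin_config, indent=2)
print(config_json)
```

Python's True, False, and None serialize to the JSON literals true, false, and null that the Web UI expects.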

Storage Plugin Attributes

The following diagram of the dfs storage plugin briefly describes options you configure in a typical storage plugin configuration:

[Diagram: dfs plugin]

The following table describes the attributes you configure for storage plugins in more detail than the diagram.

| Attribute | Example Values | Required | Description |
|---|---|---|---|
| "type" | "file", "hbase", "hive", "mongo" | yes | The storage plugin type name supported by Drill. |
| "enabled" | true, false | yes | The state of the storage plugin. |
| "connection" | "classpath:///", "file:///", "mongodb://localhost:27017/", "maprfs:///" | implementation-dependent | The type of distributed file system. Drill can work with any distributed system, such as HDFS and S3, or files in your file system. |
| "workspaces" | null, "logs" | no | One or more unique workspace names, enclosed in double quotation marks. If a workspace is defined more than once, the latest one overrides the previous ones. Not used with local or distributed file systems. |
| "workspaces" . . . "location" | "location": "/", "location": "/tmp" | no | The path to a directory on the file system. |
| "workspaces" . . . "writable" | true, false | no | Whether Drill can write to the workspace location, for example when creating tables or views. |
| "workspaces" . . . "defaultInputFormat" | null, "parquet", "csv", "json" | no | The format of data Drill reads by default, regardless of extension. Parquet is the default. |
| "formats" | "psv", "csv", "tsv", "parquet", "json", "maprdb" | yes | One or more file formats of data Drill can read. Drill can implicitly detect some file formats based on the file extension or the first few bits of data within the file, but you need to configure an option for others. |
| "formats" . . . "type" | "text", "parquet", "json", "maprdb" | yes | The type of the format specified. For example, you can define two formats, csv and psv, as type "text" but with different delimiters. Drill enables the maprdb plugin if you define the maprdb type. |
| "formats" . . . "extensions" | ["csv"] | format-dependent | The extensions of the files that Drill can read. |
| "formats" . . . "delimiter" | "\t", "," | format-dependent | The delimiter used to separate columns in text files such as CSV. Specify a non-printable delimiter in the storage plugin configuration by using the form \uXXXX, where XXXX is the four-digit hexadecimal ASCII code for the character. |
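
The "formats" attributes can be sketched as follows, assuming two text formats that differ only in delimiter plus one format with a non-printable delimiter. The helper unicode_escape, the ctrl_a format name, and the ctrla extension are hypothetical names used only for illustration.

```python
import json

def unicode_escape(char: str) -> str:
    """Return the \\uXXXX form of a single character (hypothetical helper)."""
    if len(char) != 1 or ord(char) > 0xFFFF:
        raise ValueError("expected exactly one character below U+10000")
    return "\\u{:04X}".format(ord(char))

# Two formats of type "text" with different delimiters, plus an
# illustrative format that uses the non-printable character U+0001.
formats = {
    "csv": {"type": "text", "extensions": ["csv"], "delimiter": ","},
    "psv": {"type": "text", "extensions": ["psv"], "delimiter": "|"},
    "ctrl_a": {"type": "text", "extensions": ["ctrla"], "delimiter": "\u0001"},
}

print(unicode_escape("\t"))           # \u0009
print(json.dumps(formats["ctrl_a"]))  # delimiter is serialized as \u0001
```

Because json.dumps escapes control characters by default, the serialized delimiter already appears in the \uXXXX form described in the table.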

The configuration of other attributes, such as size.calculator.enabled in the hbase plugin and configProps in the hive plugin, is implementation-dependent and beyond the scope of this document.

Although Drill can work with different file types in the same directory, restricting a Drill workspace to one file type prevents confusion.

Case-sensitive Names

As previously mentioned, workspace and storage plugin names are case-sensitive. For example, the following query uses the storage plugin name dfs and the workspace name clicks. When you refer to dfs.clicks in an SQL statement, use the case in which the names were defined:

0: jdbc:drill:> USE dfs.clicks;

Using uppercase letters in the query after defining the storage plugin and workspace names in lowercase letters does not work.
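
The effect of case-sensitive names can be pictured with a small Python sketch. The registered dictionary and the resolve function are hypothetical stand-ins for illustration; they are not Drill's actual name resolution code.

```python
# Hypothetical registry: plugin name -> workspace names, exactly as defined.
registered = {"dfs": {"clicks", "logs"}}

def resolve(schema: str) -> bool:
    """Return True only if plugin and workspace match the defined case."""
    plugin, _, workspace = schema.partition(".")
    return workspace in registered.get(plugin, set())

print(resolve("dfs.clicks"))  # True
print(resolve("DFS.CLICKS"))  # False
```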

REST API

Drill provides a REST API that you can use to create a storage plugin. Use an HTTP POST and pass two properties:

  • name: The plugin name.

  • config: The storage plugin definition as you would enter it in the Web UI.

For example, this command creates a plugin named myplugin for reading files of an unknown type located on the root of the file system:

curl -X POST -H "Content-Type: application/json" -d '{"name":"myplugin", "config": {"type": "file", "enabled": false, "connection": "file:///", "workspaces": { "root": { "location": "/", "writable": false, "defaultInputFormat": null}}, "formats": null}}' http://localhost:8047/storage/myplugin.json
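
The same POST request can be sketched with Python's standard library. The payload and URL mirror the curl command above; the create_plugin function is a hypothetical wrapper, and actually sending the request assumes a Drillbit is listening on localhost:8047.

```python
import json
import urllib.request

# Same two properties as the curl example: name and config.
payload = {
    "name": "myplugin",
    "config": {
        "type": "file",
        "enabled": False,
        "connection": "file:///",
        "workspaces": {
            "root": {"location": "/", "writable": False, "defaultInputFormat": None}
        },
        "formats": None,
    },
}

# Build the request without sending it yet.
request = urllib.request.Request(
    url="http://localhost:8047/storage/myplugin.json",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)

def create_plugin(req: urllib.request.Request) -> str:
    """Send the request and return the response body as text."""
    with urllib.request.urlopen(req) as response:
        return response.read().decode("utf-8")

# create_plugin(request)  # uncomment when a Drillbit is reachable
```

Building the request separately from sending it makes it easy to inspect the body and headers before contacting the server.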

Bootstrapping a Storage Plugin

If you need to add a storage plugin to Drill and do not want to use a web browser, you can create a bootstrap-storage-plugins.json file and include it on the classpath when starting Drill. The storage plugin loads when Drill starts up.
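
A minimal sketch of generating such a file follows. It assumes the file uses a top-level "storage" key that maps plugin names to configurations, as in the bootstrap files bundled with Drill; the plugin name myplugin, its settings, and the write_bootstrap helper are illustrative.

```python
import json
import tempfile
from pathlib import Path

# Assumed layout: {"storage": {<plugin name>: <plugin configuration>}}.
bootstrap = {
    "storage": {
        "myplugin": {
            "type": "file",
            "enabled": True,
            "connection": "file:///",
            "workspaces": {
                "root": {"location": "/", "writable": False, "defaultInputFormat": None}
            },
            "formats": {"json": {"type": "json"}},
        }
    }
}

def write_bootstrap(directory: str) -> Path:
    """Write bootstrap-storage-plugins.json into the given directory."""
    path = Path(directory) / "bootstrap-storage-plugins.json"
    path.write_text(json.dumps(bootstrap, indent=2), encoding="utf-8")
    return path

# Demonstration in a temporary directory; a real deployment would
# write to a directory that is on Drill's classpath.
with tempfile.TemporaryDirectory() as tmp:
    written = write_bootstrap(tmp)
    loaded = json.loads(written.read_text(encoding="utf-8"))

print(sorted(loaded["storage"]))  # ['myplugin']
```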

If you configure an HBase storage plugin using the bootstrap-storage-plugins.json file and HBase is not installed, you might experience a delay when executing queries. To reduce the delay, configure the HBase client timeout and retry settings in the config block of the HBase plugin instance configuration.
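
As a hedged illustration, the sketch below builds an HBase plugin configuration whose config block carries client timeout and retry properties. The property names are standard HBase client settings, but the values shown are arbitrary examples, and which settings matter in practice depends on your HBase version and environment.

```python
import json

# Illustrative HBase plugin instance; all values are assumptions.
hbase_plugin = {
    "type": "hbase",
    "enabled": True,
    "config": {
        "hbase.zookeeper.quorum": "localhost",
        "hbase.zookeeper.property.clientPort": "2181",
        # Client-side retry and timeout settings (example values).
        "hbase.client.retries.number": "3",
        "hbase.rpc.timeout": "10000",  # milliseconds
        "zookeeper.recovery.retry": "1",
    },
}

print(json.dumps(hbase_plugin, indent=2))
```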