Workspaces

When you register an instance of a file system data source, you can configure one or more workspaces for the instance. A workspace is a directory within the file system that you define. Drill searches the workspace to locate data when you run a query.

Each workspace that you register defines a schema that you can connect to and query. Configuring workspaces is useful when you want to run multiple queries on files or tables in a specific directory. You cannot create workspaces for hive and hbase instances, though Hive databases show up as workspaces in Drill.

The following example shows an instance of a file type storage plugin with a workspace named json configured to point Drill to the /users/max/drill/json/ directory in the local file system (dfs):

{
  "type" : "file",
  "enabled" : true,
  "connection" : "file:///",
  "workspaces" : {
    "json" : {
      "location" : "/users/max/drill/json/",
      "writable" : false,
      "defaultInputFormat" : "json"
    }
  }
}

Note

The `connection` parameter in the configuration above is "`file:///`", connecting Drill to the local file system (`dfs`).

To connect to a Hadoop or MapR file system, the `connection` parameter would be "`hdfs:///`" or "`maprfs:///`", respectively.

To query a file in the example json workspace, you can issue the USE command. It tells Drill to use the json workspace configured in the dfs instance for each subsequent query:

Example

USE dfs.json;
SELECT * FROM dfs.json.`donuts.json` WHERE type='frosted';

If the json workspace did not exist, the query would have to include the full path to the donuts.json file:

SELECT * FROM dfs.`/users/max/drill/json/donuts.json` WHERE type='frosted';

Using a workspace removes the need to enter the directory path again in each subsequent query on that directory.
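
As a minimal sketch of that saving, and assuming the json workspace and the donuts.json file from the example above, the queries below name only the file once USE has set the schema. The count query is hypothetical and only illustrates a second query against the same directory:

```sql
-- Set the current schema to the json workspace of the dfs instance.
USE dfs.json;

-- The file name alone is enough; Drill resolves it against the workspace location.
SELECT * FROM `donuts.json` WHERE type='frosted';

-- Hypothetical follow-up query on the same file, again without the directory path.
SELECT COUNT(*) AS frosted_count FROM `donuts.json` WHERE type='frosted';
```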

Default Workspaces

Each file and hive instance includes a default workspace. The default workspace points to the file system or to the Hive metastore. When you query files and tables in the file or hive default workspaces, you can omit the workspace name from the query.

For example, you can query a Hive table in the default workspace using either of the following formats and get the same results:

Example

SELECT * FROM hive.customers LIMIT 10;
SELECT * FROM hive.`default`.customers LIMIT 10;

Note

`default` is a reserved word. You must enclose reserved words in backticks when you use them as identifiers.
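
The USE command can also be combined with the Hive default workspace. As a sketch, assuming the hive instance and the customers table from the example above, you can set the default workspace as the current schema and then refer to the table by name alone:

```sql
-- default is a reserved word, so it is enclosed in backticks.
USE hive.`default`;

-- The table name no longer needs the hive prefix.
SELECT * FROM customers LIMIT 10;
```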

Because HBase instances do not have workspaces, you query an HBase table by qualifying it with the instance name only:

SELECT * FROM hbase.customers LIMIT 10;

After you register a data source as a storage plugin instance with Drill, and optionally configure workspaces, you can query the data source.
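
If you are unsure which schemas your registered instances and workspaces expose, the SHOW commands in Drill can help. The following sketch assumes the dfs.json workspace configured earlier; the schemas and files listed on your system depend on your own storage plugin configuration:

```sql
-- List the schemas Drill knows about, including workspaces such as dfs.json.
SHOW DATABASES;

-- Switch to the json workspace and list the files in its directory.
USE dfs.json;
SHOW FILES;
```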