Connect to a Data Source

A storage plugin is an interface for connecting to a data source to read and write data. Apache Drill connects to a data source, such as a file on the file system or a Hive metastore, through a storage plugin. When you execute a query, Drill gets the plugin name either from the FROM clause of your query or from the default you specify in the USE command that precedes the query.
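The lookup order described above can be sketched in a few lines of Python. This is an illustration only, not Drill's implementation: the function names `first_identifier` and `resolve_plugin`, the set of registered plugin names, and the sample table paths are all hypothetical.

```python
def first_identifier(qualified_name):
    """Return the leading identifier of a dotted, optionally backtick-quoted name."""
    name = qualified_name.strip()
    if name.startswith("`"):
        # A quoted identifier runs to the next backtick and may contain dots.
        return name[1:name.index("`", 1)]
    return name.split(".", 1)[0]


def resolve_plugin(from_target, registered_plugins, use_default=None):
    """Pick a plugin name from the FROM target, else from the USE default.

    Hypothetical helper: a simplified model of the behavior described
    in the text, not the logic Drill actually runs.
    """
    candidate = first_identifier(from_target)
    if candidate in registered_plugins:
        return candidate  # the query names the plugin explicitly
    if use_default is not None:
        return first_identifier(use_default)  # fall back to the USE default
    raise ValueError(f"no storage plugin found for {from_target!r}")


plugins = {"dfs", "hive"}  # assumed plugin names for the example
print(resolve_plugin("dfs.`/tmp/donuts.json`", plugins))
print(resolve_plugin("`donuts.json`", plugins, use_default="dfs.tmp"))
```

In the first call the plugin comes from the FROM target itself. In the second the target names only a file, so the sketch falls back to the default set by a preceding USE command.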

In addition to the connection string, the storage plugin configures the workspace and file formats for reading data, as described in subsequent sections.
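As a rough picture of what such a configuration holds, the Python sketch below assembles a connection string, workspaces, and file formats into one structure and prints it as JSON. The field names are modeled on the general shape of a file-system plugin configuration and should be checked against your Drill version. The helper `build_file_plugin_config` and the sample values are assumptions made for this example.

```python
import json


def build_file_plugin_config(connection, workspaces, formats):
    """Assemble a file-system style plugin configuration (illustrative only)."""
    return {
        "type": "file",
        "enabled": True,
        "connection": connection,   # where the data source lives
        "workspaces": workspaces,   # named locations inside the source
        "formats": formats,         # how files are recognized and read
    }


config = build_file_plugin_config(
    connection="file:///",
    workspaces={
        "root": {"location": "/", "writable": False, "defaultInputFormat": None},
        "tmp": {"location": "/tmp", "writable": True, "defaultInputFormat": None},
    },
    formats={
        "json": {"type": "json", "extensions": ["json"]},
        "csv": {"type": "text", "extensions": ["csv"], "delimiter": ","},
    },
)

print(json.dumps(config, indent=2))
```

The connection, the workspaces, and the formats are separate entries here, so each can be changed without touching the other two.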

Storage Plugin Internals

The following image represents the storage plugin layer between Drill and a data source:

[Image: Drill query flow]

A storage plugin provides the following information to Drill:

A storage plugin performs scanner and writer functions, and informs the metadata repository of any known metadata. The metadata repository is a database created to store metadata, that is, information about the structures that contain the actual data.

A storage plugin informs the execution engine of any native capabilities, such as predicate pushdown, joins, and SQL.
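To show why advertising a capability such as predicate pushdown matters, the following toy model has a plugin declare whether it can filter rows at the source, and an engine that chooses its strategy accordingly. None of this reflects Drill's actual Java interfaces: `ToyPlugin`, `supports_pushdown`, and `run_filter_query` are hypothetical names, and the data is made up.

```python
from dataclasses import dataclass


@dataclass
class ToyPlugin:
    """Hypothetical plugin: holds rows and declares one native capability."""
    rows: list
    supports_pushdown: bool
    rows_returned: int = 0  # counts rows handed back to the engine

    def scan(self, predicate=None):
        # With a predicate, filtering happens "inside" the data source.
        for row in self.rows:
            if predicate is None or predicate(row):
                self.rows_returned += 1
                yield row


def run_filter_query(plugin, predicate):
    """Push the filter down when the plugin can handle it, else filter afterwards."""
    if plugin.supports_pushdown:
        return list(plugin.scan(predicate))
    return [row for row in plugin.scan() if predicate(row)]


data = [{"id": 1, "price": 5}, {"id": 2, "price": 50}, {"id": 3, "price": 7}]


def is_expensive(row):
    return row["price"] > 10


with_pushdown = ToyPlugin(rows=data, supports_pushdown=True)
without_pushdown = ToyPlugin(rows=data, supports_pushdown=False)

result_a = run_filter_query(with_pushdown, is_expensive)
result_b = run_filter_query(without_pushdown, is_expensive)
```

Both calls return the same rows. The difference is how many rows leave the source: the plugin with pushdown returns only the matching row, while the other returns every row and leaves the filtering to the engine.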