Drill provides low-latency SQL queries on large-scale datasets. Example use cases for Drill include:
We expect Drill to be used in many more use cases where low latency is required.
Drill complements batch-processing frameworks such as Hive, Pig, and MapReduce by supporting low-latency queries. At this point, Drill is not an optimal choice for OLTP/operational applications that require sub-second response times.
Drill takes a different approach to SQL-on-Hadoop than Hive and other related technologies. The goal for Drill is to bring the SQL ecosystem and the performance of relational systems to Hadoop-scale data without compromising the flexibility of Hadoop/NoSQL systems. Drill provides a flexible query environment for users with the following key capabilities:
Self-describing data is data whose schema is specified as part of the data itself. File formats such as Parquet, JSON, ProtoBuf, XML, and Avro, as well as NoSQL databases, are all examples of self-describing data. Some of these data formats are also dynamic and complex, in that every record can have its own set of columns/attributes and each column can be semi-structured or nested.
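To make the idea of self-describing data concrete, here is a minimal sketch in Python. It is not how Drill is implemented; it only illustrates discovering columns from the records themselves when each record may carry a different set of (possibly nested) attributes. The sample records and the function names `discover_columns` and `discover_schema` are hypothetical.

```python
import json

# Hypothetical sample: newline-delimited JSON where records differ in shape.
RAW = """\
{"id": 1, "name": "ann", "address": {"city": "Oslo", "zip": "0150"}}
{"id": 2, "tags": ["a", "b"]}
{"id": 3, "name": "bo", "address": {"city": "Lima"}}
"""


def discover_columns(record, prefix=""):
    """Yield (dotted column path, type name) pairs found in one record."""
    for key, value in record.items():
        path = f"{prefix}{key}"
        if isinstance(value, dict):
            # Nested attribute: descend and report its leaf columns.
            yield from discover_columns(value, prefix=path + ".")
        else:
            yield path, type(value).__name__


def discover_schema(lines):
    """Union the columns seen across all records; nothing is declared up front."""
    schema = {}
    for line in lines:
        if not line.strip():
            continue
        for path, type_name in discover_columns(json.loads(line)):
            schema.setdefault(path, set()).add(type_name)
    return schema


schema = discover_schema(RAW.splitlines())
for column in sorted(schema):
    print(column, sorted(schema[column]))
```

No record above contains every column, yet the full set of columns is recoverable from the data alone, which is the property that makes a format self-describing.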
Drill enables queries on self-describing data through the following architectural foundations:
Together with dynamic data discovery and a flexible data model that can handle complex data types, Drill allows users to get fast and complete value from all their data.
Yes, Hive also serves as a data source for Drill. You can simply point Drill to the Hive metastore and start running low-latency queries on Hive tables with no modifications.
Of course not! Central EDW schemas work great if data models do not change often, the value of the data is well understood, and the data is ready to be operationalized for regular reporting. However, during the data exploration and discovery phase, rigid modeling requirements pose challenges and delay value from data, especially in Hadoop/NoSQL environments where the data is highly complex, dynamic, and fast-evolving. A few of the challenges include:
Drill is all about flexibility. The flexible schema management capabilities in Drill let users explore data in its native format as it arrives and, if needed, create models/structure in the Hive metastore or with the CREATE TABLE/CREATE VIEW syntax within Drill.
Drill uses a decentralized metadata model and relies on its storage plugins to provide metadata. Drill supports queries on file systems (distributed and local), HBase, and Hive tables. Each data source that Drill supports has an associated storage plugin.
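The decentralized metadata model can be pictured with a small Python sketch. This is only an illustration under stated assumptions, not Drill's actual plugin interface: the classes `StoragePlugin`, `InMemoryFilePlugin`, `CatalogPlugin`, and `PluginRegistry`, the registered names, and the sample data are all hypothetical. The point is that no central catalog exists; each registered plugin answers metadata questions for its own source.

```python
from typing import Dict, List, Protocol


class StoragePlugin(Protocol):
    """Hypothetical interface: each plugin supplies metadata for its own source."""

    def columns(self, table: str) -> List[str]: ...


class InMemoryFilePlugin:
    """Stand-in for a file-system source: columns are read from the data itself."""

    def __init__(self, files: Dict[str, List[dict]]):
        self._files = files

    def columns(self, table: str) -> List[str]:
        seen: List[str] = []
        for record in self._files[table]:
            for key in record:
                if key not in seen:
                    seen.append(key)
        return seen


class CatalogPlugin:
    """Stand-in for a source that already keeps its own catalog of tables."""

    def __init__(self, catalog: Dict[str, List[str]]):
        self._catalog = catalog

    def columns(self, table: str) -> List[str]:
        return list(self._catalog[table])


class PluginRegistry:
    """Routes a metadata request to the plugin that owns the data source."""

    def __init__(self) -> None:
        self._plugins: Dict[str, StoragePlugin] = {}

    def register(self, name: str, plugin: StoragePlugin) -> None:
        self._plugins[name] = plugin

    def describe(self, qualified_name: str) -> List[str]:
        # "plugin.table": everything after the first dot is the table name.
        plugin_name, table = qualified_name.split(".", 1)
        return self._plugins[plugin_name].columns(table)


registry = PluginRegistry()
registry.register("dfs", InMemoryFilePlugin({
    "logs.json": [{"ts": 1, "level": "INFO"}, {"ts": 2, "msg": "started"}],
}))
registry.register("hive", CatalogPlugin({"orders": ["order_id", "amount"]}))

print(registry.describe("dfs.logs.json"))
print(registry.describe("hive.orders"))
```

In this sketch, adding a new kind of data source means registering one more plugin; nothing else in the registry has to know how that source stores its metadata.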
Here is the anatomy of a Drill query.
Yes, Drill provides JDBC/ODBC drivers for integrating with BI and SQL-based tools.
Drill provides ANSI standard SQL (not SQL-like or HiveQL) with support for all key analytics functionality, such as SQL data types, joins, aggregations, filters, sorting, subqueries (including correlated subqueries), and joins in the WHERE clause. See the SQL reference for details on SQL functionality in Drill.
Drill is not designed with a particular Hadoop distribution in mind, and we expect it to work with all Hadoop distributions that support the Hadoop 2.3.x file client API. So far, we have validated it with the Apache Hadoop, MapR, CDH, and Amazon EMR* distributions.
* Custom configuration required. Please contact drill-user@incubator.apache.org with questions.
Drill is built from the ground up for performance on large-scale datasets. The key architectural components that help achieve this performance include:
Drill is built to support several hundred queries at any given time. Clients can submit requests to any node in the cluster that runs the Drillbit service (there is no master-slave concept). To support more users, simply add more nodes to the cluster.
No. Drill can query data "in situ".
The best way to get started is to try it out. It takes just a few minutes, even if you do not have a cluster. A good place to start is Apache Drill in 10 Minutes.
Please post your questions and feedback to drill-user@incubator.apache.org. We are happy to have you try out Drill and to help with any questions!
Please refer to the Get Involved page to learn how to get involved with Drill.
Here is how you can contribute.
Please contact drill-dev@incubator.apache.org with any questions.