Query Data Introduction
You can query local and distributed file systems, Hive, and HBase data sources
registered with Drill. If you connect directly to a particular schema when
you invoke SQLLine, you can issue SQL queries against that schema. If you d0
not indicate a schema when you invoke SQLLine, you can issue the USE
<schema>
statement to run your queries against a particular schema. After you
issue the USE
statement, you can use absolute notation, such as schema.table.column
.
You may need to use casting functions in some queries. For example, you may
have to cast a string "100"
to an integer in order to apply a math function
or an aggregate function.
You can use the EXPLAIN command to analyze errors and troubleshoot queries that do not run. For example, if you run into a casting error, the query plan text may help you isolate the problem.
0: jdbc:drill:zk=local> !set maxwidth 10000
0: jdbc:drill:zk=local> explain plan for select ... ;
The set command increases the default text display (number of characters). By default, most of the plan output is hidden.
You may see errors if you try to use non-standard or unsupported SQL syntax in a query.
Remember the following tips when querying data with Drill:
- Include a semicolon at the end of SQL statements, except when you issue a command with an exclamation point
(!).
Example:!set maxwidth 10000
Use backticks around file and directory names that contain special characters and also around reserved words when you query a file system.
The following special characters require backticks:- . (period)
- / (forward slash)
- _ (underscore)
Example:
SELECT * FROM dfs.default.`sample_data/my_sample.json`;
CAST
data toVARCHAR
if an expression in a query returnsVARBINARY
as the result type in order to view theVARBINARY
types as readable data. If you do not use theCAST
function, Drill returns the results as byte data.
Example:CAST (VARBINARY_expr as VARCHAR(50))
When selecting all (SELECT *) schema-less data, the order of returned columns might differ from the stored order and might vary from query to query.