////
  Licensed to the Apache Software Foundation (ASF) under one or more
  contributor license agreements. See the NOTICE file distributed with
  this work for additional information regarding copyright ownership.
  The ASF licenses this file to you under the Apache License, Version 2.0
  (the "License"); you may not use this file except in compliance with
  the License. You may obtain a copy of the License at

      http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License.
////

Sqoop Tools
-----------

Sqoop is a collection of related tools. To use Sqoop, you specify the
tool you want to use and the arguments that control the tool.

If Sqoop is compiled from its own source, you can run Sqoop without a formal
installation process by running the +bin/sqoop+ program. Users
of a packaged deployment of Sqoop (such as an RPM shipped with Cloudera's
Distribution for Hadoop) will see this program installed as +/usr/bin/sqoop+.
The remainder of this documentation will refer to this program as
+sqoop+. For example:

----
$ sqoop tool-name [tool-arguments]
----

NOTE: The following examples that begin with a +$+ character indicate
that the commands must be entered at a terminal prompt (such as
+bash+). The +$+ character represents the prompt itself; you should
not start these commands by typing a +$+.

You can also enter commands inline in the text of a paragraph; for example,
+sqoop help+. These examples do not show a +$+ prefix, but you should
enter them the same way.

Don't confuse the +$+ shell prompt in the examples with the +$+
that precedes an environment variable name. For example, the string
literal +$HADOOP_HOME+ includes a "+$+".

Sqoop ships with a help tool.
To display a list of all available
tools, type the following command:

----
$ sqoop help
usage: sqoop COMMAND [ARGS]

Available commands:
  codegen            Generate code to interact with database records
  create-hive-table  Import a table definition into Hive
  eval               Evaluate a SQL statement and display the results
  export             Export an HDFS directory to a database table
  help               List available commands
  import             Import a table from a database to HDFS
  import-all-tables  Import tables from a database to HDFS
  list-databases     List available databases on a server
  list-tables        List available tables in a database
  version            Display version information

See 'sqoop help COMMAND' for information on a specific command.
----

You can display help for a specific tool by entering: +sqoop help
(tool-name)+; for example, +sqoop help import+.

You can also add the +\--help+ argument to any command: +sqoop import
\--help+.

Using Command Aliases
~~~~~~~~~~~~~~~~~~~~~

In addition to typing the +sqoop (toolname)+ syntax, you can use alias
scripts that specify the +sqoop-(toolname)+ syntax. For example, the
scripts +sqoop-import+, +sqoop-export+, etc. each select a specific
tool.

Controlling the Hadoop Installation
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

You invoke Sqoop through the program launch capability provided by
Hadoop. The +sqoop+ command-line program is a wrapper which runs the
+bin/hadoop+ script shipped with Hadoop. If you have multiple
installations of Hadoop present on your machine, you can select the
Hadoop installation by setting the +$HADOOP_HOME+ environment
variable.

For example:

----
$ HADOOP_HOME=/path/to/some/hadoop sqoop import --arguments...
----

or:

----
$ export HADOOP_HOME=/some/path/to/hadoop
$ sqoop import --arguments...
----

If +$HADOOP_HOME+ is not set, Sqoop will use the default installation
location for Cloudera's Distribution for Hadoop, +/usr/lib/hadoop+.

The active Hadoop configuration is loaded from +$HADOOP_HOME/conf/+,
unless the +$HADOOP_CONF_DIR+ environment variable is set.
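
The lookup order described above can be sketched as a pair of shell
functions. This is only an illustration of the documented fallback
rules, not the code of the +sqoop+ wrapper itself; the function names
+resolve_hadoop_home+ and +resolve_hadoop_conf_dir+ are hypothetical.

```shell
#!/bin/sh
# Sketch of the documented lookup order; this is NOT Sqoop's launcher code.
# resolve_hadoop_home and resolve_hadoop_conf_dir are hypothetical names.

# Print the Hadoop installation directory: $HADOOP_HOME if it is set and
# non-empty, otherwise the default location named in the text.
resolve_hadoop_home() {
  printf '%s\n' "${HADOOP_HOME:-/usr/lib/hadoop}"
}

# Print the configuration directory: $HADOOP_CONF_DIR if it is set and
# non-empty, otherwise the conf/ directory under the resolved installation.
resolve_hadoop_conf_dir() {
  printf '%s\n' "${HADOOP_CONF_DIR:-$(resolve_hadoop_home)/conf}"
}

echo "Hadoop home: $(resolve_hadoop_home)"
echo "Hadoop conf: $(resolve_hadoop_conf_dir)"
```

Running the sketch before and after exporting +$HADOOP_HOME+ or
+$HADOOP_CONF_DIR+ is a quick way to see which installation and
configuration directory the rules above would select.
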

Using Generic and Specific Arguments
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

To control the operation of each Sqoop tool, you use generic and
specific arguments.

For example:

----
$ sqoop help import
usage: sqoop import [GENERIC-ARGS] [TOOL-ARGS]

Common arguments:
   --connect          Specify JDBC connect string
   --connect-manager  Specify connection manager class to use
   --driver           Manually specify JDBC driver class to use
   --hadoop-home      Override $HADOOP_HOME
   --help             Print usage instructions
   -P                 Read password from console
   --password         Set authentication password
   --username         Set authentication username
   --verbose          Print more information while working

[...]

Generic Hadoop command-line arguments:
(must preceed any tool-specific arguments)
Generic options supported are
-conf      specify an application configuration file
-D         use value for given property
-fs        specify a namenode
-jt        specify a job tracker
-files     specify comma separated files to be copied to the map reduce cluster
-libjars   specify comma separated jar files to include in the classpath.
-archives  specify comma separated archives to be unarchived on the compute machines.

The general command line syntax is
bin/hadoop command [genericOptions] [commandOptions]
----

You must supply the generic arguments +-conf+, +-D+, and so on after the
tool name but *before* any tool-specific arguments (such as
+\--connect+). Note that generic Hadoop arguments are preceded by a
single dash character (+-+), whereas tool-specific arguments start
with two dashes (+\--+), unless they are single character arguments such as +-P+.

The +-conf+, +-D+, +-fs+ and +-jt+ arguments control the configuration
and Hadoop server settings. For example, +-D mapred.job.name=+ followed
by a job name sets the name of the MapReduce job that Sqoop launches.
If you do not specify a name, it defaults to the name of the job's jar
file, which is derived from the table name.
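
The ordering rule can be illustrated with a dry run that only prints
the command line instead of launching Sqoop. The job name below is a
hypothetical value chosen for illustration; the connect string, user
name, and table name are reused from the options-file example in this
document.

```shell
#!/bin/sh
# Dry run: builds the command line and prints it instead of running Sqoop.
# The job name is a hypothetical value chosen for this illustration.
job_name="nightly-test-import"

# Generic Hadoop arguments (single dash) come right after the tool name...
generic_args="-D mapred.job.name=${job_name}"
# ...and tool-specific arguments (double dash) follow them.
tool_args="--connect jdbc:mysql://localhost/db --username foo --table TEST"

sqoop_cmd="sqoop import ${generic_args} ${tool_args}"
echo "${sqoop_cmd}"
```

Placing +-D+ after +\--connect+ instead would violate the ordering
shown in the help output above, which states that generic arguments
must come before any tool-specific arguments.
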

The +-files+, +-libjars+, and +-archives+ arguments are not typically used with
Sqoop, but they are included as part of Hadoop's internal argument-parsing
system.

Using Options Files to Pass Arguments
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

When using Sqoop, you can put command-line options that do not change
from invocation to invocation in an options file for convenience. An
options file is a text file in which each line identifies an option, in
the order in which it would otherwise appear on the command line. A
single option can span multiple lines if each intermediate line ends
with a backslash character. Options files also support comments, which
begin with the hash character. A comment must be on its own line and
cannot be mixed with option text. All comments and empty lines are
ignored when options files are expanded. Unless options appear as
quoted strings, any leading or trailing spaces are ignored. A quoted
string must not extend beyond the line on which it begins.

Options files can be specified anywhere on the command line, as long as
the options within them follow the otherwise prescribed ordering rules.
Regardless of where the options are loaded from, generic options must
appear first, tool-specific options next, and options intended for
child programs last.

To specify an options file, create it in a convenient location and pass
it on the command line with the +\--options-file+ argument.

Whenever an options file is specified, it is expanded on the
command line before the tool is invoked. You can specify more than
one options file in the same invocation if needed.

For example, the following Sqoop invocation for import can
be specified alternatively as shown below:

----
$ sqoop import --connect jdbc:mysql://localhost/db --username foo --table TEST

$ sqoop --options-file /users/homer/work/import.txt --table TEST
----

where the options file +/users/homer/work/import.txt+ contains the following:

----
import
--connect
jdbc:mysql://localhost/db
--username
foo
----

The options file can have empty lines and comments for readability purposes.
So the above example would work exactly the same if the options file
+/users/homer/work/import.txt+ contained the following:

----
#
# Options file for Sqoop import
#

# Specifies the tool being invoked
import

# Connect parameter and value
--connect
jdbc:mysql://localhost/db

# Username parameter and value
--username
foo

#
# Remaining options should be specified in the command line.
#
----

Using Tools
~~~~~~~~~~~

The following sections will describe each tool's operation. The
tools are listed in the most likely order you will find them useful.