sqoop(1)
========

////
   Licensed to the Apache Software Foundation (ASF) under one
   or more contributor license agreements.  See the NOTICE file
   distributed with this work for additional information
   regarding copyright ownership.  The ASF licenses this file
   to you under the Apache License, Version 2.0 (the
   "License"); you may not use this file except in compliance
   with the License.  You may obtain a copy of the License at

       http://www.apache.org/licenses/LICENSE-2.0

   Unless required by applicable law or agreed to in writing, software
   distributed under the License is distributed on an "AS IS" BASIS,
   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
   See the License for the specific language governing permissions and
   limitations under the License.
////

NAME
----
sqoop - SQL-to-Hadoop import tool

SYNOPSIS
--------
'sqoop' <options>

DESCRIPTION
-----------

Sqoop is a tool designed to help users import data from existing
relational databases into their Hadoop clusters. Sqoop uses JDBC to
connect to a database, examine each table's schema, and auto-generate
the necessary classes to import data into HDFS. It then instantiates
a MapReduce job to read tables from the database via the DBInputFormat
(JDBC-based InputFormat). Tables are read into a set of files loaded
into HDFS. Both SequenceFile and text-based targets are supported.

Sqoop also supports high-performance imports from select databases
including MySQL.

OPTIONS
-------

The +--connect+ option is always required. To perform an import, one of
+--table+ or +--all-tables+ is required as well. Alternatively, you can
specify +--generate-only+ or one of the arguments in "Additional
commands."


Database connection options
~~~~~~~~~~~~~~~~~~~~~~~~~~~

--connect (jdbc-uri)::
  Specify the JDBC connect string (required)

--driver (class-name)::
  Manually specify the JDBC driver class to use

--username (username)::
  Set authentication username

--password (password)::
  Set authentication password (Note: This is very insecure.
  You should use -P instead.)

-P::
  Prompt for user password

--direct::
  Use direct import fast path (MySQL only)


Import control options
~~~~~~~~~~~~~~~~~~~~~~

--all-tables::
  Import all tables in the database (ignores +--table+, +--columns+,
  +--split-by+, and +--where+)

--columns (col,col,col...)::
  Columns to import from the table

--split-by (column-name)::
  Column of the table used to split the table for parallel import

--hadoop-home (dir)::
  Override $HADOOP_HOME

--hive-home (dir)::
  Override $HIVE_HOME

--warehouse-dir (dir)::
  Tables are uploaded to the HDFS path +(dir)/(tablename)/+

--as-sequencefile::
  Imports data to SequenceFiles

--as-textfile::
  Imports data as plain text (default)

--hive-import::
  If set, then import the table into Hive

--table (table-name)::
  The table to import

--where (clause)::
  Import only the rows for which _clause_ is true,
  e.g.: `--where "user_id > 400 AND hidden == 0"`

--compress::
-z::
  Uses gzip to compress data as it is written to HDFS

--direct-split-size (size)::
  When using direct mode, write to multiple files of
  approximately _size_ bytes each.
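
For example, a minimal single-table import combining the connection
and import control options above might look like the following. The
connect string, username, table name, split column, and HDFS warehouse
path are hypothetical placeholders; substitute values for your own
database.

  # Example values only: db.example.com/website, users, user_id, and
  # /shared/imports are placeholders for your own settings.
  sqoop --connect jdbc:mysql://db.example.com/website \
        --username sqoop -P \
        --table users --split-by user_id \
        --where "user_id > 400 AND hidden == 0" \
        --warehouse-dir /shared/imports --as-sequencefile
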

Export control options
~~~~~~~~~~~~~~~~~~~~~~

--export-dir (dir)::
  Export from an HDFS path into a table (set with +--table+)


Output line formatting options
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

include::output-formatting.txt[]

include::output-formatting-args.txt[]


Input line parsing options
~~~~~~~~~~~~~~~~~~~~~~~~~~

include::input-formatting.txt[]

include::input-formatting-args.txt[]


Code generation options
~~~~~~~~~~~~~~~~~~~~~~~

--bindir (dir)::
  Output directory for compiled objects

--class-name (name)::
  Sets the name of the class to generate. By default, classes are named
  after the table they represent. When this parameter is used,
  +--package-name+ is ignored.

--generate-only::
  Stop after code generation; do not import

--outdir (dir)::
  Output directory for generated code

--package-name (package)::
  Puts auto-generated classes in the named Java package


Library loading options
~~~~~~~~~~~~~~~~~~~~~~~

--jar-file (file)::
  Disable code generation; use the specified jar

--class-name (name)::
  The class within the jar that represents the table to import/export


Additional commands
~~~~~~~~~~~~~~~~~~~

These commands cause Sqoop to report information and exit;
no import or code generation is performed.

--debug-sql (statement)::
  Execute 'statement' in SQL and display the results

--help::
  Display usage information and exit

--list-databases::
  List all databases available and exit

--list-tables::
  List tables in the database and exit


Database-specific options
~~~~~~~~~~~~~~~~~~~~~~~~~

Additional arguments may be passed to the database manager after a
lone '-' on the command line. In MySQL direct mode, additional
arguments are passed directly to mysqldump.


ENVIRONMENT
-----------

JAVA_HOME::
  As part of its import process, Sqoop generates and compiles Java code
  by invoking the Java compiler *javac*(1). As a result, JAVA_HOME must
  be set to the location of your JDK (note: this cannot just be a JRE),
  e.g., +/usr/java/default+. Hadoop (and Sqoop) requires Sun Java 1.6,
  which can be downloaded from http://java.sun.com.

HADOOP_HOME::
  The location of the Hadoop jar files. If you installed Hadoop via RPM
  or DEB, these are in +/usr/lib/hadoop-20+.

HIVE_HOME::
  If you are performing a Hive import, you must identify the location
  of Hive's jars and configuration. If you installed Hive via RPM or
  DEB, these are in +/usr/lib/hive+.
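
For example, on a system where the JDK and a packaged Hadoop
installation live in the default locations mentioned above, a session
might export these variables before invoking sqoop. The paths below are
taken from the defaults listed in this section; the connect string is a
hypothetical placeholder.

  # Example values only; adjust the paths and connect string to your
  # own installation.
  export JAVA_HOME=/usr/java/default
  export HADOOP_HOME=/usr/lib/hadoop-20
  sqoop --connect jdbc:mysql://db.example.com/website --list-tables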