////
   Licensed to the Apache Software Foundation (ASF) under one
   or more contributor license agreements.  See the NOTICE file
   distributed with this work for additional information
   regarding copyright ownership.  The ASF licenses this file
   to You under the Apache License, Version 2.0 (the "License");
   you may not use this file except in compliance with
   the License.  You may obtain a copy of the License at

   http://www.apache.org/licenses/LICENSE-2.0

   Unless required by applicable law or agreed to in writing, software
   distributed under the License is distributed on an "AS IS" BASIS,
   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
   See the License for the specific language governing permissions and
   limitations under the License.
////

Automatic Full-database Import
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

If you want to import all the tables in a database, you can use the
+--all-tables+ argument to do so:

----
$ sqoop --connect jdbc:mysql://database.example.com/employees --all-tables
----

This will query the database for the available tables, generate an ORM
class for each table, and run a MapReduce job to import each one.
Hadoop uses the DBInputFormat to read from a database into a Mapper
instance. Reading a table into a MapReduce program requires a class to
hold the fields of one row of the table. One of the benefits of Sqoop
is that it generates this class definition for you, based on the table
definition in the database.

The generated +.java+ files are, by default, placed in the current
directory. You can supply a different directory with the +--outdir+
parameter. These are then compiled into +.class+ and +.jar+ files for
use by the MapReduce job that Sqoop launches. These files are created
in a temporary directory; you can redirect this target with +--bindir+.

Each table will be imported into a separate directory in HDFS, with the
same name as the table. For instance, if your Hadoop username is aaron,
the above command would generate the following directories in HDFS:

----
/user/aaron/employee_names
/user/aaron/payroll_checks
/user/aaron/job_descriptions
/user/aaron/office_supplies
----

You can change the base directory under which the tables are loaded
with the +--warehouse-dir+ parameter. For example:

----
$ sqoop --connect jdbc:mysql://database.example.com/employees --all-tables \
    --warehouse-dir /common/warehouse
----

This would create the following directories instead:

----
/common/warehouse/employee_names
/common/warehouse/payroll_checks
/common/warehouse/job_descriptions
/common/warehouse/office_supplies
----

By default, the data will be imported into text files in HDFS. Each
column is represented as comma-delimited text, and each row is
terminated by a newline. See the section on "Controlling the Output
Format" below for information on how to change these delimiters.

If you want to leverage compression and binary file formats, the
+--as-sequencefile+ argument to Sqoop will import the table to a set of
SequenceFiles instead. This stores each record of the database in a
separate object in a SequenceFile. This representation is also likely
to perform better when used as input to subsequent MapReduce programs,
as it does not require parsing. For completeness, Sqoop provides an
+--as-textfile+ option, which is implied by default. An
+--as-textfile+ on the command line will override a previous
+--as-sequencefile+ argument.

The SequenceFile format will embed the records from the database as
objects using the code generated by Sqoop.
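For example, combining +--all-tables+ with +--as-sequencefile+ imports
every table in the database above as SequenceFiles:

----
$ sqoop --connect jdbc:mysql://database.example.com/employees --all-tables \
    --as-sequencefile
----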
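As a rough illustration (not the exact code Sqoop emits), the generated
class for a hypothetical +employee_names+ table with two columns,
+id+ (INTEGER) and +name+ (VARCHAR), might look something like this
simplified sketch:

----
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.Writable;

// Simplified sketch of a Sqoop-generated record class. The real
// generated code contains additional machinery (JDBC and delimiter
// handling, etc.); this shows only the core Writable contract.
public class employee_names implements Writable {
  private Integer id;
  private String name;

  public Integer get_id() { return id; }
  public String get_name() { return name; }

  // Deserialize one row's fields from the underlying stream.
  public void readFields(DataInput in) throws IOException {
    this.id = Integer.valueOf(in.readInt());
    this.name = Text.readString(in);
  }

  // Serialize this row's fields to the underlying stream.
  public void write(DataOutput out) throws IOException {
    out.writeInt(this.id.intValue());
    Text.writeString(out, this.name);
  }
}
----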
It is important that you retain the +.java+ file for this class, as you will need to be able to instantiate the same type to read the objects back later, in other user-defined applications.
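For instance, a separate application could read the imported records
back along these lines. This is a minimal sketch, assuming the import
wrote each record as a SequenceFile value under a +LongWritable+ key,
and that the hypothetical +employee_names+ class sketched above is on
the classpath:

----
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.SequenceFile;

public class ReadEmployeeNames {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    // e.g., one part file under /user/aaron/employee_names
    Path path = new Path(args[0]);

    SequenceFile.Reader reader = new SequenceFile.Reader(fs, path, conf);
    try {
      LongWritable key = new LongWritable();
      // Instantiating the value requires the same generated type
      // that was used at import time.
      employee_names val = new employee_names();
      while (reader.next(key, val)) {
        System.out.println(val.get_id() + "\t" + val.get_name());
      }
    } finally {
      reader.close();
    }
  }
}
----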