////
  Licensed to the Apache Software Foundation (ASF) under one
  or more contributor license agreements. See the NOTICE file
  distributed with this work for additional information
  regarding copyright ownership. The ASF licenses this file
  to You under the Apache License, Version 2.0 (the
  "License"); you may not use this file except in compliance
  with the License. You may obtain a copy of the License at

      http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing,
  software distributed under the License is distributed on an
  "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
  KIND, either express or implied. See the License for the
  specific language governing permissions and limitations
  under the License.
////

Direct-mode Imports
-------------------

While the JDBC-based import method used by Sqoop lets it read from a
variety of databases through a generic driver, it is not the
highest-performance method available. Sqoop can read from certain database
systems faster by using their built-in export tools. For example, Sqoop can
read from a MySQL database by using the +mysqldump+ tool distributed with
MySQL.

You can take advantage of this faster import method by running Sqoop with
the +--direct+ argument. Combined with a connect string that begins with
+jdbc:mysql://+, this tells Sqoop to select the faster access method.

If your delimiters exactly match the delimiters used by +mysqldump+, Sqoop
will use a fast path that copies the data directly from +mysqldump+'s output
into HDFS. Otherwise, Sqoop will parse +mysqldump+'s output into fields and
transcode them into the user-specified delimiter set. This incurs additional
processing, so performance may suffer. For convenience, the
+--mysql-delimiters+ argument sets all the output delimiters to be
consistent with +mysqldump+'s format.

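
The delimiter decision described above can be illustrated with a short
Python sketch. It is not Sqoop's implementation: the helper
+import_mysqldump_output+ is hypothetical, and the +mysqldump+ delimiter set
it assumes (comma between fields, newline between records, backslash escapes,
optional single-quote enclosure) is modeled on what +--mysql-delimiters+
selects in later Sqoop releases, so check it against your version. The
parsing step is also simplified and ignores MySQL-specific escape sequences.

[source,python]
----
import csv
import io

# ASSUMPTION: mysqldump's delimiter set, modeled on what --mysql-delimiters
# selects in later Sqoop releases. Verify against your Sqoop version.
MYSQL_DELIMS = {"field": ",", "record": "\n", "escape": "\\", "enclose": "'"}


def import_mysqldump_output(dump_text, user_delims):
    """Hypothetical helper: return the text that would be written to HDFS."""
    if user_delims == MYSQL_DELIMS:
        # Fast path: the delimiters already match, so the data is copied
        # through unchanged with no per-field processing.
        return dump_text

    # Slow path, step 1: parse mysqldump's records into lists of fields.
    # (Simplified: MySQL-specific escapes such as \N for NULL are not handled.)
    reader = csv.reader(
        io.StringIO(dump_text),
        delimiter=MYSQL_DELIMS["field"],
        quotechar=MYSQL_DELIMS["enclose"],
        escapechar=MYSQL_DELIMS["escape"],
        doublequote=False,
    )

    # Slow path, step 2: re-emit every field with the user's delimiters.
    out = io.StringIO()
    writer = csv.writer(
        out,
        delimiter=user_delims["field"],
        lineterminator=user_delims["record"],
        quotechar=user_delims["enclose"],
        escapechar=user_delims["escape"],
        doublequote=False,
    )
    for fields in reader:
        writer.writerow(fields)
    return out.getvalue()


if __name__ == "__main__":
    dump = "1,'hello, world'\n2,plain\n"
    tabs = {"field": "\t", "record": "\n", "escape": "\\", "enclose": '"'}
    print(repr(import_mysqldump_output(dump, tabs)))  # '1\thello, world\n2\tplain\n'
----

The fast path returns its input untouched, while the slow path visits every
field; that per-field work is the additional processing that can reduce
performance.
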

Sqoop also provides a direct-mode backend for PostgreSQL that uses the
+COPY TO STDOUT+ protocol from +psql+. No specific delimiter set provides
better performance; Sqoop forwards delimiter control arguments to +psql+.

The "Supported Databases" section provides a full list of database vendors
that have direct-mode support from Sqoop.

When writing to HDFS, direct mode opens a single output file to receive the
results of the import. You can instruct Sqoop to use multiple output files
with the +--direct-split-size+ argument, which takes a size in bytes; Sqoop
will generate files of approximately this size. For example,
+--direct-split-size 1000000+ will generate files of approximately 1 MB
each. If you compress the HDFS files with +--compress+, splitting the output
this way allows subsequent MapReduce programs to process your data with
multiple mappers in parallel, since a single compressed file generally
cannot be divided among mappers.

Tool-specific arguments
~~~~~~~~~~~~~~~~~~~~~~~

Sqoop generates a set of command-line arguments with which it invokes the
underlying direct-mode tool (e.g., +mysqldump+). You can pass additional
arguments to the tool by placing them after a single '+-+' argument on the
Sqoop command line. For example:

----
$ sqoop --connect jdbc:mysql://localhost/db --table foo --direct - --lock-tables
----

The +--lock-tables+ argument (and anything else to the right of the +-+
argument) will be passed directly to +mysqldump+.
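
The argument handling described above amounts to splitting the command line
at the first bare +-+. The following Python sketch shows that split; it is
not Sqoop's own parser, and the helper +split_extra_args+ is hypothetical.

[source,python]
----
# The separator described above: a single bare "-" argument.
EXTRA_ARGS_SEPARATOR = "-"


def split_extra_args(argv):
    """Hypothetical helper: split argv into (sqoop_args, tool_args).

    Everything after the first bare "-" is destined for the direct-mode
    tool; everything before it is interpreted by Sqoop itself.
    """
    if EXTRA_ARGS_SEPARATOR in argv:
        i = argv.index(EXTRA_ARGS_SEPARATOR)
        return list(argv[:i]), list(argv[i + 1:])
    # No separator: nothing is forwarded to the tool.
    return list(argv), []


if __name__ == "__main__":
    argv = ["--connect", "jdbc:mysql://localhost/db", "--table", "foo",
            "--direct", "-", "--lock-tables"]
    sqoop_args, tool_args = split_extra_args(argv)
    print(tool_args)  # ['--lock-tables']
----

Note that only an exact +-+ acts as the separator; arguments such as
+--lock-tables+ merely begin with a dash and are left in place.
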