////
   Licensed to the Apache Software Foundation (ASF) under one or more
   contributor license agreements.  See the NOTICE file distributed with
   this work for additional information regarding copyright ownership.
   The ASF licenses this file to you under the Apache License, Version 2.0
   (the "License"); you may not use this file except in compliance with
   the License.  You may obtain a copy of the License at

       http://www.apache.org/licenses/LICENSE-2.0

   Unless required by applicable law or agreed to in writing, software
   distributed under the License is distributed on an "AS IS" BASIS,
   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
   See the License for the specific language governing permissions and
   limitations under the License.
////

Connecting to a Database Server
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Sqoop is designed to import tables from a database into HDFS. To do so,
you must specify a _connect string_ that describes how to connect to the
database. The _connect string_ is similar to a URL, and is communicated
to Sqoop with the +\--connect+ argument. This describes the server and
database to connect to; it may also specify the port. For example:

----
$ sqoop import --connect jdbc:mysql://database.example.com/employees
----

This string will connect to a MySQL database named +employees+ on the
host +database.example.com+. It's important that you *do not* use the URL
+localhost+ if you intend to use Sqoop with a distributed Hadoop cluster.
The connect string you supply will be used on TaskTracker nodes
throughout your MapReduce cluster; if you specify the literal name
+localhost+, each node will connect to a different database (or more
likely, no database at all). Instead, you should use the full hostname or
IP address of the database host that can be seen by all your remote
nodes.

You might need to authenticate against the database before you can
access it. You can use the +\--username+ and +\--password+ or +-P+
parameters to supply a username and a password to the database. For
example:

----
$ sqoop import --connect jdbc:mysql://database.example.com/employees \
    --username aaron --password 12345
----

.Password security
WARNING: The +\--password+ parameter is insecure, as other users may be
able to read your password from the command-line arguments via the
output of programs such as +ps+. The *+-P+* argument will read a password
from a console prompt, and is the preferred method of entering
credentials. Credentials may still be transferred between nodes of the
MapReduce cluster using insecure means.

Sqoop automatically supports several databases, including MySQL. Connect
strings beginning with +jdbc:mysql://+ are handled automatically in
Sqoop. (A full list of databases with built-in support is provided in the
"Supported Databases" section. For some, you may need to install the JDBC
driver yourself.)

You can use Sqoop with any other JDBC-compliant database. First, download
the appropriate JDBC driver for the type of database you want to import,
and install the +.jar+ file in the +$SQOOP_HOME/lib+ directory on your
client machine. (This will be +/usr/lib/sqoop/lib+ if you installed from
an RPM or Debian package.)
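As an illustration, installing a driver by hand is simply a matter of
copying its +.jar+ into that directory. The file name, version, and
destination path below are hypothetical; substitute the driver you
actually downloaded and the +lib+ directory of your installation:

----
$ cp mysql-connector-java-5.1.38-bin.jar /usr/lib/sqoop/lib/
----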
Each driver +.jar+ file also has a specific driver class which defines
the entry-point to the driver. For example, MySQL's Connector/J library
has a driver class of +com.mysql.jdbc.Driver+. Refer to your database
vendor-specific documentation to determine the main driver class. This
class must be provided as an argument to Sqoop with +\--driver+.

For example, to connect to a SQLServer database, first download the
driver from microsoft.com and install it in your Sqoop lib path. Then run
Sqoop. For example:

----
$ sqoop import --driver com.microsoft.jdbc.sqlserver.SQLServerDriver \
    --connect ...
----

When connecting to a database using JDBC, you can optionally specify
extra JDBC parameters via a property file using the option
+\--connection-param-file+. The contents of this file are parsed as
standard Java properties and passed into the driver while creating a
connection.

NOTE: The parameters specified via the optional property file are only
applicable to JDBC connections. Any fastpath connectors that use
connections other than JDBC will ignore these parameters.
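As a sketch of how these pieces fit together, the listing below shows a
hypothetical parameter file and the command that references it. The
property names are MySQL Connector/J driver settings chosen purely for
illustration; they are not options defined by Sqoop itself, and the file
name is arbitrary:

----
$ cat conn-params.properties
# Standard Java properties; each entry is passed to the JDBC driver
# when Sqoop opens the connection.
zeroDateTimeBehavior=convertToNull
useUnicode=true
characterEncoding=UTF-8

$ sqoop import --connect jdbc:mysql://database.example.com/employees \
    --username aaron -P \
    --connection-param-file conn-params.properties
----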