//// Licensed to the Apache Software Foundation (ASF) under one or more contributor license agreements. See the NOTICE file distributed with this work for additional information regarding copyright ownership. The ASF licenses this file to you under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License. //// Troubleshooting --------------- General Troubleshooting Process ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ The following steps should be followed to troubleshoot any failure that you encounter while running Sqoop. - Turn on verbose output by executing the same command again and specifying the +\--verbose+ option. This produces more debug output on the console which can be inspected to identify any obvious errors. - Look at the task logs from Hadoop to see if there are any specific failures recorded there. It is possible that the failure that occurs while task execution is not relayed correctly to the console. - Make sure that the necessary input files or input/output tables are present and can be accessed by the user that Sqoop is executing as or connecting to the database as. It is possible that the necessary files or tables are present but the specific user that Sqoop connects as does not have the necessary permissions to access these files. - If you are doing a compound action such as populating a Hive table or partition, try breaking the job into two separate actions to see where the problem really occurs. For example if an import that creates and populates a Hive table is failing, you can break it down into two steps - first for doing the import alone, and the second to create a Hive table without the import using the +create-hive-table+ tool. While this does not address the original use-case of populating the Hive table, it does help narrow down the problem to either regular import or during the creation and population of Hive table. - Search the mailing lists archives and JIRA for keywords relating to the problem. It is possible that you may find a solution discussed there that will help you solve or work-around your problem. Specific Troubleshooting Tips ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Oracle: Connection Reset Errors ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Problem: When using the default Sqoop connector for Oracle, some data does get transferred, but during the map-reduce job a lot of errors are reported as below: ---- 11/05/26 16:23:47 INFO mapred.JobClient: Task Id : attempt_201105261333_0002_m_000002_0, Status : FAILED java.lang.RuntimeException: java.lang.RuntimeException: java.sql.SQLRecoverableException: IO Error: Connection reset at com.cloudera.sqoop.mapreduce.db.DBInputFormat.setConf(DBInputFormat.java:164) at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:62) at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117) at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:605) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:322) at org.apache.hadoop.mapred.Child$4.run(Child.java:268) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:396) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1115) at org.apache.hadoop.mapred.Child.main(Child.java:262) Caused by: java.lang.RuntimeException: java.sql.SQLRecoverableException: IO Error: Connection reset at com.cloudera.sqoop.mapreduce.db.DBInputFormat.getConnection(DBInputFormat.java:190) at com.cloudera.sqoop.mapreduce.db.DBInputFormat.setConf(DBInputFormat.java:159) ... 9 more Caused by: java.sql.SQLRecoverableException: IO Error: Connection reset at oracle.jdbc.driver.T4CConnection.logon(T4CConnection.java:428) at oracle.jdbc.driver.PhysicalConnection.(PhysicalConnection.java:536) at oracle.jdbc.driver.T4CConnection.(T4CConnection.java:228) at oracle.jdbc.driver.T4CDriverExtension.getConnection(T4CDriverExtension.java:32) at oracle.jdbc.driver.OracleDriver.connect(OracleDriver.java:521) at java.sql.DriverManager.getConnection(DriverManager.java:582) at java.sql.DriverManager.getConnection(DriverManager.java:185) at com.cloudera.sqoop.mapreduce.db.DBConfiguration.getConnection(DBConfiguration.java:152) at com.cloudera.sqoop.mapreduce.db.DBInputFormat.getConnection(DBInputFormat.java:184) ... 10 more Caused by: java.net.SocketException: Connection reset at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:96) at java.net.SocketOutputStream.write(SocketOutputStream.java:136) at oracle.net.ns.DataPacket.send(DataPacket.java:199) at oracle.net.ns.NetOutputStream.flush(NetOutputStream.java:211) at oracle.net.ns.NetInputStream.getNextPacket(NetInputStream.java:227) at oracle.net.ns.NetInputStream.read(NetInputStream.java:175) at oracle.net.ns.NetInputStream.read(NetInputStream.java:100) at oracle.net.ns.NetInputStream.read(NetInputStream.java:85) at oracle.jdbc.driver.T4CSocketInputStreamWrapper.readNextPacket(T4CSocketInputStreamWrapper.java:123) at oracle.jdbc.driver.T4CSocketInputStreamWrapper.read(T4CSocketInputStreamWrapper.java:79) at oracle.jdbc.driver.T4CMAREngine.unmarshalUB1(T4CMAREngine.java:1122) at oracle.jdbc.driver.T4CMAREngine.unmarshalSB1(T4CMAREngine.java:1099) at oracle.jdbc.driver.T4CTTIfun.receive(T4CTTIfun.java:288) at oracle.jdbc.driver.T4CTTIfun.doRPC(T4CTTIfun.java:191) at oracle.jdbc.driver.T4CTTIoauthenticate.doOAUTH(T4CTTIoauthenticate.java:366) at oracle.jdbc.driver.T4CTTIoauthenticate.doOAUTH(T4CTTIoauthenticate.java:752) at oracle.jdbc.driver.T4CConnection.logon(T4CConnection.java:366) ... 18 more ---- Solution: This problem occurs primarily due to the lack of a fast random number generation device on the host where the map tasks execute. On typical Linux systems this can be addressed by setting the following property in the +java.security+ file: ---- java.security.egd=file:/dev/../dev/urandom ---- The +java.security+ file can be found under +$JAVA_HOME/jre/lib/security+ directory. Alternatively, this property can also be specified on the command line via: ---- -D mapred.child.java.opts="\-Djava.security.egd=file:/dev/../dev/urandom"+ ---- Oracle: Case-Sensitive Catalog Query Errors ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Problem: While working with Oracle you may encounter problems when Sqoop can not figure out column names. This happens because the catalog queries that Sqoop uses for Oracle expect the correct case to be specified for the user name and table name. One example, using --hive-import and resulting in a NullPointerException: ---- 1/09/21 17:18:49 INFO manager.OracleManager: Time zone has been set to GMT 11/09/21 17:18:49 DEBUG manager.SqlManager: Using fetchSize for next query: 1000 11/09/21 17:18:49 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM addlabel_pris t WHERE 1=0 11/09/21 17:18:49 DEBUG manager.OracleManager$ConnCache: Caching released connection for jdbc:oracle:thin: 11/09/21 17:18:49 ERROR sqoop.Sqoop: Got exception running Sqoop: java.lang.NullPointerException java.lang.NullPointerException at com.cloudera.sqoop.hive.TableDefWriter.getCreateTableStmt(TableDefWriter.java:148) at com.cloudera.sqoop.hive.HiveImport.importTable(HiveImport.java:187) at com.cloudera.sqoop.tool.ImportTool.importTable(ImportTool.java:362) at com.cloudera.sqoop.tool.ImportTool.run(ImportTool.java:423) at com.cloudera.sqoop.Sqoop.run(Sqoop.java:144) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65) at com.cloudera.sqoop.Sqoop.runSqoop(Sqoop.java:180) at com.cloudera.sqoop.Sqoop.runTool(Sqoop.java:219) at com.cloudera.sqoop.Sqoop.runTool(Sqoop.java:228) at com.cloudera.sqoop.Sqoop.main(Sqoop.java:237) ---- .Solution: . Specify the user name, which Sqoop is connecting as, in upper case (unless it was created with mixed/lower case within quotes). . Specify the table name, which you are working with, in upper case (unless it was created with mixed/lower case within quotes). MySQL: Connection Failure ^^^^^^^^^^^^^^^^^^^^^^^^^ Problem: While importing a MySQL table into Sqoop, if you do not have the necessary permissions to access your MySQL database over the network, you may get the below connection failure. ---- Caused by: com.mysql.jdbc.exceptions.jdbc4.CommunicationsException: Communications link failure ---- Solution: First, verify that you can connect to the database from the node where you are running Sqoop: ---- $ mysql --host= --database=test --user= --password= ---- If this works, it rules out any problem with the client network configuration or security/authentication configuration. Add the network port for the server to your my.cnf file +/etc/my.cnf+: ---- [mysqld] port = xxxx ---- Set up a user account to connect via Sqoop. Grant permissions to the user to access the database over the network: (1.) Log into MySQL as root +mysql -u root -p+. (2.) Issue the following command: ---- mysql> grant all privileges on test.* to 'testuser'@'%' identified by 'testpassword' ---- Note that doing this will enable the testuser to connect to the MySQL server from any IP address. While this will work, it is not advisable for a production environment. We advise consulting with your DBA to grant the necessary privileges based on the setup topology. If the database server's IP address changes, unless it is bound to a static hostname in your server, the connect string passed into Sqoop will also need to be changed. Oracle: ORA-00933 error (SQL command not properly ended) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Problem: While working with Oracle you may encounter the below problem when the Sqoop command explicitly specifies the --driver option. When the driver option is included in the Sqoop command, the built-in connection manager selection defaults to the generic connection manager, which causes this issue with Oracle. If the driver option is not specified, the built-in connection manager selection mechanism selects the Oracle specific connection manager which generates valid SQL for Oracle and uses the driver "oracle.jdbc.OracleDriver". ---- ERROR manager.SqlManager: Error executing statement: java.sql.SQLSyntaxErrorException: ORA-00933: SQL command not properly ended ---- Solution: Omit the option --driver oracle.jdbc.driver.OracleDriver and then re-run the Sqoop command. MySQL: Import of TINYINT(1) from MySQL behaves strangely ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Problem: Sqoop is treating TINYINT(1) columns as booleans, which is for example causing issues with HIVE import. This is because by default the MySQL JDBC connector maps the TINYINT(1) to java.sql.Types.BIT, which Sqoop by default maps to Boolean. Solution: A more clean solution is to force MySQL JDBC Connector to stop converting TINYINT(1) to java.sql.Types.BIT by adding +tinyInt1isBit=false+ into your JDBC path (to create something like +jdbc:mysql://localhost/test?tinyInt1isBit=false+). Another solution would be to explicitly override the column mapping for the datatype TINYINT(1) column. For example, if the column name is foo, then pass the following option to Sqoop during import: --map-column-hive foo=tinyint. In the case of non-Hive imports to HDFS, use --map-column-java foo=integer.