本帖最后由 hyj 于 2013-12-25 20:15 编辑
一、首先需要环境准备 :
准备好演示环境后启动 hadoop 环境使用 JPS 查看到的进程如下 , 说明启动正常:
[hadoop@domain bin]$ jps
2658 NameNode
3064 TaskTracker
5082 Jps
2760 DataNode
2866 SecondaryNameNode
2939 JobTracker</P> 复制代码
先使用sqoop 查看 oracle ( scott/tiger )数据库表 正确 使用 sqoop 语句导入 scott 用户指定的表到 HIVE 中 在这特别要注意的是 SCOTT 在语句中必须为大写,否则在 导数的时候出错: ERROR tool.ImportTool: ImportedFailed: Attempted to generate class with no columns! 执行后的语句信息如下:[hadoop@domain bin]$ sqoop import -connect jdbc:oracle:thin:@192.168.1.2:1521:YONYOU -username SCOTT -password tiger --table EMP --hive-import -m 1
13/03/30 00:40:23 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
13/03/30 00:40:23 INFO tool.BaseSqoopTool: Using Hive-specific delimiters for output. You can override
13/03/30 00:40:23 INFO tool.BaseSqoopTool: delimiters with --fields-terminated-by, etc.
13/03/30 00:40:23 INFO manager.SqlManager: Using default fetchSize of 1000
13/03/30 00:40:24 INFO tool.CodeGenTool: Beginning code generation
13/03/30 00:40:25 INFO manager.OracleManager: Time zone has been set to GMT
13/03/30 00:40:25 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM EMP t WHERE 1=0
13/03/30 00:40:25 INFO orm.CompilationManager: HADOOP_HOME is /home/hadoop/hadoop-0.20.2
13/03/30 00:40:28 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-hadoop/compile/57b0c91f97a25b025cfcf0729f316f39/EMP.jar
13/03/30 00:40:28 INFO manager.OracleManager: Time zone has been set to GMT
13/03/30 00:40:28 INFO manager.OracleManager: Time zone has been set to GMT
13/03/30 00:40:28 INFO mapreduce.ImportJobBase: Beginning import of EMP
13/03/30 00:40:28 INFO manager.OracleManager: Time zone has been set to GMT
13/03/30 00:40:32 INFO mapred.JobClient: Running job: job_201303292217_0004
13/03/30 00:40:33 INFO mapred.JobClient: map 0% reduce 0%
13/03/30 00:40:59 INFO mapred.JobClient: map 100% reduce 0%
13/03/30 00:41:02 INFO mapred.JobClient: Job complete: job_201303292217_0004
13/03/30 00:41:02 INFO mapred.JobClient: Counters: 16
13/03/30 00:41:02 INFO mapred.JobClient: Job Counters
13/03/30 00:41:02 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=27161
13/03/30 00:41:02 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
13/03/30 00:41:02 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
13/03/30 00:41:02 INFO mapred.JobClient: Launched map tasks=1
13/03/30 00:41:02 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=0
13/03/30 00:41:02 INFO mapred.JobClient: FileSystemCounters
13/03/30 00:41:02 INFO mapred.JobClient: HDFS_BYTES_READ=87
13/03/30 00:41:02 INFO mapred.JobClient: FILE_BYTES_WRITTEN=65258
13/03/30 00:41:02 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=701
13/03/30 00:41:02 INFO mapred.JobClient: Map-Reduce Framework
13/03/30 00:41:02 INFO mapred.JobClient: Map input records=12
13/03/30 00:41:02 INFO mapred.JobClient: Physical memory (bytes) snapshot=80519168
13/03/30 00:41:02 INFO mapred.JobClient: Spilled Records=0
13/03/30 00:41:02 INFO mapred.JobClient: CPU time spent (ms)=2070
13/03/30 00:41:02 INFO mapred.JobClient: Total committed heap usage (bytes)=15794176
13/03/30 00:41:02 INFO mapred.JobClient: Virtual memory (bytes) snapshot=712142848
13/03/30 00:41:02 INFO mapred.JobClient: Map output records=12
13/03/30 00:41:02 INFO mapred.JobClient: SPLIT_RAW_BYTES=87
13/03/30 00:41:02 INFO mapreduce.ImportJobBase: Transferred 701 bytes in 33.7146 seconds (20.7922 bytes/sec)
13/03/30 00:41:02 INFO mapreduce.ImportJobBase: Retrieved 12 records.
13/03/30 00:41:02 INFO manager.OracleManager: Time zone has been set to GMT
13/03/30 00:41:02 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM EMP t WHERE 1=0
13/03/30 00:41:02 WARN hive.TableDefWriter: Column EMPNO had to be cast to a less precise type in Hive
13/03/30 00:41:02 WARN hive.TableDefWriter: Column MGR had to be cast to a less precise type in Hive
13/03/30 00:41:02 WARN hive.TableDefWriter: Column HIREDATE had to be cast to a less precise type in Hive
13/03/30 00:41:02 WARN hive.TableDefWriter: Column SAL had to be cast to a less precise type in Hive
13/03/30 00:41:02 WARN hive.TableDefWriter: Column COMM had to be cast to a less precise type in Hive
13/03/30 00:41:02 WARN hive.TableDefWriter: Column DEPTNO had to be cast to a less precise type in Hive
13/03/30 00:41:02 INFO hive.HiveImport: Removing temporary files from import process: hdfs://localhost:9000/user/hadoop/EMP/_logs
13/03/30 00:41:02 INFO hive.HiveImport: Loading uploaded data into Hive
13/03/30 00:41:08 INFO hive.HiveImport: Hive history file=/tmp/hadoop/hive_job_log_hadoop_201303300041_854717120.txt
13/03/30 00:41:19 INFO hive.HiveImport: OK
13/03/30 00:41:19 INFO hive.HiveImport: Time taken: 10.866 seconds
13/03/30 00:41:19 INFO hive.HiveImport: Loading data to table default.emp
13/03/30 00:41:20 INFO hive.HiveImport: OK
13/03/30 00:41:20 INFO hive.HiveImport: Time taken: 0.507 seconds
13/03/30 00:41:20 INFO hive.HiveImport: Hive import complete.
13/03/30 00:41:20 INFO hive.HiveImport: Export directory is empty, removing it. 复制代码
使用HIVE语句查看对应的数据如下:[hadoop@domain bin]$ hive
Hive history file=/tmp/hadoop/hive_job_log_hadoop_201303300045_1940247672.txt
hive> show tables;
OK
emp
hive_test
invites
pokes
Time taken: 6.632 seconds
hive> select * from emp;
OK
7369.0 SMITH CLERK 7902.0 1980-12-17 00:00:00.0 800.0 NULL 20.0
7499.0 ALLEN SALESMAN 7698.0 1981-02-20 00:00:00.0 1600.0 300.0 30.0
7521.0 WARD SALESMAN 7698.0 1981-02-22 00:00:00.0 1250.0 500.0 30.0
7566.0 JONES MANAGER 7839.0 1981-04-02 00:00:00.0 2975.0 NULL 20.0
7654.0 MARTIN SALESMAN 7698.0 1981-09-28 00:00:00.0 1250.0 1400.0 30.0
7698.0 BLAKE MANAGER 7839.0 1981-05-01 00:00:00.0 2850.0 NULL 30.0
7782.0 CLARK MANAGER 7839.0 1981-06-09 00:00:00.0 2450.0 NULL 10.0
7839.0 KING PRESIDENT NULL 1981-11-17 00:00:00.0 5000.0 NULL 10.0
7844.0 TURNER SALESMAN 7698.0 1981-09-08 00:00:00.0 1500.0 0.0 30.0
7900.0 JAMES CLERK 7698.0 1981-12-03 00:00:00.0 950.0 NULL 30.0
7902.0 FORD ANALYST 7566.0 1981-12-03 00:00:00.0 3000.0 NULL 20.0
7934.0 MILLER CLERK 7782.0 1982-01-23 00:00:00.0 1300.0 NULL 10.0
Time taken: 0.777 seconds
hive> 复制代码
与Oracle中的原始数据一致 (空格对应HIVE为NULL)
1 7369 SMITH CLERK 7902 1980-12-17 星期三 800.00
20 2 7499 ALLEN SALESMAN 7698 1981-02-20 星期五 1600.00 300.00
30 3 7521 WARD SALESMAN 7698 1981-02-22 星期日 1250.00 500.00
30 4 7566 JONES MANAGER 7839 1981-04-02 星期四 2975.00
20 5 7654 MARTIN SALESMAN 7698 1981-09-28 星期一 1250.00 1400.00
30 6 7698 BLAKE MANAGER 7839 1981-05-01 星期五 2850.00
30 7 7782 CLARK MANAGER 7839 1981-06-09 星期二 2450.00
10 8 7839 KING PRESIDENT 1981-11-17 星期二 5000.00
10 9 7844 TURNER SALESMAN 7698 1981-09-08 星期二 1500.00 0.00
30 10 7900 JAMES CLERK 7698 1981-12-03 星期四 950.00
30 11 7902 FORD ANALYST 7566 1981-12-03 星期四 3000.00
20 12 7934 MILLER CLERK 7782 1982-01-23 星期六 1300.00 10
至此 sqoop导入Oracle数据到HIVE成功!!!
二、有些看到上面还是不明白,这里进一步明确一下
Shell代码sqoop export --table tableName -connect jdbc:oracle:thin:@xx.xx.xx.xx:port:xxx --username userNameA --password pwd --export-dir sourceFile --input-fields-terminated-by '\001' --input-lines-terminated-by '\n' 复制代码
常见错误总结:
sqoop导出数据到oracle:报错:
异常代码:java.lang.IllegalArgumentException: Attempted to generate class with no columns!
at org.apache.sqoop.orm.ClassWriter.generateClassForColumns(ClassWriter.java:1295)
at org.apache.sqoop.orm.ClassWriter.generate(ClassWriter.java:1176)
at org.apache.sqoop.tool.CodeGenTool.generateORM(CodeGenTool.java:96)
at org.apache.sqoop.tool.ExportTool.exportTable(ExportTool.java:64)
at org.apache.sqoop.tool.ExportTool.run(ExportTool.java:100)
at org.apache.sqoop.Sqoop.run(Sqoop.java:145)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:181)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:220)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:229)
at org.apache.sqoop.Sqoop.main(Sqoop.java:238)
Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.ShellMain], exit code [1]
*****************
2013-08-29 12:18:17,217 INFO org.apache.sqoop.manager.SqlManager: Executing SQL statement: SELECT t.* FROM TMP t WHERE 1=0
2013-08-29 12:18:17,235 ERROR org.apache.sqoop.Sqoop: Got exception running Sqoop: java.lang.IllegalArgumentException: Attempted to generate class with no columns! 复制代码
查看Sqoop源码发现这么一段:
此异常是在没有找到目标表的column时抛出:
Java代码private StringBuilder generateClassForColumns(
Map<String, Integer> columnTypes,
String [] colNames, String [] dbWriteColNames) {
if (colNames.length ==0) {
throw new IllegalArgumentException("Attempted to generate class with "
+ "no columns!");
} 复制代码
再查导致获取columns为空的原因,发现columns的获取是通过如下代码:
Java代码String [] colNames = options.getColumns();
if (null == colNames) {
if (null != tableName) {
// Table-based import. Read column names from table.
colNames = connManager.getColumnNames(tableName);
} else if (options.getCall() != null) {
// Read procedure arguments from metadata
colNames = connManager.getColumnNamesForProcedure(
this.options.getCall());
} else {
// Infer/assign column names for arbitrary query.
colNames = connManager.getColumnNamesForQuery(
this.options.getSqlQuery());
}
} 复制代码
可以发现,它是先找参数columns的值,如果没有再去通过table去查。
通过table查寻是通过该table的owner去查:
见org.apache.sqoop.manager.OracleManager.java
中getColumnNames()方法:
Java代码conn = getConnection();
pStmt = conn.prepareStatement(QUERY_COLUMNS_FOR_TABLE,
ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY);
pStmt.setString(1, tableOwner);
pStmt.setString(2, shortTableName);
rset = pStmt.executeQuery(); 复制代码
其主要查询语句为:
Java代码public static final String QUERY_COLUMNS_FOR_TABLE =
"SELECT COLUMN_NAME FROM ALL_TAB_COLUMNS WHERE "
+ "OWNER = ? AND TABLE_NAME = ? ORDER BY COLUMN_ID"; 复制代码
在sqoop语句中并没有指定columns的值,它会理所当然的去查table的columns。
而我的table是同义词,其属主用户并非userNameA。
这下解决方式很明显了,两种:
1.将user替换成table的属主用户。
2.添加--columns属性,显示的指定columns的值。