假定: 用户database:InsideTSQL2008, tablename:Employees
注意:如果需要导入的table “No primary key could be found”, 则需要“specify one with --split-by or perform a quential import with '-m 1'.”
准备:从Microsoft官网下载jdbc驱动, 将sqljdbc_3.0/enu/下的sqljdbc4.jar和sqljdbc.jar包导入到$SQOOP_HOME/lib下。
下载地址:http://www.microsoft.com/zh-cn/download/confirmation.aspx?id=2505
1. import all-tables
${Installation}/bin/sqoop import-all-tables\
--connect 'jdbc:sqlserver://172.16.102.30; username=sa;\ password=1qaz!QAZ; database=InsideTSQL2008' -m 1 \
--target-dir /user/grid/mydb/
2. import some tables
${Installation}/bin/sqoop import --verbose \
--driver com.microsoft.sqlserver.jdbc.SQLServerDriver\
--connect 'jdbc:sqlserver://172.16.102.30; username=sa;\
password=1qaz!QAZ; database=InsideTSQL2008' \
--table Employees --target-dir /user/grid/Employees_db -m 1
3. list-tables
$SQOOP_HOME/bin/sqoop list-tables \
--connect 'jdbc:sqlserver://172.16.102.30; username=sa; \
password=1qaz!QAZ; database=InsideTSQL2008'
4. create-hive-table
sqoop import --warehouse-dir "/user/hive/MyWareHouse" \
--hive-overwrite --connect 'jdbc:sqlserver://172.16.102.30;\
username=sa;password=1qaz!QAZ; database=InsideTSQL2008' \
--table Employees --hive-table tblFoo --hive-import -m 1
注意:导入hive以后因为Hive官方默认的配置问题,是看不到已导入的tables的,需要做如下修改:
$HIVE_HOME/conf/hive-site.xml中修改配置如下:
修改一
将:
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:derby:;databaseName=metastore_db;create=true</value>
<description>JDBC connect string for a JDBC metastore</description>
</property>
修改为:
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:derby:;databaseName=/home/grid/hive/metastore_db;create=true</value>
<description>JDBC connect string for a JDBC metastore</description>
</property>
注意:databaseName的地址为metastore_db存放的地方,默认在$HIVE_HOME/下面。
修改二
官方配置文档有错误,做如下修改:
将:
<property>
<name>hive.server2.thrift.sasl.qop</name>
<value>auth</auth>
跟正为:
<property>
<name>hive.server2.thrift.sasl.qop</name>
<value>auth</value>
修改以上两项后重启hive 即可。
5. list-databases
sqoop list-databases --connect 'jdbc:sqlserver://172.16.102.30;\
username=sa; password=1qaz!QAZ; database=InsideTSQL2008'
6. list-columns-of-a-table
sqoop import -connect 'jdbc:sqlserver://172.16.102.30; \
username=sa; password=1qaz!QAZ; database=InsideTSQL2008'\
--table Employees --columns "empid, lastname, firstname, title"\
-m 1
7. Storing data in SequenceFiles, and setting the generated class name to com.foo.Employee
sqoop import -connect 'jdbc:sqlserver://172.16.102.30; \
username=sa; password=1qaz!QAZ; database=InsideTSQL2008'\
--table Employees --class-name com.foo.Employee \
--as-sequencefile -m 1
8.Specifying the delimiters to use in a text-mode import
sqoop import --connect 'jdbc:sqlserver://172.16.102.30;\
username=sa; password=1qaz!QAZ; database=InsideTSQL2008' \
--table Employees --fields-terminated-by ',' \
--lines-terminated-by '\n' --optionally-enclosed-by '\"' -m
9. import data like "select ... where..." and with user-defined-delimiters
sqoop import --connect 'jdbc:sqlserver://172.16.102.30;\
username=sa; password=1qaz!QAZ; database=InsideTSQL2008' \
--table Employees --where "mgrid = 5" --fields-terminated-by ','\
--lines-terminated-by '\n' --optionally-enclosed-by '\"' -m 1
10. Changing the splitting column from the default(default is the primary key)
sqoop import --connect 'jdbc:sqlserver://172.16.102.30;\
username=sa; password=1qaz!QAZ; database=InsideTSQL2008'\
--table Employees --where "mgrid = 5" --split-by empi\
--fields-terminated-by ',' --lines-terminated-by '\n' \
--optionally-enclosed-by '\"'
11. append data to an existed table
qoop import --connect 'jdbc:sqlserver://172.16.102.30;\
username=sa; password=1qaz!QAZ; database=InsideTSQL2008' \
--table Employees --where "mgrid <> 5" --fields-terminated-by ','\
--lines-terminated-by '\n' --optionally-enclosed-by '\"' \
--split-by empid --append
12 uses validation to validate the import using the table row count and number of rows copied into HDFS
sqoop import --connect 'jdbc:sqlserver://172.16.102.30;\
username=sa; password=1qaz!QAZ; database=InsideTSQL2008'\
--table Employees --validate -m 1
13. Query Imports
sqoop import --connect 'jdbc:sqlserver://172.16.102.30;\
username=sa; password=1qaz!QAZ; database=InsideTSQL2008'\
--query "SELECT a.empid, a.lastname FROM Employees AS a WHERE\
a.empid > 2 AND \$CONDITIONS" -split-by a.empid \
--target-dir /user/grid/query_results
注意,--target-dir,-split-by(或m 1)不能少,$CONDITIONS必须有,且为保险起见加反斜杠;
程序执行时会将以上命令展开为:
SELECT a.empid, a.lastname FROM Employees AS a WHERE a.empid > 2 AND (1 = 0)
14. delete-target-dir
sqoop import --connect 'jdbc:sqlserver://172.16.102.30;\
username=sa; password=1qaz!QAZ; database=InsideTSQL2008'\
--table Employees -m 1 --delete-target-dir
Delete the import target directory if it exists.
注意,不需要指定要删除哪个directory, “delete-target-dir”删除的就是需要import数据的那个directory
15. codegen
sqoop codegen --connect 'jdbc:sqlserver://172.16.102.30;\
username=sa; password=1qaz!QAZ; database=InsideTSQL2008' \
--table Employees
注意 codegen的argument, 不含delete-target-dir, 请参考codegen的common argument, codegen arguments, Output line formatting arguments, Input parsing arguments和Hive arguments
16. import tables with sqoop class in programs
code from : http://grepcode.com/file/repository.cloudera.com/content/repositories/releases/com.cloudera.sqoop/sqoop/1.3.0-cdh3u1/com/cloudera/sqoop/manager/SQLServerManager.java
package com.cloudera.sqoop.manager;
import java.io.IOException;
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import com.cloudera.sqoop.SqoopOptions;
import com.cloudera.sqoop.mapreduce.JdbcExportJob;
import com.cloudera.sqoop.mapreduce.SQLServerExportOutputFormat;
import com.cloudera.sqoop.util.ExportException;
//Manages connections to SQLServer databases. Requires the SQLServer JDBC driver.
public class [More ...] SQLServerManager extends GenericJdbcManager {
public static final Log LOG = LogFactory.getLog(
SQLServerManager.class.getName());
// driver class to ensure is loaded when making db connection.
private static final String DRIVER_CLASS =
"com.microsoft.sqlserver.jdbc.SQLServerDriver";
public [More ...] SQLServerManager(final SqoopOptions opts) {
super(DRIVER_CLASS, opts);
}
// Export data stored in HDFS into a table in a database.
@Override
public void [More ...] exportTable(ExportJobContext context)
throws IOException, ExportException {
context.setConnManager(this);
JdbcExportJob exportJob = new JdbcExportJob(context, null, null,
SQLServerExportOutputFormat.class);
exportJob.runExport();
}
// SQLServer does not support the CURRENT_TIMESTAMP() function. Instead it has the notion of keyword CURRENT_TIMESTAMP that resolves to the current time stamp for the database system.
@Override
public String [More ...] getCurTimestampQuery() {
return "SELECT CURRENT_TIMESTAMP";
}
}