1.环境
java version "1.8.0_65"
hadoop 2.7.2
hive 2.0
sqoop 1.4.6
2.所需软件包
①.sqljdbc_3.0.1301.101_chs.tar //下载地址:http://www.microsoft.com/en-us/d ... ang=en&id=21599
②.sqoop-sqlserver-1.0.tar //改软件包微软已经不提供下载,CSDN有收费
3.软件安装
①.$tar -xvf ljdbc_3.0.1301.101_chs.tar 将解压后拷贝sqljdbc4.jar到${SQOOP_HOME}/lib目录下
②.解压qoop-sqlserver-1.0.tar ,配置环境变量MSSQL_CONNECTOR_HOME="/soft/sqoop-sqlserver-1.0",进入,执行 sh install.sh,下述为成功例子:
ubuntu@namenode-01:/soft/sqoop-sqlserver-1.0$ sh install.sh
Starting 'Microsoft SQL Server - Hadoop' Connector installation ...
SQOOP_HOME set to: /soft/sqoop
SQOOP_CONF_DIR set to: /soft/sqoop/conf
Installing 'SQL Server - Hadoop' Connector library ...
Installed library
Installing 'SQL Server - Hadoop' Connector configuration ...
Installed configuration
'SQL Server - Hadoop' Connector Installation completed successfully.
4.这时准备工作已经完成,已经可以从SQL server将数据load过来,需要注意的是 --connect ''
①.单个表import
sqoop import \
--connect 'jdbc:sqlserver://192.168.88.125; username=sa;password=123;database=testgold' \
--table da_jxc_daysum \
--fields-terminated-by "," \
--lines-terminated-by "\n" \
--target-dir /user/ubuntu/sql/da_jxc_daysum \
-m 1
②.将SQL server表导入到hive里
sqoop import \
--connect 'jdbc:sqlserver://192.168.88.125; username=sa;password=123;database=testgold' \
--table da_jxc_daysum \
--target-dir /user/sql/da_jxc_daysum \
-m 1 \
--fields-terminated-by "," \
--lines-terminated-by "\n" \
--hive-import \
--hive-overwrite \
--create-hive-table \
--hive-table da_jxc_daysum \
--delete-target-dir
5.增量导入
①.--check-column (col) 用来作为判断的列名,如id
②.--incremental (mode) append:追加,比如对大于last-value指定的值之后的记录进行追加导入。lastmodified:最后的修改时间,追加last-value指定的日期之后的记录
③.--last-value (value) 指定自从上次导入后列的最大值(大于该指定的值),也可以自己设定某一值
以下是个以时间为增量的例子:
sqoop job --create sqoop_jop1 -- import --connect 'jdbc:sqlserver://192.168.88.125; username=sa;password=123;database=testgold' --table da_jxc_daysum --target-dir /user/hive/warehouse/my01 --append --check-column "modify_date" --incremental lastmodified --last-value "2016-07-19 00:00:00.001" -m 1 |