My recent work involved migrating data from Hive tables to DRDS, and I ran some simple tests, documented below.
The DDL statements are as follows:
Hive non-partitioned table:
CREATE TABLE item_zc_test1(
num_id string,
dp_id string,
approve_status string,
title string,
price string,
nick string,
cid string,
pic_url string,
props string,
list_time string,
modified string,
delist_time string,
date_id string,
day_id string
)
row format delimited fields terminated by '\001'
stored as textfile;
Hive partitioned table (two-level partitions):
CREATE TABLE item_zc_test_partition(
num_id string,
dp_id string,
approve_status string,
title string,
price string,
nick string,
cid string,
pic_url string,
props string,
list_time string,
modified string,
delist_time string
) partitioned by (date_id string,day_id string)
row format delimited fields terminated by '\001'
stored as textfile;
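For reference, test data can be staged into both Hive tables before export. A minimal sketch, assuming a local '\001'-delimited sample file; the file path and the partition values are placeholders, not part of the original test:
# Load into the non-partitioned table:
hive -e "LOAD DATA LOCAL INPATH '/tmp/item_sample.txt' INTO TABLE item_zc_test1;"
# The two-level partitioned table needs both partition columns spelled out:
hive -e "LOAD DATA LOCAL INPATH '/tmp/item_sample.txt' INTO TABLE item_zc_test_partition PARTITION (date_id='201801', day_id='20180101');"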
DRDS DDL:
1) Single table (no sharding):
create table item_zc_test1 (
num_id varchar(100),
dp_id varchar(100),
approve_status varchar(100),
title varchar(100),
price varchar(100),
nick varchar(100),
cid varchar(100),
pic_url varchar(200),
props varchar(400),
list_time varchar(100),
modified varchar(100),
delist_time varchar(100),
date_id varchar(50),
day_id varchar(50),
primary key(num_id)
);
2) Sharded databases, non-sharded tables:
create table item_zc_test (
num_id varchar(100),
dp_id varchar(100),
approve_status varchar(100),
title varchar(100),
price varchar(100),
nick varchar(100),
cid varchar(100),
pic_url varchar(200),
props varchar(400),
list_time varchar(100),
modified varchar(100),
delist_time varchar(100),
date_id varchar(50),
day_id varchar(50),
primary key(num_id)
) dbpartition by hash(num_id);
3) Sharded databases and sharded tables:
create table item_zc_test2 (
num_id varchar(100),
dp_id varchar(100),
approve_status varchar(100),
title varchar(100),
price varchar(100),
nick varchar(100),
cid varchar(100),
pic_url varchar(200),
props varchar(400),
list_time varchar(100),
modified varchar(100),
delist_time varchar(100),
date_id varchar(50),
day_id varchar(50),
primary key(num_id)
) dbpartition by hash(num_id) tbpartition by hash(dp_id) tbpartitions 3;
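Before loading any data, it is worth confirming how DRDS actually laid out the sharded tables. A minimal sketch using the mysql CLI against the same placeholder endpoint; SHOW TOPOLOGY is the DRDS statement that lists the physical tables behind a logical table:
# For item_zc_test2 this should show 3 physical tables per shard database,
# matching "tbpartitions 3" in the DDL above.
mysql -h xxxx -u xxxx -p -D xx -e "SHOW TOPOLOGY FROM item_zc_test2;"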
1. Hive non-partitioned table -> DRDS single table
Sqoop script:
sqoop export \
--connect jdbc:mysql://xxxx/xx \
--username xxxx \
--password xx \
--table item_zc_test1 \
--export-dir /user/hive/warehouse/item_zc_test1 \
--input-fields-terminated-by '\001' \
--num-mappers 1
Result: Sqoop loads the data from the HDFS files into DRDS without issue.
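A simple way to verify is to compare row counts on both sides (same placeholder endpoint and credentials as in the script):
# Count in Hive, then in DRDS; the two numbers should match.
hive -e "SELECT COUNT(*) FROM item_zc_test1;"
mysql -h xxxx -u xxxx -p -D xx -e "SELECT COUNT(*) FROM item_zc_test1;"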
2. Hive non-partitioned table -> DRDS sharded databases (non-sharded tables)
Sqoop script:
sqoop export \
--connect jdbc:mysql://xxxx/xx \
--username xxxx \
--password xx \
--table item_zc_test \
--export-dir /user/hive/warehouse/item_zc_test1 \
--input-fields-terminated-by '\001' \
--num-mappers 1
Result: Sqoop loads the data from the HDFS files into DRDS without issue.
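Beyond a row count, it is worth checking that rows really were hashed across the shard databases. A sketch using DRDS's SHOW RULE statement, which displays the sharding rule of a logical table (here, hash on num_id):
mysql -h xxxx -u xxxx -p -D xx -e "SHOW RULE FROM item_zc_test;"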
3. Hive non-partitioned table -> DRDS sharded databases and tables
Note: when the DRDS table is sharded across both databases and tables, running the script above as-is fails with an error. Modify the script as follows, explicitly listing the columns:
sqoop export \
--connect jdbc:mysql://xxxx/xx \
--username xxxx \
--password xx \
--table item_zc_test2 \
--columns "num_id,dp_id,approve_status,title,price,nick,cid,pic_url,props,list_time,modified,delist_time,date_id,day_id" \
--export-dir /user/hive/warehouse/item_zc_test1 \
--input-fields-terminated-by '\001' \
--num-mappers 5
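The --num-mappers 5 setting already parallelizes the export; if throughput is still a bottleneck, Sqoop's standard export batching properties can be layered on. A hedged variant of the same script (the property values are illustrative, not tuned for DRDS):
# Batch 100 rows per INSERT statement and 10 statements per transaction.
sqoop export \
-D sqoop.export.records.per.statement=100 \
-D sqoop.export.statements.per.transaction=10 \
--connect jdbc:mysql://xxxx/xx \
--username xxxx \
--password xx \
--table item_zc_test2 \
--columns "num_id,dp_id,approve_status,title,price,nick,cid,pic_url,props,list_time,modified,delist_time,date_id,day_id" \
--export-dir /user/hive/warehouse/item_zc_test1 \
--input-fields-terminated-by '\001' \
--num-mappers 5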
4. Exporting a Hive partitioned table to DRDS
According to Sqoop's official parameter documentation, export only supports non-partitioned Hive tables and single-level partitioned tables. In the script below, --hive-partition-key names the single partition column and --hive-partition-value its value (pk and pv are placeholders):
sqoop export \
--connect jdbc:mysql://xxxx/xx \
--username xxxx \
--password xx \
--table table \
--hive-partition-key pk \
--hive-partition-value 'pv' \
--export-dir /user/hive/warehouse/table/pk=pv/ \
--input-fields-terminated-by '\001' \
--input-lines-terminated-by '\n'
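Since sqoop export simply reads whatever files sit under --export-dir, a single leaf partition of the two-level table can also be exported by pointing --export-dir at that partition's directory. The partition columns (date_id, day_id) live in the directory names rather than in the data files, so they must be excluded via --columns and will be left NULL in DRDS; the partition values below are placeholders:
sqoop export \
--connect jdbc:mysql://xxxx/xx \
--username xxxx \
--password xx \
--table item_zc_test1 \
--columns "num_id,dp_id,approve_status,title,price,nick,cid,pic_url,props,list_time,modified,delist_time" \
--export-dir /user/hive/warehouse/item_zc_test_partition/date_id=201801/day_id=20180101 \
--input-fields-terminated-by '\001' \
--num-mappers 1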
For multi-level partitioned Hive tables, the recommended way to export with Sqoop is through an intermediate temporary table, as sketched below.
Reference: https://stackoverflow.com/questions/33100799/sqoop-export-the-partitioned-hive-table
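A minimal sketch of that intermediate-table approach (partition values are placeholders). CREATE TABLE ... AS SELECT turns the partition columns into ordinary columns, so all 14 columns of the temporary table line up with the DRDS table, and the export then works exactly as in test 1:
# Materialize one partition slice as a plain '\001'-delimited text table:
hive -e "
CREATE TABLE item_zc_test_tmp
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\001'
STORED AS TEXTFILE
AS SELECT * FROM item_zc_test_partition
WHERE date_id = '201801' AND day_id = '20180101';
"
# Export the temporary table's directory to DRDS:
sqoop export \
--connect jdbc:mysql://xxxx/xx \
--username xxxx \
--password xx \
--table item_zc_test1 \
--export-dir /user/hive/warehouse/item_zc_test_tmp \
--input-fields-terminated-by '\001' \
--num-mappers 1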