Sqoop data transfer problem: from MySQL to Hive
Here is how I am using Sqoop to transfer a database: 1. First I created a new database in Hive on the cluster (the destination of the transfer): hivewattest0. 2. After looking up a few basic ways to use Sqoop, I wrote the script below, but it runs very slowly:
#!/bin/bash
source /etc/profile
source /etc/bashrc

# Connection settings for the source MySQL database and the target Hive database
CONNECTURL=192.168.1.140
PORTNUM=3306
DBNAME=wattest0
USERNAME=root
PASSWORD=1
HIVEDB=hivewattest0

# Dump the list of MySQL tables into a temp file
sqoop list-tables -connect jdbc:mysql://${CONNECTURL}:${PORTNUM}/${DBNAME} -username ${USERNAME} -password ${PASSWORD} > tmptable.log

# Starting from the table named "analysistable", import that table and every table after it into Hive
flag=0
for line in `cat tmptable.log`
do
    if [[ "${line}" == "analysistable" ]]; then
        flag=1
    fi
    if [[ "${flag}" == "1" ]]; then
        sqoop import -connect jdbc:mysql://${CONNECTURL}:${PORTNUM}/${DBNAME} -username ${USERNAME} -password ${PASSWORD} -table ${line} -hive-import -hive-table ${HIVEDB}.${line}
    fi
done
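For example, for the processsteps table the loop ends up running the single command below (this is just the import line from the loop with the variables above expanded, to show one concrete per-table invocation):

sqoop import -connect jdbc:mysql://192.168.1.140:3306/wattest0 -username root -password 1 -table processsteps -hive-import -hive-table hivewattest0.processsteps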
3. Part of the run log is below. The script is really just executing the command below for each table, and that is where most of the time goes:
`sqoop import -connect jdbc:mysql://${CONNECTURL}:${PORTNUM}/${DBNAME} -username ${USERNAME} -password ${PASSWORD} -table ${line} -hive-import -hive-table ${HIVEDB}.${line}`
Loading data to table hivewattest0.processsteps    // the DB newly created in Hive, and one of the tables in it
chgrp: changing ownership of 'hdfs://master:8020/user/hive/warehouse/hivewattest0.db/processsteps/part-m-00000': User does not belong to hive
Table hivewattest0.processsteps stats:
OK
Time taken: 0.652 seconds
Warning: /opt/cloudera/parcels/CDH-5.9.2-1.cdh5.9.2.p0.3/bin/../lib/sqoop/../accumulo does not exist! Accumulo imports will fail. Please set $ACCUMULO_HOME to the root of your Accumulo installation.
17/05/17 15:32:57 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6-cdh5.9.2
17/05/17 15:32:57 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
17/05/17 15:32:57 INFO tool.BaseSqoopTool: Using Hive-specific delimiters for output. You can override
17/05/17 15:32:57 INFO tool.BaseSqoopTool: delimiters with --fields-terminated-by, etc.
17/05/17 15:32:58 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
17/05/17 15:32:58 INFO tool.CodeGenTool: Beginning code generation
17/05/17 15:32:58 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `productsettings` AS t LIMIT 1
17/05/17 15:32:58 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `productsettings` AS t LIMIT 1
17/05/17 15:32:58 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /opt/cloudera/parcels/CDH/lib/hadoop-mapreduce
Note: /tmp/sqoop-root/compile/ae110202d24cd7c86be1f098c7352529/productsettings.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
17/05/17 15:33:00 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-root/compile/ae110202d24cd7c86be1f098c7352529/productsettings.jar
17/05/17 15:33:00 WARN manager.MySQLManager: It looks like you are importing from mysql.
17/05/17 15:33:00 WARN manager.MySQLManager: This transfer can be faster! Use the --direct
17/05/17 15:33:00 WARN manager.MySQLManager: option to exercise a MySQL-specific fast path.
17/05/17 15:33:00 INFO manager.MySQLManager: Setting zero DATETIME behavior to convertToNull (mysql)
17/05/17 15:33:00 INFO mapreduce.ImportJobBase: Beginning import of productsettings
17/05/17 15:33:00 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
17/05/17 15:33:01 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
17/05/17 15:33:01 INFO client.RMProxy: Connecting to ResourceManager at master/192.168.200.101:8032
17/05/17 15:33:05 INFO db.DBInputFormat: Using read commited transaction isolation
17/05/17 15:33:05 INFO db.DataDrivenDBInputFormat: BoundingValsQuery: SELECT MIN(`ID`), MAX(`ID`) FROM `productsettings`
17/05/17 15:33:05 INFO db.IntegerSplitter: Split size: 16; Num splits: 4 from: 1 to: 65
17/05/17 15:33:05 INFO mapreduce.JobSubmitter: number of splits:4
17/05/17 15:33:06 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1494812274304_0083
17/05/17 15:33:06 INFO impl.YarnClientImpl: Submitted application application_1494812274304_0083
17/05/17 15:33:06 INFO mapreduce.Job: The url to track the job: http://master:8088/proxy/application_1494812274304_0083/
17/05/17 15:33:06 INFO mapreduce.Job: Running job: job_1494812274304_0083
17/05/17 15:33:14 INFO mapreduce.Job: Job job_1494812274304_0083 running in uber mode : false
17/05/17 15:33:14 INFO mapreduce.Job: map 0% reduce 0%
17/05/17 15:33:22 INFO mapreduce.Job: map 25% reduce 0%
17/05/17 15:33:23 INFO mapreduce.Job: map 50% reduce 0%
17/05/17 15:33:27 INFO mapreduce.Job: map 75% reduce 0%
17/05/17 15:33:29 INFO mapreduce.Job: map 100% reduce 0%
17/05/17 15:33:29 INFO mapreduce.Job: Job job_1494812274304_0083 completed successfully
17/05/17 15:33:29 INFO mapreduce.Job: Counters: 30
    File System Counters
        FILE: Number of bytes read=0
        FILE: Number of bytes written=590064
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=400
        HDFS: Number of bytes written=1214
        HDFS: Number of read operations=16
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=8
    Job Counters
        Launched map tasks=4
        Other local map tasks=4
        Total time spent by all maps in occupied slots (ms)=19874
        Total time spent by all reduces in occupied slots (ms)=0
        Total time spent by all map tasks (ms)=19874
        Total vcore-seconds taken by all map tasks=19874
        Total megabyte-seconds taken by all map tasks=20350976
    Map-Reduce Framework
        Map input records=45
        Map output records=45
        Input split bytes=400
        Spilled Records=0
        Failed Shuffles=0
        Merged Map outputs=0
        GC time elapsed (ms)=294
        CPU time spent (ms)=6530
        Physical memory (bytes) snapshot=788549632
        Virtual memory (bytes) snapshot=11102887936
        Total committed heap usage (bytes)=610271232
    File Input Format Counters
        Bytes Read=0
    File Output Format Counters
        Bytes Written=1214
17/05/17 15:33:29 INFO mapreduce.ImportJobBase: Transferred 1.1855 KB in 28.0971 seconds (43.2074 bytes/sec)
17/05/17 15:33:29 INFO mapreduce.ImportJobBase: Retrieved 45 records.
17/05/17 15:33:29 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `productsettings` AS t LIMIT 1
17/05/17 15:33:29 INFO hive.HiveImport: Loading uploaded data into Hive
Logging initialized using configuration in jar:file:/opt/cloudera/parcels/CDH-5.9.2-1.cdh5.9.2.p0.3/jars/hive-common-1.1.0-cdh5.9.2.jar!/hive-log4j.properties
OK
Time taken: 3.301 seconds
To sum up: the old MySQL DB has about 70 tables, and the whole run takes roughly twenty minutes. Is there another, faster way to import a MySQL database into a Hive database?
Try giving it more map tasks: sqoop import -connect jdbc:mysql://${CONNECTURL}:${PORTNUM}/${DBNAME} -username ${USERNAME} -password ${PASSWORD} -table ${line} -hive-import -m 10 -hive-table ${HIVEDB}.${line}
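Roughly like the untested sketch below, which plugs -m into your loop and also runs the per-table imports a few at a time. It reuses the variables and the flag logic from your script; MAXJOBS is an assumed value, tune it to what your cluster can handle:

flag=0
count=0
MAXJOBS=4   # assumed: how many sqoop imports to keep running at the same time
for line in `cat tmptable.log`
do
    if [[ "${line}" == "analysistable" ]]; then
        flag=1
    fi
    if [[ "${flag}" == "1" ]]; then
        # launch each per-table import in the background, with 10 map tasks
        sqoop import -connect jdbc:mysql://${CONNECTURL}:${PORTNUM}/${DBNAME} -username ${USERNAME} -password ${PASSWORD} -table ${line} -m 10 -hive-import -hive-table ${HIVEDB}.${line} &
        count=$((count + 1))
        if [ ${count} -ge ${MAXJOBS} ]; then
            wait      # let the current batch of imports finish before starting more
            count=0
        fi
    fi
done
wait   # wait for the last batch to finish

Note that with tables this small (your log shows 45 rows moved in about 28 seconds), most of the twenty minutes is per-table job startup and Hive load overhead rather than data transfer, so running imports in parallel will probably help more than raising -m by itself.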
jixianqiuxue posted on 2017-5-17 19:45
Try giving it more map tasks: sqoop import -connect jdbc:mysql://${CONNECTURL}:${PORTNUM}/${DBNAME} -username ${US ...
Hi, thank you. I'll try adding a few more map tasks shortly. I also noticed that while the job runs, memory usage on the cluster master is very high, but the other hosts use very little. youngwenhao posted on 2017-5-18 09:36
Hi, thank you. I'll try adding a few more map tasks shortly. I also noticed that while the job runs, memory usage on the cluster master is very high, but the other hosts use very ...
The master carries quite a few roles; there are probably a lot of services installed on it.
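If you want to see which roles are actually taking the memory, a quick check on the master node is something like:

ps aux --sort=-%mem | head -n 15   # list the top memory consumers on the master

From your log, at least the NameNode (hdfs://master:8020) and the ResourceManager (master:8032) already run on that host, so it is expected that master uses far more memory than the worker nodes.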