分享

【中文乱码】使用Sqoop 将Mysql数据导入到Hive中

zxmit 发表于 2015-9-6 20:34:50 [显示全部楼层] 回帖奖励 阅读模式 关闭右栏 6 19226
Mysql表属性:
+---------+--------------+-----------------+------+-----+---------+-------+---------------------------------+---------+
| Field     | Type          |  Collation        | Null | Key | Default | Extra | Privileges                      | Comment |
+---------+--------------+-----------------+------+-----+---------+-------+---------------------------------+---------+
| id         | int(11)          | NULL                 | YES  |     | NULL    |       | select,insert,update,references |         |
| content| varchar(100) | utf8_general_ci | YES  |     | NULL    |       | select,insert,update,references |         |
| time     | varchar(100) | utf8_general_ci | YES  |     | NULL    |       | select,insert,update,references |         |
+---------+--------------+-----------------+------+-----+---------+-------+---------------------------------+---------+

Mysql表数据:
mysql> select * from t1;
+------+-----------------------+---------------------+
| id   | content               | time                |
+------+-----------------------+---------------------+
|    0 | 中华人民共和国 | 2015-01-09 06:00:00 |
|    1 | 中华人民共和国 | 2015-01-09 06:00:00 |
|    2 | 中华人民共和国 | 2015-01-09 06:00:00 |
|    3 | 中华人民共和国 | 2015-01-09 06:00:00 |
|    4 | 中国                | 2015-01-09 06:00:00 |

+------+-----------------------+---------------------+

数据获取执行命令:
[xiaoming@m103 ~]$ sqoop import --connect "jdbc:mysql://m103/test?useUnicode=true&characterEncoding=utf-8" --table t1 --username hive --password 111111  -m 1 --hive-import

从hive中查看导入的数据:
hive> select * from t1;
OK
0        ä¸-åŽäoo民共和国        2015-01-09 06:00:00
1        ä¸-åŽäoo民共和国        2015-01-09 06:00:00
2        ä¸-åŽäoo民共和国        2015-01-09 06:00:00
3        ä¸-åŽäoo民共和国        2015-01-09 06:00:00
4        ä¸-国        2015-01-09 06:00:00

Time taken: 21.283 seconds, Fetched: 5 row(s)


已有(6)人评论

跳转到指定楼层
zxmit 发表于 2015-9-6 21:12:12
Log日志:
[xiaoming@m103 ~]$ sqoop import --connect "jdbc:mysql://m103/test?useUnicode=true&characterEncoding=utf-8" --table t1 --username hive --password 111111  -m 1 --hive-import
Warning: /opt/cloudera/parcels/CDH-5.4.2-1.cdh5.4.2.p0.2/bin/../lib/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
15/09/06 21:02:11 INFO sqoop.Sqoop: Running Sqoop version: 1.4.5-cdh5.4.2
15/09/06 21:02:11 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
15/09/06 21:02:11 INFO tool.BaseSqoopTool: Using Hive-specific delimiters for output. You can override
15/09/06 21:02:11 INFO tool.BaseSqoopTool: delimiters with --fields-terminated-by, etc.
15/09/06 21:02:11 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
15/09/06 21:02:11 INFO tool.CodeGenTool: Beginning code generation
15/09/06 21:02:12 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `t1` AS t LIMIT 1
15/09/06 21:02:12 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `t1` AS t LIMIT 1
15/09/06 21:02:12 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /opt/cloudera/parcels/CDH/lib/hadoop-mapreduce
注: /tmp/sqoop-xiaoming/compile/515a5ecc2e86d05a793c27f7d716b91b/t1.java使用或覆盖了已过时的 API。
注: 有关详细信息, 请使用 -Xlint:deprecation 重新编译。
15/09/06 21:02:14 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-xiaoming/compile/515a5ecc2e86d05a793c27f7d716b91b/t1.jar
15/09/06 21:02:14 WARN manager.MySQLManager: It looks like you are importing from mysql.
15/09/06 21:02:14 WARN manager.MySQLManager: This transfer can be faster! Use the --direct
15/09/06 21:02:14 WARN manager.MySQLManager: option to exercise a MySQL-specific fast path.
15/09/06 21:02:14 INFO manager.MySQLManager: Setting zero DATETIME behavior to convertToNull (mysql)
15/09/06 21:02:14 INFO mapreduce.ImportJobBase: Beginning import of t1
15/09/06 21:02:14 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
15/09/06 21:02:15 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
15/09/06 21:02:15 INFO client.RMProxy: Connecting to ResourceManager at m101/172.16.8.101:8032
15/09/06 21:02:19 INFO db.DBInputFormat: Using read commited transaction isolation
15/09/06 21:02:19 INFO mapreduce.JobSubmitter: number of splits:1
15/09/06 21:02:20 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1441541663800_0011
15/09/06 21:02:20 INFO impl.YarnClientImpl: Submitted application application_1441541663800_0011
15/09/06 21:02:20 INFO mapreduce.Job: The url to track the job: http://m101:8088/proxy/application_1441541663800_0011/
15/09/06 21:02:20 INFO mapreduce.Job: Running job: job_1441541663800_0011
15/09/06 21:02:28 INFO mapreduce.Job: Job job_1441541663800_0011 running in uber mode : false
15/09/06 21:02:28 INFO mapreduce.Job:  map 0% reduce 0%
15/09/06 21:02:36 INFO mapreduce.Job:  map 100% reduce 0%
15/09/06 21:02:37 INFO mapreduce.Job: Job job_1441541663800_0011 completed successfully
15/09/06 21:02:37 INFO mapreduce.Job: Counters: 30
        File System Counters
                FILE: Number of bytes read=0
                FILE: Number of bytes written=137526
                FILE: Number of read operations=0
                FILE: Number of large read operations=0
                FILE: Number of write operations=0
                HDFS: Number of bytes read=87
                HDFS: Number of bytes written=108
                HDFS: Number of read operations=4
                HDFS: Number of large read operations=0
                HDFS: Number of write operations=2
        Job Counters
                Launched map tasks=1
                Other local map tasks=1
                Total time spent by all maps in occupied slots (ms)=5817
                Total time spent by all reduces in occupied slots (ms)=0
                Total time spent by all map tasks (ms)=5817
                Total vcore-seconds taken by all map tasks=5817
                Total megabyte-seconds taken by all map tasks=5956608
        Map-Reduce Framework
                Map input records=5
                Map output records=5
                Input split bytes=87
                Spilled Records=0
                Failed Shuffles=0
                Merged Map outputs=0
                GC time elapsed (ms)=21
                CPU time spent (ms)=1930
                Physical memory (bytes) snapshot=343474176
                Virtual memory (bytes) snapshot=2309070848
                Total committed heap usage (bytes)=792199168
        File Input Format Counters
                Bytes Read=0
        File Output Format Counters
                Bytes Written=108
15/09/06 21:02:37 INFO mapreduce.ImportJobBase: Transferred 108 bytes in 21.6966 seconds (4.9777 bytes/sec)
15/09/06 21:02:37 INFO mapreduce.ImportJobBase: Retrieved 5 records.
15/09/06 21:02:37 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `t1` AS t LIMIT 1
15/09/06 21:02:37 INFO hive.HiveImport: Loading uploaded data into Hive

Logging initialized using configuration in jar:file:/opt/cloudera/parcels/CDH-5.4.2-1.cdh5.4.2.p0.2/jars/hive-common-1.1.0-cdh5.4.2.jar!/hive-log4j.properties
OK
Time taken: 2.021 seconds
Loading data to table default.t1
Table default.t1 stats: [numFiles=2, totalSize=216]
OK
Time taken: 0.845 seconds
回复

使用道具 举报

arsenduan 发表于 2015-9-6 21:13:46
sqoop import --connect "jdbc:mysql://m103/test?useUnicode=true&characterEncoding=utf-8" --table t1 --username hive --password 111111  -m 1 --hive-import

命令强制转码了
是使用的什么工具连接的。
如果使用xshell,改下xshell编码
如果SecureCRT, 去 terminal-->appearance--->character encoding 设置

总之使用的什么,能够解析你的utf-8,就能正常显示了
回复

使用道具 举报

zxmit 发表于 2015-9-6 23:18:24
arsenduan 发表于 2015-9-6 21:13
sqoop import --connect "jdbc:mysql://m103/test?useUnicode=true&characterEncoding=utf-8" --table t1  ...

应该并不是这方面的问题,因为现在xshell已经是utf-8显示了。
回复

使用道具 举报

zxmit 发表于 2015-9-7 11:30:54
我查了一下mysql字符集:
mysql> show variables like  "character%";
+--------------------------+----------------------------+
| Variable_name            | Value                      |
+--------------------------+----------------------------+
| character_set_client          | latin1                       |
| character_set_connection | latin1                       |
| character_set_database    | utf8                       |
| character_set_filesystem   | binary                     |
| character_set_results        | latin1                       |
| character_set_server         | utf8                       |
| character_set_system        | utf8                       |
| character_sets_dir             | /usr/share/mysql/charsets/ |
+--------------------------+----------------------------+

现在判断是字符集的问题,不过还没有找到解决方案
回复

使用道具 举报

credit 发表于 2015-12-1 10:02:55
zxmit 发表于 2015-9-7 11:30
我查了一下mysql字符集:
mysql> show variables like  "character%";
+--------------------------+---- ...

您的问题解决了没,我现在遇到oracle字符编码问题乱码,
回复

使用道具 举报

诗景尘 发表于 2015-12-6 13:00:18
直接使用sqoop import --connect "jdbc:mysql://m103/test --table t1 --username hive --password 111111  -m 1 --hive-import试试,命令中不要强制使用utf-8
回复

使用道具 举报

您需要登录后才可以回帖 登录 | 立即注册

本版积分规则

关闭

推荐上一条 /2 下一条