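The metastore database here is PostgreSQL, and changing its encoding is not as easy as it is with MySQL (I have not found a convenient way so far),
so I fixed the garbled Chinese comments by patching the Hive source directly.
The relevant bug is tracked on the official Apache JIRA: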
https://issues.apache.org/jira/browse/HIVE-5682
Class to modify:
ql/src/java/org/apache/hadoop/hive/ql/metadata/formatting/MetaDataFormatUtils.java
Replace the following method with:
private static void displayAllParameters(Map<String, String> params, StringBuilder tableInfo) {
  List<String> keys = new ArrayList<String>(params.keySet());
  String value = null;
  Collections.sort(keys);
  for (String key : keys) {
    tableInfo.append(FIELD_DELIM); // Ensures all params are indented.
    value = params.get(key);
    // escapeJava turns non-ASCII characters into \uXXXX sequences, so a comment
    // that contains multi-byte characters (e.g. Chinese) is written out unescaped.
    // The byte length differs from the character count only when such characters exist.
    if ("comment".equals(key) && value != null && value.getBytes().length != value.length()) {
      formatOutput(key, value, tableInfo);
    } else {
      formatOutput(key, StringEscapeUtils.escapeJava(value), tableInfo);
    }
  }
}
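The check on the comment value can be tried outside Hive. The following is a minimal sketch, not code from the patch: the class name `CommentEscapeCheck` and the helper `hasMultiByteChars` are hypothetical, and it encodes with UTF-8 explicitly, whereas the method above relies on the platform default charset. The sample strings are written with Unicode escapes so the file compiles regardless of source encoding.

```java
import java.nio.charset.StandardCharsets;

public class CommentEscapeCheck {

    // Hypothetical helper: true when the value contains at least one character
    // outside ASCII. Every such character needs more UTF-8 bytes than it has
    // UTF-16 code units, so the two lengths differ.
    static boolean hasMultiByteChars(String value) {
        return value != null
                && value.getBytes(StandardCharsets.UTF_8).length != value.length();
    }

    public static void main(String[] args) {
        String chineseComment = "\u4e0b\u8f7d\u91cf"; // the column comment "下载量"
        String asciiComment = "download count";

        System.out.println("Chinese comment skips escaping: " + hasMultiByteChars(chineseComment));
        System.out.println("ASCII comment skips escaping: " + hasMultiByteChars(asciiComment));
    }
}
```

Pure ASCII comments still go through `escapeJava`, so control characters such as newlines stay escaped in the `desc formatted` output.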
####Rebuild hive-exec-0.13.1-cdh5.3.1.jar, then replace the copy under /opt/cloudera/parcels/CDH/jars
####Garbled Chinese in show create table: fix
See also:
https://issues.apache.org/jira/browse/HIVE-2905
https://issues.apache.org/jira/secure/attachment/12791019/HIVE-11837.1.patch
Source file to modify:
ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java
@@ -2048,7 +2048,7 @@ private int showCreateTable(Hive db, DataOutputStream outStream, String tableNam
       if (tbl.isView()) {
         String createTab_stmt = "CREATE VIEW `" + tableName + "` AS " + tbl.getViewExpandedText();
-        outStream.writeBytes(createTab_stmt.toString());
+        outStream.write(createTab_stmt.toString().getBytes("UTF-8"));
         return 0;
       }
@@ -2196,7 +2196,7 @@ else if (sortCol.getOrder() == BaseSemanticAnalyzer.HIVE_COLUMN_ORDER_DESC) {
       }
       createTab_stmt.add(TBL_PROPERTIES, tbl_properties);
-      outStream.writeBytes(createTab_stmt.render());
+      outStream.write(createTab_stmt.render().getBytes("UTF-8"));
     } catch (IOException e) {
       LOG.info("show create table: " + stringifyException(e));
       return 1;
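Why this change matters: `DataOutputStream.writeBytes` writes only the low eight bits of each character, so any character outside Latin-1 is corrupted, while `write(getBytes("UTF-8"))` emits the full UTF-8 encoding. The sketch below reproduces both calls against an in-memory stream. The class name `WriteBytesDemo` and its helper methods are hypothetical and exist only for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

public class WriteBytesDemo {

    // Mirrors the original call: writeBytes keeps only the low 8 bits of each char.
    static byte[] viaWriteBytes(String text) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buffer);
            out.writeBytes(text);
            out.flush();
            return buffer.toByteArray();
        } catch (IOException e) {
            throw new IllegalStateException(e); // not expected for an in-memory stream
        }
    }

    // Mirrors the patched call: encode to UTF-8 first, then write the raw bytes.
    static byte[] viaUtf8Bytes(String text) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buffer);
            out.write(text.getBytes(StandardCharsets.UTF_8));
            out.flush();
            return buffer.toByteArray();
        } catch (IOException e) {
            throw new IllegalStateException(e); // not expected for an in-memory stream
        }
    }

    static String decodeUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String comment = "\u4e0b\u8f7d\u91cf"; // the column comment "下载量"

        byte[] lossy = viaWriteBytes(comment);
        byte[] utf8 = viaUtf8Bytes(comment);

        System.out.println("writeBytes wrote " + lossy.length + " byte(s), round-trips: "
                + decodeUtf8(lossy).equals(comment));
        System.out.println("UTF-8 write wrote " + utf8.length + " byte(s), round-trips: "
                + decodeUtf8(utf8).equals(comment));
    }
}
```

For the three-character sample, `writeBytes` produces one byte per character and the text cannot be recovered, whereas the UTF-8 path produces three bytes per character and decodes back to the original string.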
#####Rebuild hive-exec-0.13.1-cdh5.3.1.jar, then replace the copy under /opt/cloudera/parcels/CDH/jars
The Hive source for the matching version can be downloaded from http://archive.cloudera.com/cdh5/cdh/5/
cd /Users/yzygenuine/Downloads/hive-0.13.1-cdh5.3.1
##Build and package the project with the following command
mvn clean package -Phadoop-2 -Pdist -DskipTests -Dtar
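As before, once the code is modified, rebuild to produce the new jar and use it to replace the one in production.
#####Result after replacing the jar: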
hive> show create table dwd_audio_download_redis;
OK
CREATE TABLE `dwd_audio_download_redis`(
`audio_id` bigint COMMENT '节目ID',
`download_cnt` bigint COMMENT '下载量')
COMMENT '节目从上传到分区时间的下载量'
PARTITIONED BY (
`day` bigint COMMENT '节目某天统计数据',
`hour` bigint COMMENT '节目某时统计数据')
ROW FORMAT SERDE
'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
STORED AS INPUTFORMAT
'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
'hdfs://master:8020/user/hive/warehouse/hive_db.db/dwd_audio_download_redis'
TBLPROPERTIES (
'transient_lastDdlTime'='1457938879')
Time taken: 0.55 seconds, Fetched: 17 row(s)
####Garbled Chinese table comment in desc formatted: fixed
hive> desc formatted dwd_audio_download_redis;
OK
# col_name data_type comment
audio_id bigint 节目ID
download_cnt bigint 下载量
# Partition Information
# col_name data_type comment
day bigint 节目某天统计数据
hour bigint 节目某时统计数据
# Detailed Table Information
Database: hive_db
Owner: datamining
CreateTime: Mon Mar 14 15:01:19 CST 2016
LastAccessTime: UNKNOWN
Protect Mode: None
Retention: 0
Location: hdfs://master:8020/user/hive/warehouse/hive_db.db/dwd_audio_download_redis
Table Type: MANAGED_TABLE
Table Parameters:
comment 节目从上传到分区时间的下载量
transient_lastDdlTime 1457938879
# Storage Information
SerDe Library: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
InputFormat: org.apache.hadoop.mapred.TextInputFormat
OutputFormat: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
Compressed: No
Num Buckets: -1
Bucket Columns: []
Sort Columns: []
Storage Desc Params:
serialization.format 1
Time taken: 0.514 seconds, Fetched: 34 row(s)