分享

hive外部分区表分区4w,全表扫优化方式.

为梦狂野 发表于 2017-10-10 10:44:10 [显示全部楼层] 回帖奖励 阅读模式 关闭右栏 4 6604
本帖最后由 为梦狂野 于 2017-10-10 11:14 编辑

个人发现hive 注册表有4w多个分区,当我执行按照平台分组统计数量这个过程非常缓慢,是否有优化之法?谢谢。
[mw_shl_code=bash,true]set mapreduce.job.queuename=root.stats;select platform,$data_date as itime,count(1) as cnt from edw.register_user_log where business in(4,5) and substr(data_hour,1,8)<=$data_date group by platform,$data_date;

[push@***************~]$ hive -e "show partitions edw.register_user_log" > register_parts.txt
which: no hbase in (/home/push/anaconda2/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/bin:/sbin:/usr/bin:/usr/sbin:/usr/local/bin:/usr/local/sbin:/home/push/bin:/home/push/.local/bin:/home/push/bin)

Logging initialized using configuration in file:/etc/hive/conf.dist/hive-log4j.properties
OK
Time taken: 4.507 seconds, Fetched: 44571 row(s)
[/mw_shl_code]

本帖被以下淘专辑推荐:

已有(4)人评论

跳转到指定楼层
qcbb001 发表于 2017-10-10 10:55:00
4.507秒还慢,楼主想要多快
回复

使用道具 举报

为梦狂野 发表于 2017-10-10 11:04:53
本帖最后由 为梦狂野 于 2017-10-10 11:07 编辑

重申下,是为了在这4w个分区中做相应的其他查询优化。
回复

使用道具 举报

为梦狂野 发表于 2017-10-10 11:05:30
qcbb001 发表于 2017-10-10 10:55
4.507秒还慢,楼主想要多快

set mapreduce.job.queuename=root.etl;select platform,count(1) as cnt from basic.user_info_external where business in(4,5) and data_date<=20171008 group by platform;
回复

使用道具 举报

为梦狂野 发表于 2017-10-10 11:13:17
qcbb001 发表于 2017-10-10 10:55
4.507秒还慢,楼主想要多快

感觉现在能够接受了,时间是我可以接受的范围,不用优化了,我直接已经满足了业务需求。暂时不查全表数据。
[mw_shl_code=bash,true][push@*********** ~]$ hive
which: no hbase in (/home/push/anaconda2/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/bin:/sbin:/usr/bin:/usr/sbin:/usr/local/bin:/usr/local/sbin:/home/push/bin:/home/push/.local/bin:/home/push/bin)

Logging initialized using configuration in file:/etc/hive/conf.dist/hive-log4j.properties
WARNING: Hive CLI is deprecated and migration to Beeline is recommended.
hive> set mapreduce.job.queuename=root.stats;select platform,20171008 as itime,count(1) as cnt from edw.register_user_log where business in(4,5) and substr(data_hour,1,8)>=20161101 and substr(data_hour,1,8)<=20171008 group by platform,20171008;
Query ID = push_20171010110202_2f792619-65fe-48ea-8471-c1d93dd6f813
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks not specified. Estimated from input data size: 1099
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Starting Job = job_1505909724027_363932, Tracking URL = ************:8088/proxy/application_1505909724027_363932/
Kill Command = /usr/lib/hadoop/bin/hadoop job  -kill job_1505909724027_363932
Hadoop job information for Stage-1: number of mappers: 1316; number of reducers: 1099
2017-10-10 11:06:05,669 Stage-1 map = 0%,  reduce = 0%
2017-10-10 11:06:45,355 Stage-1 map = 1%,  reduce = 0%, Cumulative CPU 520.41 sec
2017-10-10 11:06:47,558 Stage-1 map = 2%,  reduce = 0%, Cumulative CPU 1172.21 sec
2017-10-10 11:06:48,674 Stage-1 map = 3%,  reduce = 0%, Cumulative CPU 1373.9 sec
2017-10-10 11:06:49,758 Stage-1 map = 4%,  reduce = 0%, Cumulative CPU 1624.67 sec
2017-10-10 11:06:50,825 Stage-1 map = 5%,  reduce = 0%, Cumulative CPU 1788.2 sec
2017-10-10 11:06:51,909 Stage-1 map = 6%,  reduce = 0%, Cumulative CPU 2001.25 sec
2017-10-10 11:06:52,983 Stage-1 map = 7%,  reduce = 0%, Cumulative CPU 2278.77 sec
2017-10-10 11:06:54,095 Stage-1 map = 8%,  reduce = 0%, Cumulative CPU 2535.48 sec
2017-10-10 11:06:55,170 Stage-1 map = 9%,  reduce = 0%, Cumulative CPU 2762.29 sec
2017-10-10 11:06:56,299 Stage-1 map = 10%,  reduce = 0%, Cumulative CPU 3098.0 sec
2017-10-10 11:06:57,390 Stage-1 map = 11%,  reduce = 0%, Cumulative CPU 3530.53 sec
2017-10-10 11:06:58,475 Stage-1 map = 12%,  reduce = 0%, Cumulative CPU 3878.4 sec
2017-10-10 11:06:59,589 Stage-1 map = 13%,  reduce = 0%, Cumulative CPU 4631.95 sec
2017-10-10 11:07:00,658 Stage-1 map = 15%,  reduce = 0%, Cumulative CPU 5273.0 sec
2017-10-10 11:07:01,732 Stage-1 map = 16%,  reduce = 0%, Cumulative CPU 5755.19 sec
2017-10-10 11:07:02,871 Stage-1 map = 18%,  reduce = 0%, Cumulative CPU 6589.31 sec
2017-10-10 11:07:04,063 Stage-1 map = 21%,  reduce = 0%, Cumulative CPU 7496.76 sec
2017-10-10 11:07:05,131 Stage-1 map = 23%,  reduce = 0%, Cumulative CPU 8459.95 sec
2017-10-10 11:07:06,256 Stage-1 map = 26%,  reduce = 0%, Cumulative CPU 9381.74 sec
2017-10-10 11:07:07,332 Stage-1 map = 28%,  reduce = 0%, Cumulative CPU 10340.58 sec
2017-10-10 11:07:08,403 Stage-1 map = 31%,  reduce = 0%, Cumulative CPU 11176.8 sec
2017-10-10 11:07:09,541 Stage-1 map = 34%,  reduce = 0%, Cumulative CPU 11926.33 sec
2017-10-10 11:07:10,640 Stage-1 map = 36%,  reduce = 0%, Cumulative CPU 12680.73 sec
2017-10-10 11:07:11,720 Stage-1 map = 39%,  reduce = 0%, Cumulative CPU 13293.02 sec
2017-10-10 11:07:12,827 Stage-1 map = 41%,  reduce = 0%, Cumulative CPU 13836.86 sec
2017-10-10 11:07:13,925 Stage-1 map = 43%,  reduce = 0%, Cumulative CPU 14277.07 sec
2017-10-10 11:07:15,047 Stage-1 map = 45%,  reduce = 0%, Cumulative CPU 14730.54 sec
2017-10-10 11:07:16,170 Stage-1 map = 47%,  reduce = 0%, Cumulative CPU 14976.39 sec
2017-10-10 11:07:17,265 Stage-1 map = 48%,  reduce = 0%, Cumulative CPU 15342.09 sec
2017-10-10 11:07:18,359 Stage-1 map = 50%,  reduce = 0%, Cumulative CPU 15675.41 sec
2017-10-10 11:07:19,465 Stage-1 map = 51%,  reduce = 0%, Cumulative CPU 16192.14 sec
2017-10-10 11:07:20,549 Stage-1 map = 53%,  reduce = 0%, Cumulative CPU 16686.98 sec
2017-10-10 11:07:21,632 Stage-1 map = 54%,  reduce = 0%, Cumulative CPU 17105.59 sec
2017-10-10 11:07:22,739 Stage-1 map = 56%,  reduce = 0%, Cumulative CPU 17741.85 sec
2017-10-10 11:07:23,854 Stage-1 map = 57%,  reduce = 0%, Cumulative CPU 18191.51 sec
2017-10-10 11:07:24,974 Stage-1 map = 58%,  reduce = 0%, Cumulative CPU 18710.17 sec
2017-10-10 11:07:26,056 Stage-1 map = 60%,  reduce = 0%, Cumulative CPU 19212.03 sec
2017-10-10 11:07:27,174 Stage-1 map = 61%,  reduce = 0%, Cumulative CPU 19678.26 sec
2017-10-10 11:07:28,266 Stage-1 map = 63%,  reduce = 0%, Cumulative CPU 20382.1 sec
2017-10-10 11:07:29,401 Stage-1 map = 64%,  reduce = 0%, Cumulative CPU 20954.93 sec
2017-10-10 11:07:30,516 Stage-1 map = 66%,  reduce = 0%, Cumulative CPU 21694.13 sec
2017-10-10 11:07:31,626 Stage-1 map = 68%,  reduce = 0%, Cumulative CPU 22564.98 sec
2017-10-10 11:07:32,748 Stage-1 map = 70%,  reduce = 0%, Cumulative CPU 23426.33 sec
2017-10-10 11:07:33,867 Stage-1 map = 72%,  reduce = 0%, Cumulative CPU 24063.46 sec
2017-10-10 11:07:34,952 Stage-1 map = 73%,  reduce = 0%, Cumulative CPU 24638.08 sec
2017-10-10 11:07:36,056 Stage-1 map = 76%,  reduce = 0%, Cumulative CPU 25306.76 sec
2017-10-10 11:07:37,136 Stage-1 map = 77%,  reduce = 0%, Cumulative CPU 25849.07 sec
2017-10-10 11:07:38,240 Stage-1 map = 79%,  reduce = 0%, Cumulative CPU 26519.86 sec
2017-10-10 11:07:39,352 Stage-1 map = 81%,  reduce = 0%, Cumulative CPU 26893.94 sec
2017-10-10 11:07:40,426 Stage-1 map = 83%,  reduce = 0%, Cumulative CPU 27404.72 sec
2017-10-10 11:07:41,519 Stage-1 map = 85%,  reduce = 0%, Cumulative CPU 27896.97 sec
2017-10-10 11:07:42,641 Stage-1 map = 86%,  reduce = 0%, Cumulative CPU 28229.27 sec
2017-10-10 11:07:43,748 Stage-1 map = 88%,  reduce = 0%, Cumulative CPU 28879.18 sec
2017-10-10 11:07:44,842 Stage-1 map = 89%,  reduce = 0%, Cumulative CPU 29276.56 sec
2017-10-10 11:07:45,944 Stage-1 map = 91%,  reduce = 0%, Cumulative CPU 29566.32 sec
2017-10-10 11:07:47,022 Stage-1 map = 92%,  reduce = 0%, Cumulative CPU 30041.77 sec
2017-10-10 11:07:48,111 Stage-1 map = 93%,  reduce = 0%, Cumulative CPU 30384.57 sec
2017-10-10 11:07:49,220 Stage-1 map = 94%,  reduce = 0%, Cumulative CPU 30613.09 sec
2017-10-10 11:07:50,332 Stage-1 map = 95%,  reduce = 0%, Cumulative CPU 30831.22 sec
2017-10-10 11:07:51,440 Stage-1 map = 96%,  reduce = 0%, Cumulative CPU 30929.59 sec
2017-10-10 11:07:52,581 Stage-1 map = 97%,  reduce = 0%, Cumulative CPU 31021.75 sec
2017-10-10 11:07:54,791 Stage-1 map = 98%,  reduce = 0%, Cumulative CPU 31176.06 sec
2017-10-10 11:07:56,975 Stage-1 map = 99%,  reduce = 0%, Cumulative CPU 31279.11 sec
2017-10-10 11:08:16,788 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 31452.94 sec
2017-10-10 11:08:25,651 Stage-1 map = 100%,  reduce = 1%, Cumulative CPU 31495.04 sec
2017-10-10 11:08:26,725 Stage-1 map = 100%,  reduce = 2%, Cumulative CPU 31553.55 sec
2017-10-10 11:08:27,902 Stage-1 map = 100%,  reduce = 5%, Cumulative CPU 31735.23 sec
2017-10-10 11:08:29,009 Stage-1 map = 100%,  reduce = 11%, Cumulative CPU 32088.36 sec
2017-10-10 11:08:30,082 Stage-1 map = 100%,  reduce = 14%, Cumulative CPU 32302.42 sec
2017-10-10 11:08:31,187 Stage-1 map = 100%,  reduce = 22%, Cumulative CPU 32811.1 sec
2017-10-10 11:08:32,278 Stage-1 map = 100%,  reduce = 33%, Cumulative CPU 33479.58 sec
2017-10-10 11:08:33,401 Stage-1 map = 100%,  reduce = 58%, Cumulative CPU 35074.84 sec
2017-10-10 11:08:34,499 Stage-1 map = 100%,  reduce = 69%, Cumulative CPU 35768.42 sec
2017-10-10 11:08:35,614 Stage-1 map = 100%,  reduce = 77%, Cumulative CPU 36244.21 sec
2017-10-10 11:08:36,717 Stage-1 map = 100%,  reduce = 83%, Cumulative CPU 36656.17 sec
2017-10-10 11:08:37,824 Stage-1 map = 100%,  reduce = 91%, Cumulative CPU 37125.4 sec
2017-10-10 11:08:38,913 Stage-1 map = 100%,  reduce = 93%, Cumulative CPU 37296.03 sec
2017-10-10 11:08:40,012 Stage-1 map = 100%,  reduce = 99%, Cumulative CPU 37627.99 sec
2017-10-10 11:08:52,096 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 37770.22 sec
MapReduce Total cumulative CPU time: 0 days 10 hours 29 minutes 30 seconds 220 msec
Ended Job = job_1505909724027_363932
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1316  Reduce: 1099   Cumulative CPU: 37770.22 sec   HDFS Read: 4190346327 HDFS Write: 52 SUCCESS
Total MapReduce CPU Time Spent: 0 days 10 hours 29 minutes 30 seconds 220 msec
OK
w        20171008        2
a        20171008        13819276
i        20171008        1984982
Time taken: 480.473 seconds, Fetched: 3 row(s)
[/mw_shl_code]
回复

使用道具 举报

您需要登录后才可以回帖 登录 | 立即注册

本版积分规则

关闭

推荐上一条 /2 下一条