分享

企业级数据仓库构建(十):搭建 ADS 层

本帖最后由 levycui 于 2020-9-22 19:49 编辑
问题导读:
1、如何设计设备主题?
2、如何设计会员主题(会员主题+漏斗分析)?
3、如何设计商品主题(个数+销量)?
4、如何设计营销主题(用户+商品+购买行为)?



上一篇:数仓项目(九)数仓搭建 - DWS 层

一、数仓搭建 - ADS 层
1.1 设备主题
1.1.1 活跃设备数(日、周、月)

需求定义:
日活:当日活跃的设备数
周活:当周活跃的设备数
月活:当月活跃的设备数
1)建表语句
  1. drop table if exists ads_uv_count;
  2. create external table ads_uv_count(
  3. `dt` string COMMENT '统计日期',
  4. `day_count` bigint COMMENT '当日用户数量',
  5. `wk_count` bigint COMMENT '当周用户数量',
  6. `mn_count` bigint COMMENT '当月用户数量',
  7. `is_weekend` string COMMENT 'Y,N 是否是周末,用于得到本周最终结果',
  8. `is_monthend` string COMMENT 'Y,N 是否是月末,用于得到本月最终结果'
  9. ) COMMENT '活跃设备数'
  10. row format delimited fields terminated by '\t'
  11. location '/warehouse/gmall/ads/ads_uv_count/';
复制代码
2)导入数据

  1. insert into table ads_uv_count
  2. select
  3. '2020-03-10' dt,
  4. daycount.ct,
  5. wkcount.ct,
  6. mncount.ct,
  7. if(date_add(next_day('2020-03-10','MO'),-1)='2020-03-10','Y','N') ,
  8. if(last_day('2020-03-10')='2020-03-10','Y','N')
  9. from
  10. (
  11. select
  12. '2020-03-10' dt,
  13. count(*) ct
  14. from dwt_uv_topic
  15. where login_date_last='2020-03-10'
  16. )daycount join
  17. (
  18. select
  19. '2020-03-10' dt,
  20. count (*) ct
  21. from dwt_uv_topic
  22. where login_date_last>=date_add(next_day('2020-03-10','MO'),-7)
  23. and login_date_last<= date_add(next_day('2020-03-10','MO'),-1)
  24. ) wkcount on daycount.dt=wkcount.dt
  25. join
  26. (
  27. select
  28. '2020-03-10' dt,
  29. count (*) ct
  30. from dwt_uv_topic
  31. where
  32. date_format(login_date_last,'yyyy-MM')=date_format('2020-03-10','yyyy-MM')
  33. )mncount on daycount.dt=mncount.dt;
复制代码
3)查询导入结果
  1. select * from ads_uv_count;
复制代码

1.1.2 每日新增设备
1)建表语句
  1. drop table if exists ads_new_mid_count;
  2. create external table ads_new_mid_count
  3. (
  4. `create_date` string comment '创建时间' ,
  5. `new_mid_count` BIGINT comment '新增设备数量'
  6. ) COMMENT '每日新增设备信息数量'
  7. row format delimited fields terminated by '\t'
  8. location '/warehouse/gmall/ads/ads_new_mid_count/';
复制代码
2)导入数据
  1. insert into table ads_new_mid_count
  2. select
  3. login_date_first,
  4. count(*)
  5. from dwt_uv_topic
  6. where login_date_first='2020-03-10'
  7. group by login_date_first;
复制代码

3)查询导入数据
  1. select * from ads_new_mid_count;
复制代码

1.1.3 沉默用户数
需求定义:
沉默用户:只在安装当天启动过,且启动时间是在 7 天前
1)建表语句
  1. drop table if exists ads_silent_count;
  2. create external table ads_silent_count(
  3. `dt` string COMMENT '统计日期',
  4. `silent_count` bigint COMMENT '沉默设备数'
  5. )
  6. row format delimited fields terminated by '\t'
  7. location '/warehouse/gmall/ads/ads_silent_count';
复制代码

2)导入 2020-03-20 数据
  1. insert into table ads_silent_count
  2. select
  3. '2020-03-15',
  4. count(*)
  5. from dwt_uv_topic
  6. where login_date_first=login_date_last
  7. and login_date_last<=date_add('2020-03-15',-7);
复制代码
3)查询导入数据
  1. select * from ads_silent_count;
复制代码

1.1.4 本周回流用户数

需求定义:
本周回流用户:上周未活跃,本周活跃的设备,且不是本周新增设备
1)建表语句
  1. drop table if exists ads_back_count;
  2. create external table ads_back_count(
  3. `dt` string COMMENT '统计日期',
  4. `wk_dt` string COMMENT '统计日期所在周',
  5. `wastage_count` bigint COMMENT '回流设备数'
  6. )
  7. row format delimited fields terminated by '\t'
  8. location '/warehouse/gmall/ads/ads_back_count';
复制代码
2)导入数据:
  1. insert into table ads_back_count
  2. select
  3. '2020-03-15',
  4. count(*)
  5. from
  6. (
  7. select
  8. mid_id
  9. from dwt_uv_topic
  10. where login_date_last>=date_add(next_day('2020-03-15','MO'),-7)
  11. and login_date_last<= date_add(next_day('2020-03-15','MO'),-1)
  12. and login_date_first<date_add(next_day('2020-03-15','MO'),-7)
  13. )current_wk
  14. left join
  15. (
  16. select
  17. mid_id
  18. from dws_uv_detail_daycount
  19. where dt>=date_add(next_day('2020-03-15','MO'),-7*2)
  20. and dt<= date_add(next_day('2020-03-15','MO'),-7-1)
  21. group by mid_id
  22. )last_wk
  23. on current_wk.mid_id=last_wk.mid_id
  24. where last_wk.mid_id is null;
复制代码
3)查询结果
  1. select * from ads_back_count;
复制代码

1.1.5 流失用户数
需求定义:
流失用户:最近 7 天未活跃的设备
1)建表语句
  1. drop table if exists ads_wastage_count;
  2. create external table ads_wastage_count(
  3. `dt` string COMMENT '统计日期',
  4. `wastage_count` bigint COMMENT '流失设备数'
  5. )
  6. row format delimited fields terminated by '\t'
  7. location '/warehouse/gmall/ads/ads_wastage_count';
复制代码
2)导入 2020-03-20 数据
  1. insert into table ads_wastage_count
  2. select
  3. '2020-03-20',
  4. count(*)
  5. from
  6. (
  7. select
  8. mid_id
  9. from dwt_uv_topic
  10. where login_date_last<=date_add('2020-03-20',-7)
  11. group by mid_id
  12. )t1;
复制代码
3)查询结果
  1. select * from ads_wastage_count;
复制代码

1.1.6 留存率
2020-09-22_192619.jpg


1)建表语句
  1. drop table if exists ads_user_retention_day_rate;
  2. create external table ads_user_retention_day_rate
  3. (
  4. `stat_date` string comment '统计日期',
  5. `create_date` string comment '设备新增日期',
  6. `retention_day` int comment '截止当前日期留存天数',
  7. `retention_count` bigint comment '留存数量',
  8. `new_mid_count` bigint comment '设备新增数量',
  9. `retention_ratio` decimal(10,2) comment '留存率'
  10. ) COMMENT '每日用户留存情况'
  11. row format delimited fields terminated by '\t'
  12. location '/warehouse/gmall/ads/ads_user_retention_day_rate/';
复制代码
2)导入数据
  1. insert into table ads_user_retention_day_rate
  2. select
  3. '2020-03-10',--统计日期
  4. date_add('2020-03-10',-1),--新增日期
  5. 1,--留存天数
  6. sum(if(login_date_first=date_add('2020-03-10',-1) and
  7. login_date_last='2020-03-10',1,0)),--2020-03-09 的 1 日留存数
  8. sum(if(login_date_first=date_add('2020-03-10',-1),1,0)),--2020-03-09 新增
  9. sum(if(login_date_first=date_add('2020-03-10',-1) and
  10. login_date_last='2020-03-10',1,0))/sum(if(login_date_first=date_add('2020-03-10',-
  11. 1),1,0))*100
  12. from dwt_uv_topic
  13. union all
  14. select
  15. '2020-03-10',--统计日期
  16. date_add('2020-03-10',-2),--新增日期
  17. 2,--留存天数
  18. sum(if(login_date_first=date_add('2020-03-10',-2) and
  19. login_date_last='2020-03-10',1,0)),--2020-03-08 的 2 日留存数
  20. sum(if(login_date_first=date_add('2020-03-10',-2),1,0)),--2020-03-08 新增
  21. sum(if(login_date_first=date_add('2020-03-10',-2) and
  22. login_date_last='2020-03-10',1,0))/sum(if(login_date_first=date_add('2020-03-10',-
  23. 2),1,0))*100
  24. from dwt_uv_topic
  25. union all
  26. select
  27. '2020-03-10',--统计日期
  28. date_add('2020-03-10',-3),--新增日期
  29. 3,--留存天数
  30. sum(if(login_date_first=date_add('2020-03-10',-3) and
  31. login_date_last='2020-03-10',1,0)),--2020-03-07 的 3 日留存数
  32. sum(if(login_date_first=date_add('2020-03-10',-3),1,0)),--2020-03-07 新增
  33. sum(if(login_date_first=date_add('2020-03-10',-3) and
  34. login_date_last='2020-03-10',1,0))/sum(if(login_date_first=date_add('2020-03-10',-
  35. 3),1,0))*100
  36. from dwt_uv_topic;
复制代码
3)查询导入数据
  1. select * from ads_user_retention_day_rate;
复制代码

1.1.7 最近连续三周活跃用户数

1)建表语句
  1. drop table if exists ads_continuity_wk_count;
  2. create external table ads_continuity_wk_count(
  3. `dt` string COMMENT '统计日期,一般用结束周周日日期,如果每天计算一次,可用当天日
  4. 期',
  5. `wk_dt` string COMMENT '持续时间',
  6. `continuity_count` bigint COMMENT '活跃次数'
  7. )
  8. row format delimited fields terminated by '\t'
  9. location '/warehouse/gmall/ads/ads_continuity_wk_count';
复制代码

2)导入 2020-03-20 所在周的数据
  1. insert into table ads_continuity_wk_count
  2. select
  3. '2020-03-15',
  4. concat(date_add(next_day('2020-03-15','MO'),-7*3),'_',date_add(next_day('
  5. 2020-03-15','MO'),-1)),
  6. count(*)
  7. from
  8. (
  9. select
  10. mid_id
  11. from
  12. (
  13. select
  14. mid_id
  15. from dws_uv_detail_daycount
  16. where dt>=date_add(next_day('2020-03-10','monday'),-7)
  17. and dt<=date_add(next_day('2020-03-10','monday'),-1)
  18. group by mid_id
  19. union all
  20. select
  21. mid_id
  22. from dws_uv_detail_daycount
  23. where dt>=date_add(next_day('2020-03-10','monday'),-7*2)
  24. and dt<=date_add(next_day('2020-03-10','monday'),-7-1)
  25. group by mid_id
  26. union all
  27. select
  28. mid_id
  29. from dws_uv_detail_daycount
  30. where dt>=date_add(next_day('2020-03-10','monday'),-7*3)
  31. and dt<=date_add(next_day('2020-03-10','monday'),-7*2-1)
  32. group by mid_id
  33. )t1
  34. group by mid_id
  35. having count(*)=3
  36. )t2
复制代码
3)查询
  1. select * from ads_continuity_wk_count;
复制代码

1.1.8 最近七天内连续三天活跃用户数

1)建表语句
  1. drop table if exists ads_continuity_uv_count;
  2. create external table ads_continuity_uv_count(
  3. `dt` string COMMENT '统计日期',
  4. `wk_dt` string COMMENT '最近 7 天日期',
  5. `continuity_count` bigint
  6. ) COMMENT '连续活跃设备数'
  7. row format delimited fields terminated by '\t'
  8. location '/warehouse/gmall/ads/ads_continuity_uv_count';
复制代码
2)写出导入数据的 SQL 语句
  1. insert into table ads_continuity_uv_count
  2. select
  3. '2020-03-12',
  4. concat(date_add('2020-03-12',-6),'_','2020-03-12'),
  5. count(*)
  6. from
  7. (
  8. select mid_id
  9. from
  10. (
  11. select mid_id
  12. from
  13. (
  14. select
  15. mid_id,
  16. date_sub(dt,rank) date_dif
  17. from
  18. (
  19. select
  20. mid_id,
  21. dt,
  22. rank() over(partition by mid_id order by dt) rank
  23. from dws_uv_detail_daycount
  24. where dt>=date_add('2020-03-12',-6) and
  25. dt<='2020-03-12'
  26. )t1
  27. )t2
  28. group by mid_id,date_dif
  29. having count(*)>=3
  30. )t3
  31. group by mid_id
  32. )t4;
复制代码
3)查询
  1. select * from ads_continuity_uv_count;
复制代码

7.2 会员主题
7.2.1 会员主题信息

1)建表
  1. drop table if exists ads_user_topic;
  2. create external table ads_user_topic(
  3. `dt` string COMMENT '统计日期',
  4. `day_users` string COMMENT '活跃会员数',
  5. `day_new_users` string COMMENT '新增会员数',
  6. `day_new_payment_users` string COMMENT '新增消费会员数',
  7. `payment_users` string COMMENT '总付费会员数',
  8. `users` string COMMENT '总会员数',
  9. `day_users2users` decimal(10,2) COMMENT '会员活跃率',
  10. `payment_users2users` decimal(10,2) COMMENT '会员付费率',
  11. `day_new_users2users` decimal(10,2) COMMENT '会员新鲜度'
  12. ) COMMENT '会员主题信息表'
  13. row format delimited fields terminated by '\t'
  14. location '/warehouse/gmall/ads/ads_user_topic';
复制代码
2)导入数据
  1. insert into table ads_user_topic
  2. select
  3. '2020-03-10',
  4. sum(if(login_date_last='2020-03-10',1,0)),
  5. sum(if(login_date_first='2020-03-10',1,0)),
  6. sum(if(payment_date_first='2020-03-10',1,0)),
  7. sum(if(payment_count>0,1,0)),
  8. count(*),
  9. sum(if(login_date_last='2020-03-10',1,0))/count(*),
  10. sum(if(payment_count>0,1,0))/count(*),
  11. sum(if(login_date_first='2020-03-10',1,0))/sum(if(login_date_last='2020-03-10',1,0))
  12. from dwt_user_topic
复制代码
3)查询数据
  1. hive (gmall)> select * from ads_user_topic;
复制代码
4)vim ads_user_topic.sh
添加如下内容:
  1. #!/bin/bash
  2. APP=gmall
  3. hive=/opt/modules/hive/bin/hive
  4. # 如果是输入的日期按照取输入日期;如果没输入日期取当前时间的前一天
  5. if [ -n "$1" ] ;then
  6. do_date=$1
  7. else
  8. do_date=`date -d "-1 day" +%F`
  9. fi
  10. sql="
  11. with
  12. tmp_day_users as
  13. (
  14. select
  15. '$do_date' dt,
  16. count(*) day_users
  17. from
  18. ${APP}.dwt_user_topic
  19. where
  20. login_date_last='$do_date'
  21. ),
  22. tmp_day_new_users as
  23. (
  24. select
  25. '$do_date' dt,
  26. count(*) day_new_users
  27. from
  28. ${APP}.dwt_user_topic
  29. where
  30. login_date_last='$do_date' and login_date_first='$do_date'
  31. ),
  32. tmp_day_new_payment_users as
  33. (
  34. select
  35. '$do_date' dt,
  36. count(*) day_new_payment_users
  37. from
  38. ${APP}.dwt_user_topic
  39. where
  40. payment_date_first='$do_date'
  41. ),
  42. tmp_payment_users as
  43. (
  44. select
  45. '$do_date' dt,
  46. count(*) payment_users
  47. from
  48. ${APP}.dwt_user_topic
  49. where
  50. payment_date_first is not null
  51. ),
  52. tmp_users as
  53. (
  54. select
  55. '$do_date' dt,
  56. count(*) users
  57. from
  58. ${APP}.dwt_user_topic
  59. tmp_users
  60. )
  61. insert into table ${APP}.ads_user_topic
  62. select
  63. '$do_date' dt,
  64. day_users,
  65. day_new_users,
  66. day_new_payment_users,
  67. payment_users,
  68. users,
  69. day_users/users,
  70. payment_users/users,
  71. day_new_users/users
  72. from
  73. tmp_day_users
  74. join
  75. tmp_day_new_users
  76. on
  77. tmp_day_users.dt=tmp_day_new_users.dt
  78. join
  79. tmp_day_new_payment_users
  80. on
  81. tmp_day_users.dt=tmp_day_new_payment_users.dt
  82. join
  83. tmp_payment_users
  84. on
  85. tmp_day_users.dt=tmp_payment_users.dt
  86. join
  87. tmp_users
  88. on
  89. tmp_day_users.dt=tmp_users.dt;
  90. "
  91. $hive -e "$sql"
复制代码
5)增加脚本执行权限
  1. chmod 770 ads_user_topic.sh
复制代码
6)执行脚本导入数据
  1. ads_user_topic.sh 2020-03-11
复制代码
7)查看导入数据
  1. select * from ads_user_topic;
复制代码

7.2.2 漏斗分析

统计“浏览->购物车->下单->支付”的转化率
思路:统计各个行为的人数,然后计算比值
1)建表语句
  1. drop table if exists ads_user_action_convert_day;
  2. create external table ads_user_action_convert_day(
  3. `dt` string COMMENT '统计日期',
  4. `total_visitor_m_count` bigint COMMENT '总访问人数',
  5. `cart_u_count` bigint COMMENT '加入购物车的人数',
  6. `visitor2cart_convert_ratio` decimal(10,2) COMMENT '访问到加入购物车转化率',
  7. `order_u_count` bigint COMMENT '下单人数',
  8. `cart2order_convert_ratio` decimal(10,2) COMMENT '加入购物车到下单转化率',
  9. `payment_u_count` bigint COMMENT '支付人数',
  10. `order2payment_convert_ratio` decimal(10,2) COMMENT '下单到支付的转化率'
  11. ) COMMENT '用户行为漏斗分析'
  12. row format delimited fields terminated by '\t'
  13. location '/warehouse/gmall/ads/ads_user_action_convert_day/';
复制代码
2)数据装载
  1. insert into table ads_user_action_convert_day
  2. select
  3. '2020-03-10',
  4. uv.day_count,
  5. ua.cart_count,
  6. cast(ua.cart_count/uv.day_count as decimal(10,2)) visitor2cart_convert_ratio,
  7. ua.order_count,
  8. cast(ua.order_count/ua.cart_count as decimal(10,2)) visitor2order_convert_ratio,
  9. ua.payment_count,
  10. cast(ua.payment_count/ua.order_count as decimal(10,2)) order2payment_convert_ratio
  11. from
  12. (
  13. select
  14. dt,
  15. sum(if(cart_count>0,1,0)) cart_count,
  16. sum(if(order_count>0,1,0)) order_count,
  17. sum(if(payment_count>0,1,0)) payment_count
  18. from dws_user_action_daycount
  19. where dt='2020-03-10'
  20. group by dt
  21. )ua join ads_uv_count uv on uv.dt=ua.dt;
复制代码
3)查询加载数据
  1. select * from ads_user_action_convert_day;
复制代码

7.3 商品主题
7.3.1 商品个数信息
1)建表语句
  1. drop table if exists ads_product_info;
  2. create external table ads_product_info(
  3. `dt` string COMMENT '统计日期',
  4. `sku_num` string COMMENT 'sku 个数',
  5. `spu_num` string COMMENT 'spu 个数'
  6. ) COMMENT '商品个数信息'
  7. row format delimited fields terminated by '\t'
  8. location '/warehouse/gmall/ads/ads_product_info';
复制代码
2)导入数据
  1. insert into table ads_product_info
  2. select
  3. '2020-03-10' dt,
  4. sku_num,
  5. spu_num
  6. from
  7. (
  8. select
  9. '2020-03-10' dt,
  10. count(*) sku_num
  11. from
  12. dwt_sku_topic
  13. ) tmp_sku_num
  14. join
  15. (
  16. select
  17. '2020-03-10' dt,
  18. count(*) spu_num
  19. from
  20. (
  21. select
  22. spu_id
  23. from
  24. dwt_sku_topic
  25. group by
  26. spu_id
  27. ) tmp_spu_id
  28. ) tmp_spu_num
  29. on
  30. tmp_sku_num.dt=tmp_spu_num.dt;
复制代码
3)查询结果数据
  1. select * from ads_product_info;
复制代码

7.3.2 商品销量排名

1)建表语句
  1. drop table if exists ads_product_sale_topN;
  2. create external table ads_product_sale_topN(
  3. `dt` string COMMENT '统计日期',
  4. `sku_id` string COMMENT '商品 ID',
  5. `payment_amount` bigint COMMENT '销量'
  6. ) COMMENT '商品个数信息'
  7. row format delimited fields terminated by '\t'
  8. location '/warehouse/gmall/ads/ads_product_sale_topN';
复制代码
2)导入数据
  1. insert into table ads_product_sale_topN
  2. select
  3. '2020-03-10' dt,
  4. sku_id,
  5. payment_amount
  6. from
  7. dws_sku_action_daycount
  8. where
  9. dt='2020-03-10'
  10. order by payment_amount desc
  11. limit 10;
复制代码
3)查询结果数据
  1. select * from ads_product_sale_topN;
复制代码

7.3.3 商品收藏排名

1)建表语句

  1. drop table if exists ads_product_favor_topN;
  2. create external table ads_product_favor_topN(
  3. `dt` string COMMENT '统计日期',
  4. `sku_id` string COMMENT '商品 ID',
  5. `favor_count` bigint COMMENT '收藏量'
  6. ) COMMENT '商品收藏 TopN'
  7. row format delimited fields terminated by '\t'
  8. location '/warehouse/gmall/ads/ads_product_favor_topN';
复制代码
2)导入数据
  1. insert into table ads_product_favor_topN
  2. select
  3. '2020-03-10' dt,
  4. sku_id,
  5. favor_count
  6. from
  7. dws_sku_action_daycount
  8. where
  9. dt='2020-03-10'
  10. order by favor_count desc
  11. limit 10;
复制代码
3)查询数据
  1. select * from ads_product_favor_topN;
复制代码

7.3.4 商品加入购物车排名

1)建表语句

  1. drop table if exists ads_product_cart_topN;
  2. create external table ads_product_cart_topN(
  3. `dt` string COMMENT '统计日期',
  4. `sku_id` string COMMENT '商品 ID',
  5. `cart_num` bigint COMMENT '加入购物车数量'
  6. ) COMMENT '商品加入购物车 TopN'
  7. row format delimited fields terminated by '\t'
  8. location '/warehouse/gmall/ads/ads_product_cart_topN';
复制代码
2)导入数据
  1. insert into table ads_product_cart_topN
  2. select
  3. '2020-03-10' dt,
  4. sku_id,
  5. cart_num
  6. from
  7. dws_sku_action_daycount
  8. where
  9. dt='2020-03-10'
  10. order by cart_num desc
  11. limit 10;
复制代码
3)查询数据
  1. select * from ads_product_cart_topN;
复制代码

7.3.5 商品退款率排名(最近 30 天)

1)建表语句

  1. drop table if exists ads_product_refund_topN;
  2. create external table ads_product_refund_topN(
  3. `dt` string COMMENT '统计日期',
  4. `sku_id` string COMMENT '商品 ID',
  5. `refund_ratio` decimal(10,2) COMMENT '退款率'
  6. ) COMMENT '商品退款率 TopN'
  7. row format delimited fields terminated by '\t'
  8. location '/warehouse/gmall/ads/ads_product_refund_topN';
复制代码
2)导入数据
  1. insert into table ads_product_refund_topN
  2. select
  3. '2020-03-10',
  4. sku_id,
  5. refund_last_30d_count/payment_last_30d_count*100 refund_ratio
  6. from dwt_sku_topic
  7. order by refund_ratio desc
  8. limit 10;
复制代码
3)查询数据
  1. select * from ads_product_refund_topN;
复制代码

7.3.6 商品差评率
1)建表语句

  1. drop table if exists ads_appraise_bad_topN;
  2. create external table ads_appraise_bad_topN(
  3. `dt` string COMMENT '统计日期',
  4. `sku_id` string COMMENT '商品 ID',
  5. `appraise_bad_ratio` decimal(10,2) COMMENT '差评率'
  6. ) COMMENT '商品差评率 TopN'
  7. row format delimited fields terminated by '\t'
  8. location '/warehouse/gmall/ads/ads_appraise_bad_topN';
复制代码
2)导入数据
  1. insert into table ads_appraise_bad_topN
  2. select
  3. '2020-03-10' dt,
  4. sku_id,
  5. appraise_bad_count/(appraise_good_count+appraise_mid_count+appraise_bad_coun
  6. t+appraise_default_count) appraise_bad_ratio
  7. from
  8. dws_sku_action_daycount
  9. where
  10. dt='2020-03-10'
  11. order by appraise_bad_ratio desc
  12. limit 10;
复制代码
3)查询数据
  1. select * from ads_appraise_bad_topN;
复制代码

7.4 营销主题(用户+商品+购买行为)
7.4.1 下单数目统计
需求分析:统计每日下单数,下单金额及下单用户数
1)建表语句
  1. drop table if exists ads_order_daycount;
  2. create external table ads_order_daycount(
  3. dt string comment '统计日期',
  4. order_count bigint comment '单日下单笔数',
  5. order_amount decimal(10,2) comment '单日下单金额',
  6. order_users bigint comment '单日下单用户数'
  7. ) comment '每日订单总计表'
  8. row format delimited fields terminated by '\t'
  9. location '/warehouse/gmall/ads/ads_order_daycount';
复制代码
2)导入数据
  1. insert into table ads_order_daycount
  2. select
  3. '2020-03-10',
  4. sum(order_count),
  5. sum(order_amount),
  6. sum(if(order_count>0,1,0))
  7. from dws_user_action_daycount
  8. where dt='2020-03-10';
复制代码
3)查询数据
  1. select * from ads_order_daycount;
复制代码

7.4.2 支付信息统计
每日支付金额、支付人数、支付商品数、支付笔数以及下单到支付的平均时长(取自 DWD)
1)建表
  1. drop table if exists ads_payment_daycount;
  2. create external table ads_payment_daycount(
  3. dt string comment '统计日期',
  4. payment_count bigint comment '单日支付笔数',
  5. payment_amount decimal(10,2) comment '单日支付金额',
  6. payment_user_count bigint comment '单日支付人数',
  7. payment_sku_count bigint comment '单日支付商品数',
  8. payment_avg_time double comment '下单到支付的平均时长,取分钟数'
  9. ) comment '每日订单总计表'
  10. row format delimited fields terminated by '\t'
  11. location '/warehouse/gmall/ads/ads_payment_daycount';
复制代码
2)导入数据
  1. insert into table ads_payment_daycount
  2. select
  3. tmp_payment.dt,
  4. tmp_payment.payment_count,
  5. tmp_payment.payment_amount,
  6. tmp_payment.payment_user_count,
  7. tmp_skucount.payment_sku_count,
  8. tmp_time.payment_avg_time
  9. from
  10. (
  11. select
  12. '2020-03-15' dt,
  13. sum(payment_count) payment_count,
  14. sum(payment_amount) payment_amount,
  15. sum(if(payment_count>0,1,0)) payment_user_count
  16. from dws_user_action_daycount
  17. where dt='2020-03-15'
  18. )tmp_payment
  19. join
  20. (
  21. select
  22. '2020-03-15' dt,
  23. sum(if(payment_count>0,1,0)) payment_sku_count
  24. from dws_sku_action_daycount
  25. where dt='2020-03-15'
  26. )tmp_skucount on tmp_payment.dt=tmp_skucount.dt
  27. join
  28. (
  29. select
  30. '2020-03-15' dt,
  31. sum(unix_timestamp(payment_time)-unix_timestamp(create_time))/count(*)/60
  32. payment_avg_time
  33. from dwd_fact_order_info
  34. where dt='2020-03-15'
  35. and payment_time is not null
  36. )tmp_time on tmp_payment.dt=tmp_time.dt
复制代码

3)查询数据
  1. select * from ads_payment_daycount;
复制代码

7.4.3 复购率
1)建表语句
  1. drop table ads_sale_tm_category1_stat_mn;
  2. create external table ads_sale_tm_category1_stat_mn
  3. (
  4. tm_id string comment '品牌 id',
  5. category1_id string comment '1 级品类 id ',
  6. category1_name string comment '1 级品类名称 ',
  7. buycount bigint comment '购买人数',
  8. buy_twice_last bigint comment '两次以上购买人数',
  9. buy_twice_last_ratio decimal(10,2) comment '单次复购率',
  10. buy_3times_last bigint comment '三次以上购买人数',
  11. buy_3times_last_ratio decimal(10,2) comment '多次复购率',
  12. stat_mn string comment '统计月份',
  13. stat_date string comment '统计日期'
  14. ) COMMENT '复购率统计'
  15. row format delimited fields terminated by '\t'
  16. location '/warehouse/gmall/ads/ads_sale_tm_category1_stat_mn/';
复制代码
2)数据导入
  1. insert into table ads_sale_tm_category1_stat_mn
  2. select
  3. mn.sku_tm_id,
  4. mn.sku_category1_id,
  5. mn.sku_category1_name,
  6. sum(if(mn.order_count>=1,1,0)) buycount,
  7. sum(if(mn.order_count>=2,1,0)) buyTwiceLast,
  8. sum(if(mn.order_count>=2,1,0))/sum( if(mn.order_count>=1,1,0))
  9. buyTwiceLastRatio,
  10. sum(if(mn.order_count>=3,1,0)) buy3timeLast ,
  11. sum(if(mn.order_count>=3,1,0))/sum( if(mn.order_count>=1,1,0))
  12. buy3timeLastRatio ,
  13. date_format('2020-03-10' ,'yyyy-MM') stat_mn,
  14. '2020-03-10' stat_date
  15. from
  16. (
  17.         select
  18.         user_id,
  19.         sd.sku_tm_id,
  20.         sd.sku_category1_id,
  21.         sd.sku_category1_name,
  22.         sum(order_count) order_count
  23.         from dws_sale_detail_daycount sd
  24.         where date_format(dt,'yyyy-MM')=date_format('2020-03-10' ,'yyyy-MM')
  25.         group by user_id, sd.sku_tm_id, sd.sku_category1_id, sd.sku_category1_name
  26. ) mn
  27. group by mn.sku_tm_id, mn.sku_category1_id, mn.sku_category1_name;
复制代码

7.5 ADS 层导入脚本

1)vim dwt_to_ads.sh
在脚本中填写如下内容
  1. #!/bin/bash
  2. hive=/opt/modules/hive/bin/hive
  3. # 如果是输入的日期按照取输入日期;如果没输入日期取当前时间的前一天
  4. if [ -n "$1" ] ;then
  5. do_date=$1
  6. else
  7. do_date=`date -d "-1 day" +%F`
  8. fi
  9. sql="use gmall;
  10. insert into table ads_uv_count
  11. select
  12. '$do_date',
  13. sum(if(login_date_last='$do_date',1,0)),
  14. sum(if(login_date_last>=date_add(next_day('$do_date','monday'),-7) and
  15. login_date_last<=date_add(next_day('$do_date','monday'),-1) ,1,0)),
  16. sum(if(date_format(login_date_last,'yyyy-MM')=date_format('$do_date','yyyy-M
  17. M'),1,0)),
  18. if('$do_date'=date_add(next_day('$do_date','monday'),-1),'Y','N'),
  19. if('$do_date'=last_day('$do_date'),'Y','N')
  20. from dwt_uv_topic;
  21. insert into table ads_new_mid_count
  22. select
  23. '$do_date',
  24. count(*)
  25. from dwt_uv_topic
  26. where login_date_first='$do_date';
  27. insert into table ads_silent_count
  28. select
  29. '$do_date',
  30. count(*)
  31. from dwt_uv_topic
  32. where login_date_first=login_date_last
  33. and login_date_last<=date_add('$do_date',-7);
  34. insert into table ads_back_count
  35. select
  36. '$do_date',
  37. concat(date_add(next_day('2020-03-10','MO'),-7),'_',date_add(next_day('2020-
  38. 03-10','MO'),-1)),
  39. count(*)
  40. from
  41. (
  42. select
  43. mid_id
  44. from dwt_uv_topic
  45. where login_date_last>=date_add(next_day('$do_date','MO'),-7)
  46. and login_date_last<= date_add(next_day('$do_date','MO'),-1)
  47. and login_date_first<date_add(next_day('$do_date','MO'),-7)
  48. )current_wk
  49. left join
  50. (
  51. select
  52. mid_id
  53. from dws_uv_detail_daycount
  54. where dt>=date_add(next_day('$do_date','MO'),-7*2)
  55. and dt<= date_add(next_day('$do_date','MO'),-7-1)
  56. group by mid_id
  57. )last_wk
  58. on current_wk.mid_id=last_wk.mid_id
  59. where last_wk.mid_id is null;
  60. insert into table ads_wastage_count
  61. select
  62. '$do_date',
  63. count(*)
  64. from dwt_uv_topic
  65. where login_date_last<=date_add('$do_date',-7);
  66. insert into table ads_user_retention_day_rate
  67. select
  68. '$do_date',
  69. date_add('$do_date',-3),
  70. 3,
  71. sum(if(login_date_first=date_add('$do_date',-3) and
  72. login_date_last='$do_date',1,0)),
  73. sum(if(login_date_first=date_add('$do_date',-3),1,0)),
  74. sum(if(login_date_first=date_add('$do_date',-3) and
  75. login_date_last='$do_date',1,0))/sum(if(login_date_first=date_add('$do_date',
  76. -3),1,0))*100
  77. from dwt_uv_topic
  78. union all
  79. select
  80. '$do_date',
  81. date_add('$do_date',-2),
  82. 2,
  83. sum(if(login_date_first=date_add('$do_date',-2) and
  84. login_date_last='$do_date',1,0)),
  85. sum(if(login_date_first=date_add('$do_date',-2),1,0)),
  86. sum(if(login_date_first=date_add('$do_date',-2) and
  87. login_date_last='$do_date',1,0))/sum(if(login_date_first=date_add('$do_date',
  88. -2),1,0))*100
  89. from dwt_uv_topic
  90. union all
  91. select
  92. '$do_date',
  93. date_add('$do_date',-1),
  94. 1,
  95. sum(if(login_date_first=date_add('$do_date',-1) and
  96. login_date_last='$do_date',1,0)),
  97. sum(if(login_date_first=date_add('$do_date',-1),1,0)),
  98. sum(if(login_date_first=date_add('$do_date',-1) and
  99. login_date_last='$do_date',1,0))/sum(if(login_date_first=date_add('$do_date',
  100. -1),1,0))*100
  101. from dwt_uv_topic;
  102. insert into table ads_continuity_wk_count
  103. select
  104. '$do_date',
  105. concat(date_add(next_day('$do_date','MO'),-7*3),'_',date_add(next_day('$do_d
  106. ate','MO'),-1)),
  107. count(*)
  108. from
  109. (
  110. select
  111. mid_id
  112. from
  113. (
  114. select
  115. mid_id
  116. from dws_uv_detail_daycount
  117. where dt>=date_add(next_day('$do_date','monday'),-7)
  118. and dt<=date_add(next_day('$do_date','monday'),-1)
  119. group by mid_id
  120. union all
  121. select
  122. mid_id
  123. from dws_uv_detail_daycount
  124. where dt>=date_add(next_day('$do_date','monday'),-7*2)
  125. and dt<=date_add(next_day('$do_date','monday'),-7-1)
  126. group by mid_id
  127. union all
  128. select
  129. mid_id
  130. from dws_uv_detail_daycount
  131. where dt>=date_add(next_day('$do_date','monday'),-7*3)
  132. and dt<=date_add(next_day('$do_date','monday'),-7*2-1)
  133. group by mid_id
  134. )t1
  135. group by mid_id
  136. having count(*)=3
  137. )t2;
  138. insert into table ads_continuity_uv_count
  139. select
  140. '$do_date',
  141. concat(date_add('$do_date',-6),'_','$do_date'),
  142. count(*)
  143. from
  144. (
  145. select mid_id
  146. from
  147. (
  148. select mid_id
  149. from
  150. (
  151. select
  152. mid_id,
  153. date_sub(dt,rank) date_dif
  154. from
  155. (
  156. select
  157. mid_id,
  158. dt,
  159. rank() over(partition by mid_id order by dt) rank
  160. from dws_uv_detail_daycount
  161. where dt>=date_add('$do_date',-6) and dt<='$do_date'
  162. )t1
  163. )t2
  164. group by mid_id,date_dif
  165. having count(*)>=3
  166. )t3
  167. group by mid_id
  168. )t4;
  169. insert into table ads_user_topic
  170. select
  171. '$do_date',
  172. sum(if(login_date_last='$do_date',1,0)),
  173. sum(if(login_date_first='$do_date',1,0)),
  174. sum(if(payment_date_first='$do_date',1,0)),
  175. sum(if(payment_count>0,1,0)),
  176. count(*),
  177. sum(if(login_date_last='$do_date',1,0))/count(*),
  178. sum(if(payment_count>0,1,0))/count(*),
  179. sum(if(login_date_first='$do_date',1,0))/sum(if(login_date_last='$do_date',1,
  180. 0))
  181. from dwt_user_topic;
  182. insert into table ads_user_action_convert_day
  183. select
  184. '$do_date',
  185. uv.day_count,
  186. ua.cart_count,
  187. ua.cart_count/uv.day_count*100 visitor2cart_convert_ratio,
  188. ua.order_count,
  189. ua.order_count/ua.cart_count*100 visitor2order_convert_ratio,
  190. ua.payment_count,
  191. ua.payment_count/ua.order_count*100 order2payment_convert_ratio
  192. from
  193. (
  194. select
  195. '$do_date' dt,
  196. sum(if(cart_count>0,1,0)) cart_count,
  197. sum(if(order_count>0,1,0)) order_count,
  198. sum(if(payment_count>0,1,0)) payment_count
  199. from dws_user_action_daycount
  200. where dt='$do_date'
  201. )ua join ads_uv_count uv on uv.dt=ua.dt;
  202. insert into table ads_product_info
  203. select
  204. '$do_date' dt,
  205. sku_num,
  206. spu_num
  207. from
  208. (
  209. select
  210. '$do_date' dt,
  211. count(*) sku_num
  212. from
  213. dwt_sku_topic
  214. ) tmp_sku_num
  215. join
  216. (
  217. select
  218. '$do_date' dt,
  219. count(*) spu_num
  220. from
  221. (
  222. select
  223. spu_id
  224. from
  225. dwt_sku_topic
  226. group by
  227. spu_id
  228. ) tmp_spu_id
  229. ) tmp_spu_num
  230. on tmp_sku_num.dt=tmp_spu_num.dt;
  231. insert into table ads_product_sale_topN
  232. select
  233. '$do_date',
  234. sku_id,
  235. payment_amount
  236. from dws_sku_action_daycount
  237. where dt='$do_date'
  238. order by payment_amount desc
  239. limit 10;
  240. insert into table ads_product_favor_topN
  241. select
  242. '$do_date',
  243. sku_id,
  244. favor_count
  245. from dws_sku_action_daycount
  246. where dt='$do_date'
  247. order by favor_count
  248. limit 10;
  249. insert into table ads_product_cart_topN
  250. select
  251. '$do_date' dt,
  252. sku_id,
  253. cart_num
  254. from dws_sku_action_daycount
  255. where dt='$do_date'
  256. order by cart_num
  257. limit 10;
  258. insert into table ads_product_refund_topN
  259. select
  260. '$do_date',
  261. sku_id,
  262. refund_last_30d_count/payment_last_30d_count*100 refund_ratio
  263. from dwt_sku_topic
  264. order by refund_ratio desc
  265. limit 10;
  266. insert into table ads_appraise_bad_topN
  267. select
  268. '$do_date' dt,
  269. sku_id,
  270. appraise_bad_count/(appraise_bad_count+appraise_good_count+appraise_mid_coun
  271. t+appraise_default_count)*100 appraise_bad_ratio
  272. from dws_sku_action_daycount
  273. where dt='$do_date'
  274. order by appraise_bad_ratio desc
  275. limit 10;
  276. insert into table ads_order_daycount
  277. select
  278. '$do_date',
  279. sum(order_count),
  280. sum(order_amount),
  281. sum(if(order_count>0,1,0))
  282. from dws_user_action_daycount
  283. where dt='$do_date';
  284. insert into table ads_payment_daycount
  285. select
  286. tmp_payment.dt,
  287. tmp_payment.payment_count,
  288. tmp_payment.payment_amount,
  289. tmp_payment.payment_user_count,
  290. tmp_skucount.payment_sku_count,
  291. tmp_time.payment_avg_time
  292. from
  293. (
  294. select
  295. '$do_date' dt,
  296. sum(payment_count) payment_count,
  297. sum(payment_amount) payment_amount,
  298. sum(if(payment_count>0,1,0)) payment_user_count
  299. from dws_user_action_daycount
  300. where dt='$do_date'
  301. )tmp_payment
  302. join
  303. (
  304. select
  305. '$do_date' dt,
  306. sum(if(payment_count>0,1,0)) payment_sku_count
  307. from dws_sku_action_daycount
  308. where dt='$do_date'
  309. )tmp_skucount on tmp_payment.dt=tmp_skucount.dt
  310. join
  311. (
  312. select
  313. '$do_date' dt,
  314. sum(unix_timestamp(payment_time)-unix_timestamp(create_time))/count(*)/60
  315. payment_avg_time
  316. from dwd_fact_order_info
  317. where dt='$do_date'
  318. and payment_time is not null
  319. )tmp_time on tmp_payment.dt=tmp_time.dt;
  320. insert into table ads_sale_tm_category1_stat_mn
  321. select
  322. mn.sku_tm_id,
  323. mn.sku_category1_id,
  324. mn.sku_category1_name,
  325. sum(if(mn.order_count>=1,1,0)) buycount,
  326. sum(if(mn.order_count>=2,1,0)) buyTwiceLast,
  327. sum(if(mn.order_count>=2,1,0))/sum( if(mn.order_count>=1,1,0))
  328. buyTwiceLastRatio,
  329. sum(if(mn.order_count>=3,1,0)) buy3timeLast ,
  330. sum(if(mn.order_count>=3,1,0))/sum( if(mn.order_count>=1,1,0))
  331. buy3timeLastRatio ,
  332. date_format('$do_date' ,'yyyy-MM') stat_mn,
  333. '$do_date' stat_date
  334. from
  335. (
  336. select
  337. user_id,
  338. sd.sku_tm_id,
  339. sd.sku_category1_id,
  340. sd.sku_category1_name,
  341. sum(order_count) order_count
  342. from dws_sale_detail_daycount sd
  343. where date_format(dt,'yyyy-MM')=date_format('$do_date' ,'yyyy-MM')
  344. group by user_id, sd.sku_tm_id, sd.sku_category1_id, sd.sku_category1_name
  345. ) mn
  346. group by mn.sku_tm_id, mn.sku_category1_id, mn.sku_category1_name;
  347. "
  348. $hive -e "$sql"
复制代码

2)增加脚本执行权限
  1. chmod 770 dwt_to_ads.sh
复制代码
3)执行脚本导入数据
  1. dwt_to_ads.sh  2020-03-10
复制代码
4)查看导入数据


作者:together
来源:https://together.blog.csdn.net/article/details/105968377

最新经典文章,欢迎关注公众号

没找到任何评论,期待你打破沉寂

您需要登录后才可以回帖 登录 | 立即注册

本版积分规则

关闭

推荐上一条 /2 下一条