问题导读:
1、什么是GMV?
2、什么是转化率?
3、如何计算新增用户占日活跃用户比率?
4、如何使用用户行为做漏斗分析?
上一篇:大数据项目之电商数仓(总结)(三):系统业务数据仓库
第4章 需求一:GMV成交总额
4.1 ADS层
4.1.1 什么是GMV
4.1.2 建表语句
- hive (gmall)>
- drop table if exists ads_gmv_sum_day;
- create external table ads_gmv_sum_day(
- `dt` string COMMENT '统计日期',
- `gmv_count` bigint COMMENT '当日gmv订单个数',
- `gmv_amount` decimal(16,2) COMMENT '当日gmv订单总金额',
- `gmv_payment` decimal(16,2) COMMENT '当日支付金额'
- ) COMMENT 'GMV'
- row format delimited fields terminated by '\t'
- location '/warehouse/gmall/ads/ads_gmv_sum_day/';
复制代码
4.1.3 数据导入
1)数据导入
- hive (gmall)>
- insert into table ads_gmv_sum_day
- select
- '2019-02-10' dt,
- sum(order_count) gmv_count,
- sum(order_amount) gmv_amount,
- sum(payment_amount) payment_amount
- from dws_user_action
- where dt ='2019-02-10'
- group by dt;
复制代码
2)查询导入数据
- hive (gmall)> select * from ads_gmv_sum_day;
复制代码
4.1.4 数据导入脚本
1)在/home/kgg/bin目录下创建脚本ads_db_gmv.sh
- [kgg@hadoop102 bin]$ vim ads_db_gmv.sh
- 在脚本中填写如下内容
- #!/bin/bash
-
- # 定义变量方便修改
- APP=gmall
- hive=/opt/module/hive/bin/hive
-
- # 如果是输入的日期按照取输入日期;如果没输入日期取当前时间的前一天
- if [ -n "$1" ] ;then
- do_date=$1
- else
- do_date=`date -d "-1 day" +%F`
- fi
-
- sql="
- insert into table "$APP".ads_gmv_sum_day
- select
- '$do_date' dt,
- sum(order_count) gmv_count,
- sum(order_amount) gmv_amount,
- sum(payment_amount) payment_amount
- from "$APP".dws_user_action
- where dt ='$do_date'
- group by dt;
- "
-
- $hive -e "$sql"
复制代码
2)增加脚本执行权限
- [kgg@hadoop102 bin]$ chmod 777 ads_db_gmv.sh
复制代码
3)执行脚本导入数据
- [kgg@hadoop102 bin]$ ads_db_gmv.sh 2019-02-11
复制代码
4)查看导入数据
- hive (gmall)>
- select * from ads_gmv_sum_day where dt='2019-02-11' limit 2;
复制代码
第5章 需求二:转化率之用户新鲜度及漏斗分析
5.1 什么是转化率
5.2 ADS层之新增用户占日活跃用户比率(用户新鲜度)
5.2.1 建表语句
- hive (gmall)>
- drop table if exists ads_user_convert_day;
- create external table ads_user_convert_day(
- `dt` string COMMENT '统计日期',
- `uv_m_count` bigint COMMENT '当日活跃设备',
- `new_m_count` bigint COMMENT '当日新增设备',
- `new_m_ratio` decimal(10,2) COMMENT '当日新增占日活的比率'
- ) COMMENT '转化率'
- row format delimited fields terminated by '\t'
- location '/warehouse/gmall/ads/ads_user_convert_day/';
复制代码
5.2.2 数据导入
1)数据导入
- hive (gmall)>
- insert into table ads_user_convert_day
- select
- '2019-02-10',
- sum(uc.dc) sum_dc,
- sum(uc.nmc) sum_nmc,
- sum( uc.nmc)/sum( uc.dc)*100 new_m_ratio
- from
- (
- select
- day_count dc,
- 0 nmc
- from ads_uv_count
- where dt='2019-02-10'
-
- union all
- select
- 0 dc,
- new_mid_count nmc
- from ads_new_mid_count
- where create_date='2019-02-10'
- )uc;
复制代码
2)查看导入数据
- hive (gmall)>
- select * from ads_user_convert_day;
复制代码
5.3 ADS层之用户行为漏斗分析
5.3.1 建表语句
- hive (gmall)>
- drop table if exists ads_user_action_convert_day;
- create external table ads_user_action_convert_day(
- `dt` string COMMENT '统计日期',
- `total_visitor_m_count` bigint COMMENT '总访问人数',
- `order_u_count` bigint COMMENT '下单人数',
- `visitor2order_convert_ratio` decimal(10,2) COMMENT '访问到下单转化率',
- `payment_u_count` bigint COMMENT '支付人数',
- `order2payment_convert_ratio` decimal(10,2) COMMENT '下单到支付的转化率'
- ) COMMENT '用户行为漏斗分析'
- row format delimited fields terminated by '\t'
- location '/warehouse/gmall/ads/ads_user_action_convert_day/';
复制代码
5.3.2 数据导入
1)数据导入
- hive (gmall)>
- insert into table ads_user_action_convert_day
- select
- '2019-02-10',
- uv.day_count,
- ua.order_count,
- cast(ua.order_count/uv.day_count as decimal(10,2)) visitor2order_convert_ratio,
- ua.payment_count,
- cast(ua.payment_count/ua.order_count as decimal(10,2)) order2payment_convert_ratio
- from
- (
- select
- dt,
- sum(if(order_count>0,1,0)) order_count,
- sum(if(payment_count>0,1,0)) payment_count
- from dws_user_action
- where dt='2019-02-10'
- group by dt
- )ua join ads_uv_count uv on uv.dt=ua.dt;
复制代码
2)查询导入数据
- hive (gmall)> select * from ads_user_action_convert_day;
复制代码
最新经典文章,欢迎关注公众号
|