问题导读
1.DWS 层有哪些专业术语?
2.系统函数有哪些?
3.nvl 函数基本语法是什么?
4.DWS 层(业务)包含哪些内容?
上一篇
企业级数据仓库构建(八):搭建DWD 层-业务数据
https://www.aboutyun.com/forum.php?mod=viewthread&tid=29557
一、数仓搭建 - DWS 层
1.1 业务术语
1)用户
用户以设备为判断标准,在移动统计中,每个独立设备认为是一个独立用户。Android
系统根据 IMEI 号,IOS 系统根据 OpenUDID 来标识一个独立用户,每部手机一个用户
2)新增用户
首次联网使用应用的用户。如果一个用户首次打开某 APP,那这个用户定义为新增用
户;卸载再安装的设备,不会被算作一次新增。新增用户包括日新增用户、周新增用户、月
新增用户
3)活跃用户
打开应用的用户即为活跃用户,不考虑用户的使用情况。每天一台设备打开多次会被计
为一个活跃用户
4)周(月)活跃用户
某个自然周(月)内启动过应用的用户,该周(月)内的多次启动只记一个活跃用户
5)月活跃率
月活跃用户与截止到该月累计的用户总和之间的比例
6)沉默用户
用户仅在安装当天(次日)启动一次,后续时间无再启动行为。该指标可以反映新增用
户质量和用户与 APP 的匹配程度
7)版本分布
不同版本的周内各天新增用户数,活跃用户数和启动次数。利于判断 APP 各个版本之
间的优劣和用户行为习惯
8)本周回流用户
上周未启动过应用,本周启动了应用的用户
9)连续 n 周活跃用户
连续 n 周,每周至少启动一次
10)忠诚用户
连续活跃 5 周以上的用户
11)连续活跃用户
连续 2 周及以上活跃的用户
12)近期流失用户
连续 n(2<= n <= 4)周没有启动应用的用户。(第 n+1 周没有启动过)
13)留存用户
某段时间内的新增用户,经过一段时间后,仍然使用应用的被认作是留存用户;这部分
用户占当时新增用户的比例即是留存率
例如,5 月份新增用户 200,这 200 人在 6 月份启动过应用的有 100 人,7 月份启动过应用的有 80 人,8 月份启动过应用的有 50 人;则 5 月份新增用户一个月后的留存率是 50%,二个月后的留存率是 40%,三个月后的留存率是 25%
14)用户新鲜度
每天启动应用的新老用户比例,即新增用户数占活跃用户数的比例
15)单次使用时长
每次启动使用的时间长度
16)日使用时长
累计一天内的使用时间长度
17)启动次数计算标准
IOS 平台应用退到后台就算一次独立的启动;Android 平台我们规定,两次启动之间的间隔小于 30 秒,被计算一次启动。用户在使用过程中,若因收发短信或接电话等退出应用30 秒又再次返回应用中,那这两次行为应该是延续而非独立的,所以可以被算作一次使用行为,即一次启动。业内大多使用 30 秒这个标准,但用户还是可以自定义此时间间隔
1.2 系统函数
1.2.1 collect_set 函数
1)创建原数据表
- drop table if exists stud;
- create table stud (name string, area string, course string, score int);
复制代码
2)向原数据表中插入数据
- insert into table stud values('zhang3','bj','math',88);
- insert into table stud values('li4','bj','math',99);
- insert into table stud values('wang5','sh','chinese',92);
- insert into table stud values('zhao6','sh','chinese',54);
- insert into table stud values('tian7','bj','chinese',91);
复制代码
3)查询表中数据
- select * from stud;
-
- stud.name stud.area stud.course stud.score
- zhang3 bj math 88
- li4 bj math 99
- wang5 sh chinese 92
- zhao6 sh chinese 54
- tian7 bj chinese 91
复制代码
4)把同一分组的不同行的数据聚合成一个集合
- select course, collect_set(area), avg(score) from stud group by course;
-
- chinese ["sh","bj"] 79.0
- math ["bj"] 93.5
复制代码
5) 用下标可以取某一个
- select course, collect_set(area)[0], avg(score) from
-
- stud group by course;
- chinese sh 79.0
- math bj 93.5
复制代码
1.2.2 nvl 函数
1)基本语法
NVL(表达式 1,表达式 2)
如果表达式 1 为空值,NVL 返回值为表达式 2 的值,否则返回表达式 1 的值。 该函数的目的是把一个空值(null)转换成一个实际的值。其表达式的值可以是数字型、字符型和日期型。但是表达式 1 和表达式 2 的数据类型必须为同一个类型
1.2.3 日期处理函数
1)date_format 函数(根据格式整理日期)
- hive (gmall)> select date_format('2020-03-10','yyyy-MM');
- 2020-03
复制代码
2)date_add 函数(加减日期)
- hive (gmall)> select date_add('2020-03-10',-1);
- 2020-03-09
- hive (gmall)> select date_add('2020-03-10',1);
- 2020-03-11
复制代码
3)next_day 函数
(1)取当前天的下一个周一
- hive (gmall)> select next_day('2020-03-12','MO');
- 2020-03-16
-
- 说明:星期一到星期日的英文(Monday,Tuesday、Wednesday、Thursday、Friday、Saturday、Sunday)
复制代码
(2)取当前周的周一
- hive (gmall)> select date_add(next_day('2020-03-12','MO'),-7);
- 2020-03-11
复制代码
4)last_day 函数(求当月最后一天日期)
- hive (gmall)> select last_day('2020-03-10');
- 2020-03-31
复制代码
1.3 DWS 层(用户行为)
1.3.1 每日设备行为
每日设备行为,主要按照 设备 id 统计
1)建表语句
- drop table if exists dws_uv_detail_daycount;
- create external table dws_uv_detail_daycount
- (
- `mid_id` string COMMENT '设备唯一标识',
- `user_id` string COMMENT '用户标识',
- `version_code` string COMMENT '程序版本号',
- `version_name` string COMMENT '程序版本名',
- `lang` string COMMENT '系统语言',
- `source` string COMMENT '渠道号',
- `os` string COMMENT '安卓系统版本',
- `area` string COMMENT '区域',
- `model` string COMMENT '手机型号',
- `brand` string COMMENT '手机品牌',
- `sdk_version` string COMMENT 'sdkVersion',
- `gmail` string COMMENT 'gmail',
- `height_width` string COMMENT '屏幕宽高',
- `app_time` string COMMENT '客户端日志产生时的时间',
- `network` string COMMENT '网络模式',
- `lng` string COMMENT '经度',
- `lat` string COMMENT '纬度',
- `login_count` bigint COMMENT '活跃次数'
- )
- partitioned by(dt string)
- stored as parquet
- location '/warehouse/gmall/dws/dws_uv_detail_daycount';
复制代码
2)数据装载
- insert overwrite table dws_uv_detail_daycount partition(dt='2020-03-10')
- select
- mid_id,
- concat_ws('|', collect_set(user_id)) user_id,
- concat_ws('|', collect_set(version_code)) version_code,
- concat_ws('|', collect_set(version_name)) version_name,
- concat_ws('|', collect_set(lang))lang,
- concat_ws('|', collect_set(source)) source,
- concat_ws('|', collect_set(os)) os,
- concat_ws('|', collect_set(area)) area,
- concat_ws('|', collect_set(model)) model,
- concat_ws('|', collect_set(brand)) brand,
- concat_ws('|', collect_set(sdk_version)) sdk_version,
- concat_ws('|', collect_set(gmail)) gmail,
- concat_ws('|', collect_set(height_width)) height_width,
- concat_ws('|', collect_set(app_time)) app_time,
- concat_ws('|', collect_set(network)) network,
- concat_ws('|', collect_set(lng)) lng,
- concat_ws('|', collect_set(lat)) lat,
- count(*) login_count
- from dwd_start_log
- where dt='2020-03-10'
- group by mid_id;
复制代码
3)查询加载结果
- select * from dws_uv_detail_daycount where dt='2020-03-10';
复制代码
1.4 DWS 层(业务)
DWS 层的宽表字段,是站在不同维度的视角去看事实表,重点关注事实表的度量值
1.4.1 每日会员行为
1)建表语句
- drop table if exists dws_user_action_daycount;
- create external table dws_user_action_daycount
- (
- user_id string comment '用户 id',
- login_count bigint comment '登录次数',
- cart_count bigint comment '加入购物车次数',
- cart_amount double comment '加入购物车金额',
- order_count bigint comment '下单次数',
- order_amount decimal(16,2) comment '下单金额',
- payment_count bigint comment '支付次数',
- payment_amount decimal(16,2) comment '支付金额'
- ) COMMENT '每日用户行为'
- PARTITIONED BY (`dt` string)
- stored as parquet
- location '/warehouse/gmall/dws/dws_user_action_daycount/'
- tblproperties ("parquet.compression"="lzo");
复制代码
2)数据装载
- with
- tmp_login as
- (
- select
- user_id,
- count(*) login_count
- from dwd_start_log
- where dt='2020-03-10'
- and user_id is not null
- group by user_id
- ),
- tmp_cart as
- (
- select
- user_id,
- count(*) cart_count,
- sum(cart_price*sku_num) cart_amount
- from dwd_fact_cart_info
- where dt='2020-03-10'
- and user_id is not null
- and date_format(create_time,'yyyy-MM-dd')='2020-03-10'
- group by user_id
- ),
- tmp_order as
- (
- select
- user_id,
- count(*) order_count,
- sum(final_total_amount) order_amount
- from dwd_fact_order_info
- where dt='2020-03-10'
- group by user_id
- ) ,
- tmp_payment as
- (
- select
- user_id,
- count(*) payment_count,
- sum(payment_amount) payment_amount
- from dwd_fact_payment_info
- where dt='2020-03-10'
- group by user_id
- )
- insert overwrite table dws_user_action_daycount partition(dt='2020-03-10')
- select
- user_actions.user_id,
- sum(user_actions.login_count),
- sum(user_actions.cart_count),
- sum(user_actions.cart_amount),
- sum(user_actions.order_count),
- sum(user_actions.order_amount),
- sum(user_actions.payment_count),
- sum(user_actions.payment_amount)
- from
- (
- select
- user_id,
- login_count,
- 0 cart_count,
- 0 cart_amount,
- 0 order_count,
- 0 order_amount,
- 0 payment_count,
- 0 payment_amount
- from
- tmp_login
- union all
- select
- user_id,
- 0 login_count,
- cart_count,
- cart_amount,
- 0 order_count,
- 0 order_amount,
- 0 payment_count,
- 0 payment_amount
- from
- tmp_cart
- union all
- select
- user_id,
- 0 login_count,
- 0 cart_count,
- 0 cart_amount,
- order_count,
- order_amount,
- 0 payment_count,
- 0 payment_amount
- from tmp_order
- union all
- select
- user_id,
- 0 login_count,
- 0 cart_count,
- 0 cart_amount,
- 0 order_count,
- 0 order_amount,
- payment_count,
- payment_amount
- from tmp_payment
- ) user_actions
- group by user_id;
复制代码
3)查询加载结果
hive (gmall)> select * from dws_user_action_daycount where dt=‘2020-03-10’;
1.4.2 每日商品行为
1)建表语句
- drop table if exists dws_sku_action_daycount;
- create external table dws_sku_action_daycount
- (
- sku_id string comment 'sku_id',
- order_count bigint comment '被下单次数',
- order_num bigint comment '被下单件数',
- order_amount decimal(16,2) comment '被下单金额',
- payment_count bigint comment '被支付次数',
- payment_num bigint comment '被支付件数',
- payment_amount decimal(16,2) comment '被支付金额',
- refund_count bigint comment '被退款次数',
- refund_num bigint comment '被退款件数',
- refund_amount decimal(16,2) comment '被退款金额',
- cart_count bigint comment '被加入购物车次数',
- cart_num bigint comment '被加入购物车件数',
- favor_count bigint comment '被收藏次数',
- appraise_good_count bigint comment '好评数',
- appraise_mid_count bigint comment '中评数',
- appraise_bad_count bigint comment '差评数',
- appraise_default_count bigint comment '默认评价数'
- ) COMMENT '每日商品行为'
- PARTITIONED BY (`dt` string)
- stored as parquet
- location '/warehouse/gmall/dws/dws_sku_action_daycount/'
- tblproperties ("parquet.compression"="lzo");
复制代码
2)数据装载
注意:如果是 23 点 59 下单,支付日期跨天。需要从订单详情里面取出支付时间是今天,订单时间是昨天或者今天的订单
- with
- tmp_order as
- (
- select
- sku_id,
- count(*) order_count,
- sum(sku_num) order_num,
- sum(total_amount) order_amount
- from dwd_fact_order_detail
- where dt='2020-03-10'
- group by sku_id
- ),
- tmp_payment as
- (
- select
- sku_id,
- count(*) payment_count,
- sum(sku_num) payment_num,
- sum(total_amount) payment_amount
- from dwd_fact_order_detail
- where dt='2020-03-10'
- and order_id in
- (
- select
- id
- from dwd_fact_order_info
- where (dt='2020-03-10' or dt=date_add('2020-03-10',-1))
- and date_format(payment_time,'yyyy-MM-dd')='2020-03-10'
- )
- group by sku_id
- ),
- tmp_refund as
- (
- select
- sku_id,
- count(*) refund_count,
- sum(refund_num) refund_num,
- sum(refund_amount) refund_amount
- from dwd_fact_order_refund_info
- where dt='2020-03-10'
- group by sku_id
- ),
- tmp_cart as
- (
- select
- sku_id,
- count(*) cart_count,
- sum(sku_num) cart_num
- from dwd_fact_cart_info
- where dt='2020-03-10'
- and date_format(create_time,'yyyy-MM-dd')='2020-03-10'
- group by sku_id
- ),
- tmp_favor as
- (
- select
- sku_id,
- count(*) favor_count
- from dwd_fact_favor_info
- where dt='2020-03-10'
- and date_format(create_time,'yyyy-MM-dd')='2020-03-10'
- group by sku_id
- ),
- tmp_appraise as
- (
- select
- sku_id,
- sum(if(appraise='1201',1,0)) appraise_good_count,
- sum(if(appraise='1202',1,0)) appraise_mid_count,
- sum(if(appraise='1203',1,0)) appraise_bad_count,
- sum(if(appraise='1204',1,0)) appraise_default_count
- from dwd_fact_comment_info
- where dt='2020-03-10'
- group by sku_id
- )
- insert overwrite table dws_sku_action_daycount partition(dt='2020-03-10')
- select
- sku_id,
- sum(order_count),
- sum(order_num),
- sum(order_amount),
- sum(payment_count),
- sum(payment_num),
- sum(payment_amount),
- sum(refund_count),
- sum(refund_num),
- sum(refund_amount),
- sum(cart_count),
- sum(cart_num),
- sum(favor_count),
- sum(appraise_good_count),
- sum(appraise_mid_count),
- sum(appraise_bad_count),
- sum(appraise_default_count)
- from
- (
- select
- sku_id,
- order_count,
- order_num,
- order_amount,
- 0 payment_count,
- 0 payment_num,
- 0 payment_amount,
- 0 refund_count,
- 0 refund_num,
- 0 refund_amount,
- 0 cart_count,
- 0 cart_num,
- 0 favor_count,
- 0 appraise_good_count,
- 0 appraise_mid_count,
- 0 appraise_bad_count,
- 0 appraise_default_count
- from tmp_order
- union all
- select
- sku_id,
- 0 order_count,
- 0 order_num,
- 0 order_amount,
- payment_count,
- payment_num,
- payment_amount,
- 0 refund_count,
- 0 refund_num,
- 0 refund_amount,
- 0 cart_count,
- 0 cart_num,
- 0 favor_count,
- 0 appraise_good_count,
- 0 appraise_mid_count,
- 0 appraise_bad_count,
- 0 appraise_default_count
- from tmp_payment
- union all
- select
- sku_id,
- 0 order_count,
- 0 order_num,
- 0 order_amount,
- 0 payment_count,
- 0 payment_num,
- 0 payment_amount,
- refund_count,
- refund_num,
- refund_amount,
- 0 cart_count,
- 0 cart_num,
- 0 favor_count,
- 0 appraise_good_count,
- 0 appraise_mid_count,
- 0 appraise_bad_count,
- 0 appraise_default_count
- from tmp_refund
- union all
- select
- sku_id,
- 0 order_count,
- 0 order_num,
- 0 order_amount,
- 0 payment_count,
- 0 payment_num,
- 0 payment_amount,
- 0 refund_count,
- 0 refund_num,
- 0 refund_amount,
- cart_count,
- cart_num,
- 0 favor_count,
- 0 appraise_good_count,
- 0 appraise_mid_count,
- 0 appraise_bad_count,
- 0 appraise_default_count
- from tmp_cart
- union all
- select
- sku_id,
- 0 order_count,
- 0 order_num,
- 0 order_amount,
- 0 payment_count,
- 0 payment_num,
- 0 payment_amount,
- 0 refund_count,
- 0 refund_num,
- 0 refund_amount,
- 0 cart_count,
- 0 cart_num,
- favor_count,
- 0 appraise_good_count,
- 0 appraise_mid_count,
- 0 appraise_bad_count,
- 0 appraise_default_count
- from tmp_favor
- union all
- select
- sku_id,
- 0 order_count,
- 0 order_num,
- 0 order_amount,
- 0 payment_count,
- 0 payment_num,
- 0 payment_amount,
- 0 refund_count,
- 0 refund_num,
- 0 refund_amount,
- 0 cart_count,
- 0 cart_num,
- 0 favor_count,
- appraise_good_count,
- appraise_mid_count,
- appraise_bad_count,
- appraise_default_count
- from tmp_appraise
- )tmp
- group by sku_id;
复制代码
3)查询加载结果
- select * from dws_sku_action_daycount where dt='2020-03-10';
复制代码
1.4.5 每日购买行为
1)建表语句
- drop table if exists dws_sale_detail_daycount;
- create external table dws_sale_detail_daycount
- (
- user_id string comment '用户 id',
- sku_id string comment '商品 id',
- user_gender string comment '用户性别',
- user_age string comment '用户年龄',
- user_level string comment '用户等级',
- order_price decimal(10,2) comment '商品价格',
- sku_name string comment '商品名称',
- sku_tm_id string comment '品牌 id',
- sku_category3_id string comment '商品三级品类 id',
- sku_category2_id string comment '商品二级品类 id',
- sku_category1_id string comment '商品一级品类 id',
- sku_category3_name string comment '商品三级品类名称',
- sku_category2_name string comment '商品二级品类名称',
- sku_category1_name string comment '商品一级品类名称',
- spu_id string comment '商品 spu',
- sku_num int comment '购买个数',
- order_count bigint comment '当日下单单数',
- order_amount decimal(16,2) comment '当日下单金额'
- ) COMMENT '每日购买行为'
- PARTITIONED BY (`dt` string)
- stored as parquet
- location '/warehouse/gmall/dws/dws_sale_detail_daycount/'
- tblproperties ("parquet.compression"="lzo");
复制代码
2)数据装载
- insert overwrite table dws_sale_detail_daycount partition(dt='2020-03-10')
- select
- op.user_id,
- op.sku_id,
- ui.gender,
- months_between('2020-03-10', ui.birthday)/12 age,
- ui.user_level,
- si.price,
- si.sku_name,
- si.tm_id,
- si.category3_id,
- si.category2_id,
- si.category1_id,
- si.category3_name,
- si.category2_name,
- si.category1_name,
- si.spu_id,
- op.sku_num,
- op.order_count,
- op.order_amount
- from
- (
- select
- user_id,
- sku_id,
- sum(sku_num) sku_num,
- count(*) order_count,
- sum(total_amount) order_amount
- from dwd_fact_order_detail
- where dt='2020-03-10'
- group by user_id, sku_id
- )op
- join
- (
- select
- *
- from dwd_dim_user_info_his
- where end_date='9999-99-99'
- )ui on op.user_id = ui.id
- join
- (
- select
- *
- from dwd_dim_sku_info
- where dt='2020-03-10'
- )si on op.sku_id = si.id;
复制代码
3)查询加载结果
- select * from dws_sale_detail_daycount where dt='2020-03-10';
复制代码
1.5 DWS 层数据导入脚本
1)vim dwd_to_dws.sh
在脚本中填写如下内容
- #!/bin/bash
- APP=gmall
- hive=/opt/modules/hive/bin/hive
- # 如果是输入的日期按照取输入日期;如果没输入日期取当前时间的前一天
- if [ -n "$1" ] ;then
- do_date=$1
- else
- do_date=`date -d "-1 day" +%F`
- fi
- sql="
- insert overwrite table ${APP}.dws_uv_detail_daycount partition(dt='$do_date')
- select
- mid_id,
- concat_ws('|', collect_set(user_id)) user_id,
- concat_ws('|', collect_set(version_code)) version_code,
- concat_ws('|', collect_set(version_name)) version_name,
- concat_ws('|', collect_set(lang))lang,
- concat_ws('|', collect_set(source)) source,
- concat_ws('|', collect_set(os)) os,
- concat_ws('|', collect_set(area)) area,
- concat_ws('|', collect_set(model)) model,
- concat_ws('|', collect_set(brand)) brand,
- concat_ws('|', collect_set(sdk_version)) sdk_version,
- concat_ws('|', collect_set(gmail)) gmail,
- concat_ws('|', collect_set(height_width)) height_width,
- concat_ws('|', collect_set(app_time)) app_time,
- concat_ws('|', collect_set(network)) network,
- concat_ws('|', collect_set(lng)) lng,
- concat_ws('|', collect_set(lat)) lat,
- count(*) login_count
- from ${APP}.dwd_start_log
- where dt='$do_date'
- group by mid_id;
- with
- tmp_login as
- (
- select
- user_id,
- count(*) login_count
- from ${APP}.dwd_start_log
- where dt='$do_date'
- and user_id is not null
- group by user_id
- ),
- tmp_cart as
- (
- select
- user_id,
- count(*) cart_count,
- sum(cart_price*sku_num) cart_amount
- from ${APP}.dwd_fact_cart_info
- where dt='$do_date'
- and user_id is not null
- and date_format(create_time,'yyyy-MM-dd')='$do_date'
- group by user_id
- ),
- tmp_order as
- (
- select
- user_id,
- count(*) order_count,
- sum(final_total_amount) order_amount
- from ${APP}.dwd_fact_order_info
- where dt='$do_date'
- group by user_id
- ) ,
- tmp_payment as
- (
- select
- user_id,
- count(*) payment_count,
- sum(payment_amount) payment_amount
- from ${APP}.dwd_fact_payment_info
- where dt='$do_date'
- group by user_id
- )
-
-
- insert overwrite table ${APP}.dws_user_action_daycount partition(dt='$do_date')
- select
- user_actions.user_id,
- sum(user_actions.login_count),
- sum(user_actions.cart_count),
- sum(user_actions.cart_amount),
- sum(user_actions.order_count),
- sum(user_actions.order_amount),
- sum(user_actions.payment_count),
- sum(user_actions.payment_amount)
- from
- (
- select
- user_id,
- login_count,
- 0 cart_count,
- 0 cart_amount,
- 0 order_count,
- 0 order_amount,
- 0 payment_count,
- 0 payment_amount
- from
- tmp_login
- union all
- select
- user_id,
- 0 login_count,
- cart_count,
- cart_amount,
- 0 order_count,
- 0 order_amount,
- 0 payment_count,
- 0 payment_amount
- from
- tmp_cart
- union all
- select
- user_id,
- 0 login_count,
- 0 cart_count,
- 0 cart_amount,
- order_count,
- order_amount,
- 0 payment_count,
- 0 payment_amount
- from tmp_order
- union all
- select
- user_id,
- 0 login_count,
- 0 cart_count,
- 0 cart_amount,
- 0 order_count,
- 0 order_amount,
- payment_count,
- payment_amount
- from tmp_payment
- ) user_actions
- group by user_id;
- with
- tmp_order as
- (
- select
- sku_id,
- count(*) order_count,
- sum(sku_num) order_num,
- sum(total_amount) order_amount
- from ${APP}.dwd_fact_order_detail
- where dt='$do_date'
- group by sku_id
- ),
- tmp_payment as
- (
- select
- sku_id,
- count(*) payment_count,
- sum(sku_num) payment_num,
- sum(total_amount) payment_amount
- from ${APP}.dwd_fact_order_detail
- where dt='$do_date'
- and order_id in
- (
- select
- id
- from ${APP}.dwd_fact_order_info
- where (dt='$do_date' or dt=date_add('$do_date',-1))
- and date_format(payment_time,'yyyy-MM-dd')='$do_date'
- )
- group by sku_id
- ),
- tmp_refund as
- (
- select
- sku_id,
- count(*) refund_count,
- sum(refund_num) refund_num,
- sum(refund_amount) refund_amount
- from ${APP}.dwd_fact_order_refund_info
- where dt='$do_date'
- group by sku_id
- ),
- tmp_cart as
- (
- select
- sku_id,
- count(*) cart_count,
- sum(sku_num) cart_num
- from ${APP}.dwd_fact_cart_info
- where dt='$do_date'
- and date_format(create_time,'yyyy-MM-dd')='$do_date'
- group by sku_id
- ),
- tmp_favor as
- (
- select
- sku_id,
- count(*) favor_count
- from ${APP}.dwd_fact_favor_info
- where dt='$do_date'
- and date_format(create_time,'yyyy-MM-dd')='$do_date'
- group by sku_id
- ),
- tmp_appraise as
- (
- select
- sku_id,
- sum(if(appraise='1201',1,0)) appraise_good_count,
- sum(if(appraise='1202',1,0)) appraise_mid_count,
- sum(if(appraise='1203',1,0)) appraise_bad_count,
- sum(if(appraise='1204',1,0)) appraise_default_count
- from ${APP}.dwd_fact_comment_info
- where dt='$do_date'
- group by sku_id
- )
-
-
- insert overwrite table ${APP}.dws_sku_action_daycount partition(dt='$do_date')
- select
- sku_id,
- sum(order_count),
- sum(order_num),
- sum(order_amount),
- sum(payment_count),
- sum(payment_num),
- sum(payment_amount),
- sum(refund_count),
- sum(refund_num),
- sum(refund_amount),
- sum(cart_count),
- sum(cart_num),
- sum(favor_count),
- sum(appraise_good_count),
- sum(appraise_mid_count),
- sum(appraise_bad_count),
- sum(appraise_default_count)
- from
- (
- select
- sku_id,
- order_count,
- order_num,
- order_amount,
- 0 payment_count,
- 0 payment_num,
- 0 payment_amount,
- 0 refund_count,
- 0 refund_num,
- 0 refund_amount,
- 0 cart_count,
- 0 cart_num,
- 0 favor_count,
- 0 appraise_good_count,
- 0 appraise_mid_count,
- 0 appraise_bad_count,
- 0 appraise_default_count
- from tmp_order
- union all
- select
- sku_id,
- 0 order_count,
- 0 order_num,
- 0 order_amount,
- payment_count,
- payment_num,
- payment_amount,
- 0 refund_count,
- 0 refund_num,
- 0 refund_amount,
- 0 cart_count,
- 0 cart_num,
- 0 favor_count,
- 0 appraise_good_count,
- 0 appraise_mid_count,
- 0 appraise_bad_count,
- 0 appraise_default_count
- from tmp_payment
- union all
- select
- sku_id,
- 0 order_count,
- 0 order_num,
- 0 order_amount,
- 0 payment_count,
- 0 payment_num,
- 0 payment_amount,
- refund_count,
- refund_num,
- refund_amount,
- 0 cart_count,
- 0 cart_num,
- 0 favor_count,
- 0 appraise_good_count,
- 0 appraise_mid_count,
- 0 appraise_bad_count,
- 0 appraise_default_count
- from tmp_refund
- union all
- select
- sku_id,
- 0 order_count,
- 0 order_num,
- 0 order_amount,
- 0 payment_count,
- 0 payment_num,
- 0 payment_amount,
- 0 refund_count,
- 0 refund_num,
- 0 refund_amount,
- cart_count,
- cart_num,
- 0 favor_count,
- 0 appraise_good_count,
- 0 appraise_mid_count,
- 0 appraise_bad_count,
- 0 appraise_default_count
- from tmp_cart
- union all
- select
- sku_id,
- 0 order_count,
- 0 order_num,
- 0 order_amount,
- 0 payment_count,
- 0 payment_num,
- 0 payment_amount,
- 0 refund_count,
- 0 refund_num,
- 0 refund_amount,
- 0 cart_count,
- 0 cart_num,
- favor_count,
- 0 appraise_good_count,
- 0 appraise_mid_count,
- 0 appraise_bad_count,
- 0 appraise_default_count
- from tmp_favor
- union all
- select
- sku_id,
- 0 order_count,
- 0 order_num,
- 0 order_amount,
- 0 payment_count,
- 0 payment_num,
- 0 payment_amount,
- 0 refund_count,
- 0 refund_num,
- 0 refund_amount,
- 0 cart_count,
- 0 cart_num,
- 0 favor_count,
- appraise_good_count,
- appraise_mid_count,
- appraise_bad_count,
- appraise_default_count
- from tmp_appraise
- )tmp
- group by sku_id;
-
-
- insert overwrite table ${APP}.dws_sale_detail_daycount partition(dt='$do_date')
- select
- op.user_id,
- op.sku_id,
- ui.gender,
- months_between('$do_date', ui.birthday)/12 age,
- ui.user_level,
- si.price,
- si.sku_name,
- si.tm_id,
- si.category3_id,
- si.category2_id,
- si.category1_id,
- si.category3_name,
- si.category2_name,
- si.category1_name,
- si.spu_id,
- op.sku_num,
- op.order_count,
- op.order_amount
- from
- (
- select
- user_id,
- sku_id,
- sum(sku_num) sku_num,
- count(*) order_count,
- sum(total_amount) order_amount
- from ${APP}.dwd_fact_order_detail
- where dt='$do_date'
- group by user_id, sku_id
- )op
- join
- (
- select
- *
- from ${APP}.dwd_dim_user_info_his
- where end_date='9999-99-99'
- )ui on op.user_id = ui.id
- join
- (
- select
- *
- from ${APP}.dwd_dim_sku_info
- where dt='$do_date'
- )si on op.sku_id = si.id;
- "
- $hive -e "$sql"
复制代码
2)增加脚本执行权限
复制代码
3)执行脚本导入数据
复制代码 4)查看导入数据
- select * from dws_uv_detail_daycount where dt='2020-03-11';
- select * from dws_user_action_daycount where dt='2020-03-11';
- select * from dws_sku_action_daycount where dt='2020-03-11';
- select * from dws_sale_detail_daycount where dt='2020-03-11';
复制代码
二、数仓搭建-DWT 层
2.1 设备主题宽表
1)建表语句
- drop table if exists dwt_uv_topic;
- create external table dwt_uv_topic
- (
- `mid_id` string COMMENT '设备唯一标识',
- `user_id` string COMMENT '用户标识',
- `version_code` string COMMENT '程序版本号',
- `version_name` string COMMENT '程序版本名',
- `lang` string COMMENT '系统语言',
- `source` string COMMENT '渠道号',
- `os` string COMMENT '安卓系统版本',
- `area` string COMMENT '区域',
- `model` string COMMENT '手机型号',
- `brand` string COMMENT '手机品牌',
- `sdk_version` string COMMENT 'sdkVersion',
- `gmail` string COMMENT 'gmail',
- `height_width` string COMMENT '屏幕宽高',
- `app_time` string COMMENT '客户端日志产生时的时间',
- `network` string COMMENT '网络模式',
- `lng` string COMMENT '经度',
- `lat` string COMMENT '纬度',
- `login_date_first` string comment '首次活跃时间',
- `login_date_last` string comment '末次活跃时间',
- `login_day_count` bigint comment '当日活跃次数',
- `login_count` bigint comment '累积活跃天数'
- )
- stored as parquet
- location '/warehouse/gmall/dwt/dwt_uv_topic';
复制代码
2)数据装载
- insert overwrite table dwt_uv_topic
- select
- nvl(new.mid_id,old.mid_id),
- nvl(new.user_id,old.user_id),
- nvl(new.version_code,old.version_code),
- nvl(new.version_name,old.version_name),
- nvl(new.lang,old.lang),
- nvl(new.source,old.source),
- nvl(new.os,old.os),
- nvl(new.area,old.area),
- nvl(new.model,old.model),
- nvl(new.brand,old.brand),
- nvl(new.sdk_version,old.sdk_version),
- nvl(new.gmail,old.gmail),
- nvl(new.height_width,old.height_width),
- nvl(new.app_time,old.app_time),
- nvl(new.network,old.network),
- nvl(new.lng,old.lng),
- nvl(new.lat,old.lat),
- if(old.mid_id is null,'2020-03-10',old.login_date_first),
- if(new.mid_id is not null,'2020-03-10',old.login_date_last),
- if(new.mid_id is not null, new.login_count,0),
- nvl(old.login_count,0)+if(new.login_count>0,1,0)
- from
- (
- select
- *
- from dwt_uv_topic
- )old
- full outer join
- (
- select
- *
- from dws_uv_detail_daycount
- where dt='2020-03-10'
- )new
- on old.mid_id=new.mid_id;
复制代码
3)查询加载结果
- select * from dwt_uv_topic limit 5;
复制代码
2.2 会员主题宽表
宽表字段怎么来?维度关联的事实表度量值+开头、结尾+累积+累积一个时间段
1)建表语句
- drop table if exists dwt_user_topic;
- create external table dwt_user_topic
- (
- user_id string comment '用户 id',
- login_date_first string comment '首次登录时间',
- login_date_last string comment '末次登录时间',
- login_count bigint comment '累积登录天数',
- login_last_30d_count bigint comment '最近 30 日登录天数',
- order_date_first string comment '首次下单时间',
- order_date_last string comment '末次下单时间',
- order_count bigint comment '累积下单次数',
- order_amount decimal(16,2) comment '累积下单金额',
- order_last_30d_count bigint comment '最近 30 日下单次数',
- order_last_30d_amount bigint comment '最近 30 日下单金额',
- payment_date_first string comment '首次支付时间',
- payment_date_last string comment '末次支付时间',
- payment_count decimal(16,2) comment '累积支付次数',
- payment_amount decimal(16,2) comment '累积支付金额',
- payment_last_30d_count decimal(16,2) comment '最近 30 日支付次数',
- payment_last_30d_amount decimal(16,2) comment '最近 30 日支付金额'
- )COMMENT '用户主题宽表'
- stored as parquet
- location '/warehouse/gmall/dwt/dwt_user_topic/'
- tblproperties ("parquet.compression"="lzo");
复制代码
2)数据装载
- insert overwrite table dwt_user_topic
- select
- nvl(new.user_id,old.user_id),
- if(old.login_date_first is null and
- new.login_count>0,'2020-03-10',old.login_date_first),
- if(new.login_count>0,'2020-03-10',old.login_date_last),
- nvl(old.login_count,0)+if(new.login_count>0,1,0),
- nvl(new.login_last_30d_count,0),
- if(old.order_date_first is null and
- new.order_count>0,'2020-03-10',old.order_date_first),
- if(new.order_count>0,'2020-03-10',old.order_date_last),
- nvl(old.order_count,0)+nvl(new.order_count,0),
- nvl(old.order_amount,0)+nvl(new.order_amount,0),
- nvl(new.order_last_30d_count,0),
- nvl(new.order_last_30d_amount,0),
- if(old.payment_date_first is null and
- new.payment_count>0,'2020-03-10',old.payment_date_first),
- if(new.payment_count>0,'2020-03-10',old.payment_date_last),
- nvl(old.payment_count,0)+nvl(new.payment_count,0),
- nvl(old.payment_amount,0)+nvl(new.payment_amount,0),
- nvl(new.payment_last_30d_count,0),
- nvl(new.payment_last_30d_amount,0)
- from
- dwt_user_topic old
- full outer join
- (
- select
- user_id,
- sum(if(dt='2020-03-10',login_count,0)) login_count,
- sum(if(dt='2020-03-10',order_count,0)) order_count,
- sum(if(dt='2020-03-10',order_amount,0)) order_amount,
- sum(if(dt='2020-03-10',payment_count,0)) payment_count,
- sum(if(dt='2020-03-10',payment_amount,0)) payment_amount,
- sum(if(login_count>0,1,0)) login_last_30d_count,
- sum(order_count) order_last_30d_count,
- sum(order_amount) order_last_30d_amount,
- sum(payment_count) payment_last_30d_count,
- sum(payment_amount) payment_last_30d_amount
- from dws_user_action_daycount
- where dt>=date_add( '2020-03-10',-30)
- group by user_id
- )new
- on old.user_id=new.user_id;
复制代码
3)查询加载结果
- select * from dwt_user_topic limit 5;
复制代码
2.3 商品主题宽表
1)建表语句
- create external table dwt_sku_topic
- (
- sku_id string comment 'sku_id',
- spu_id string comment 'spu_id',
- order_last_30d_count bigint comment '最近 30 日被下单次数',
- order_last_30d_num bigint comment '最近 30 日被下单件数',
- order_last_30d_amount decimal(16,2) comment '最近 30 日被下单金额',
- order_count bigint comment '累积被下单次数',
- order_num bigint comment '累积被下单件数',
- order_amount decimal(16,2) comment '累积被下单金额',
- payment_last_30d_count bigint comment '最近 30 日被支付次数',
- payment_last_30d_num bigint comment '最近 30 日被支付件数',
- payment_last_30d_amount decimal(16,2) comment '最近 30 日被支付金额',
- payment_count bigint comment '累积被支付次数',
- payment_num bigint comment '累积被支付件数',
- payment_amount decimal(16,2) comment '累积被支付金额',
- refund_last_30d_count bigint comment '最近三十日退款次数',
- refund_last_30d_num bigint comment '最近三十日退款件数',
- refund_last_30d_amount decimal(10,2) comment '最近三十日退款金额',
- refund_count bigint comment '累积退款次数',
- refund_num bigint comment '累积退款件数',
- refund_amount decimal(10,2) comment '累积退款金额',
- cart_last_30d_count bigint comment '最近 30 日被加入购物车次数',
- cart_last_30d_num bigint comment '最近 30 日被加入购物车件数',
- cart_count bigint comment '累积被加入购物车次数',
- cart_num bigint comment '累积被加入购物车件数',
- favor_last_30d_count bigint comment '最近 30 日被收藏次数',
- favor_count bigint comment '累积被收藏次数',
- appraise_last_30d_good_count bigint comment '最近 30 日好评数',
- appraise_last_30d_mid_count bigint comment '最近 30 日中评数',
- appraise_last_30d_bad_count bigint comment '最近 30 日差评数',
- appraise_last_30d_default_count bigint comment '最近 30 日默认评价数',
- appraise_good_count bigint comment '累积好评数',
- appraise_mid_count bigint comment '累积中评数',
- appraise_bad_count bigint comment '累积差评数',
- appraise_default_count bigint comment '累积默认评价数'
- )COMMENT '商品主题宽表'
- stored as parquet
- location '/warehouse/gmall/dwt/dwt_sku_topic/'
- tblproperties ("parquet.compression"="lzo");
复制代码
2)数据装载
- insert overwrite table dwt_sku_topic
- select
- nvl(new.sku_id,old.sku_id), sku_info.spu_id,
- nvl(new.order_count30,0),
- nvl(new.order_num30,0),
- nvl(new.order_amount30,0),
- nvl(old.order_count,0) + nvl(new.order_count,0),
- nvl(old.order_num,0) + nvl(new.order_num,0),
- nvl(old.order_amount,0) + nvl(new.order_amount,0),
- nvl(new.payment_count30,0),
- nvl(new.payment_num30,0),
- nvl(new.payment_amount30,0),
- nvl(old.payment_count,0) + nvl(new.payment_count,0),
- nvl(old.payment_num,0) + nvl(new.payment_count,0),
- nvl(old.payment_amount,0) + nvl(new.payment_count,0),
- nvl(new.refund_count30,0),
- nvl(new.refund_num30,0),
- nvl(new.refund_amount30,0),
- nvl(old.refund_count,0) + nvl(new.refund_count,0),
- nvl(old.refund_num,0) + nvl(new.refund_num,0),
- nvl(old.refund_amount,0) + nvl(new.refund_amount,0),
- nvl(new.cart_count30,0),
- nvl(new.cart_num30,0),
- nvl(old.cart_count,0) + nvl(new.cart_count,0),
- nvl(old.cart_num,0) + nvl(new.cart_num,0),
- nvl(new.favor_count30,0),
- nvl(old.favor_count,0) + nvl(new.favor_count,0),
- nvl(new.appraise_good_count30,0),
- nvl(new.appraise_mid_count30,0),
- nvl(new.appraise_bad_count30,0),
- nvl(new.appraise_default_count30,0) ,
- nvl(old.appraise_good_count,0) + nvl(new.appraise_good_count,0),
- nvl(old.appraise_mid_count,0) + nvl(new.appraise_mid_count,0),
- nvl(old.appraise_bad_count,0) + nvl(new.appraise_bad_count,0),
- nvl(old.appraise_default_count,0) + nvl(new.appraise_default_count,0)
- from
- (
- select
- sku_id,
- spu_id,
- order_last_30d_count,
- order_last_30d_num,
- order_last_30d_amount,
- order_count,
- order_num,
- order_amount ,
- payment_last_30d_count,
- payment_last_30d_num,
- payment_last_30d_amount,
- payment_count,
- payment_num,
- payment_amount,
- refund_last_30d_count,
- refund_last_30d_num,
- refund_last_30d_amount,
- refund_count,
- refund_num,
- refund_amount,
- cart_last_30d_count,
- cart_last_30d_num,
- cart_count,
- cart_num,
- favor_last_30d_count,
- favor_count,
- appraise_last_30d_good_count,
- appraise_last_30d_mid_count,
- appraise_last_30d_bad_count,
- appraise_last_30d_default_count,
- appraise_good_count,
- appraise_mid_count,
- appraise_bad_count,
- appraise_default_count
- from dwt_sku_topic
- )old
- full outer join
- (
- select
- sku_id,
- sum(if(dt='2020-03-10', order_count,0 )) order_count,
- sum(if(dt='2020-03-10',order_num ,0 )) order_num,
- sum(if(dt='2020-03-10',order_amount,0 )) order_amount ,
- sum(if(dt='2020-03-10',payment_count,0 )) payment_count,
- sum(if(dt='2020-03-10',payment_num,0 )) payment_num,
- sum(if(dt='2020-03-10',payment_amount,0 )) payment_amount,
- sum(if(dt='2020-03-10',refund_count,0 )) refund_count,
- sum(if(dt='2020-03-10',refund_num,0 )) refund_num,
- sum(if(dt='2020-03-10',refund_amount,0 )) refund_amount,
- sum(if(dt='2020-03-10',cart_count,0 )) cart_count,
- sum(if(dt='2020-03-10',cart_num,0 )) cart_num,
- sum(if(dt='2020-03-10',favor_count,0 )) favor_count,
- sum(if(dt='2020-03-10',appraise_good_count,0 )) appraise_good_count,
- sum(if(dt='2020-03-10',appraise_mid_count,0 ) ) appraise_mid_count ,
- sum(if(dt='2020-03-10',appraise_bad_count,0 )) appraise_bad_count,
- sum(if(dt='2020-03-10',appraise_default_count,0 )) appraise_default_count,
- sum(order_count) order_count30 ,
- sum(order_num) order_num30,
- sum(order_amount) order_amount30,
- sum(payment_count) payment_count30,
- sum(payment_num) payment_num30,
- sum(payment_amount) payment_amount30,
- sum(refund_count) refund_count30,
- sum(refund_num) refund_num30,
- sum(refund_amount) refund_amount30,
- sum(cart_count) cart_count30,
- sum(cart_num) cart_num30,
- sum(favor_count) favor_count30,
- sum(appraise_good_count) appraise_good_count30,
- sum(appraise_mid_count) appraise_mid_count30,
- sum(appraise_bad_count) appraise_bad_count30,
- sum(appraise_default_count) appraise_default_count30
- from dws_sku_action_daycount
- where dt >= date_add ('2020-03-10', -30)
- group by sku_id
- )new
- on new.sku_id = old.sku_id
- left join
- (select * from dwd_dim_sku_info where dt='2020-03-10') sku_info
- on nvl(new.sku_id,old.sku_id)= sku_info.id;
复制代码
3)查询加载结果
- select * from dwt_sku_topic limit 5;
复制代码
2.4 DWT 层数据导入脚本
1)vim dws_to_dwt.sh
在脚本中填写如下内容
- #!/bin/bash
- APP=gmall
- hive=/opt/modules/hive/bin/hive
- # 如果是输入的日期按照取输入日期;如果没输入日期取当前时间的前一天
- if [ -n "$1" ] ;then
- do_date=$1
- else
- do_date=`date -d "-1 day" +%F`
- fi
- sql="
- insert overwrite table ${APP}.dwt_uv_topic
- select
- nvl(new.mid_id,old.mid_id),
- nvl(new.user_id,old.user_id),
- nvl(new.version_code,old.version_code),
- nvl(new.version_name,old.version_name),
- nvl(new.lang,old.lang),
- nvl(new.source,old.source),
- nvl(new.os,old.os),
- nvl(new.area,old.area),
- nvl(new.model,old.model),
- nvl(new.brand,old.brand),
- nvl(new.sdk_version,old.sdk_version),
- nvl(new.gmail,old.gmail),
- nvl(new.height_width,old.height_width),
- nvl(new.app_time,old.app_time),
- nvl(new.network,old.network),
- nvl(new.lng,old.lng),
- nvl(new.lat,old.lat),
- nvl(old.login_date_first,'$do_date'),
- if(new.login_count>0,'$do_date',old.login_date_last),
- nvl(new.login_count,0),
- nvl(new.login_count,0)+nvl(old.login_count,0)
- from
- (
- select
- *
- from ${APP}.dwt_uv_topic
- )old
- full outer join
- (
- select
- *
- from ${APP}.dws_uv_detail_daycount
- where dt='$do_date'
- )new
- on old.mid_id=new.mid_id;
-
-
- insert overwrite table ${APP}.dwt_user_topic
- select
- nvl(new.user_id,old.user_id),
- if(old.login_date_first is null and
- new.login_count>0,'$do_date',old.login_date_first),
- if(new.login_count>0,'$do_date',old.login_date_last),
- nvl(old.login_count,0)+if(new.login_count>0,1,0),
- nvl(new.login_last_30d_count,0),
- if(old.order_date_first is null and
- new.order_count>0,'$do_date',old.order_date_first),
- if(new.order_count>0,'$do_date',old.order_date_last),
- nvl(old.order_count,0)+nvl(new.order_count,0),
- nvl(old.order_amount,0)+nvl(new.order_amount,0),
- nvl(new.order_last_30d_count,0),
- nvl(new.order_last_30d_amount,0),
- if(old.payment_date_first is null and
- new.payment_count>0,'$do_date',old.payment_date_first),
- if(new.payment_count>0,'$do_date',old.payment_date_last),
- nvl(old.payment_count,0)+nvl(new.payment_count,0),
- nvl(old.payment_amount,0)+nvl(new.payment_amount,0),
- nvl(new.payment_last_30d_count,0),
- nvl(new.payment_last_30d_amount,0)
- from
- (
- select
- *
- from ${APP}.dwt_user_topic
- )old
- full outer join
- (
- select
- user_id,
- sum(if(dt='$do_date',login_count,0)) login_count,
- sum(if(dt='$do_date',order_count,0)) order_count,
- sum(if(dt='$do_date',order_amount,0)) order_amount,
- sum(if(dt='$do_date',payment_count,0)) payment_count,
- sum(if(dt='$do_date',payment_amount,0)) payment_amount,
- sum(if(order_count>0,1,0)) login_last_30d_count,
- sum(order_count) order_last_30d_count,
- sum(order_amount) order_last_30d_amount,
- sum(payment_count) payment_last_30d_count,
- sum(payment_amount) payment_last_30d_amount
- from ${APP}.dws_user_action_daycount
- where dt>=date_add( '$do_date',-30)
- group by user_id
- )new
- on old.user_id=new.user_id;
- with
- sku_act as
- (
- select
- sku_id,
- sum(if(dt='$do_date', order_count,0 )) order_count,
- sum(if(dt='$do_date',order_num ,0 )) order_num,
- sum(if(dt='$do_date',order_amount,0 )) order_amount ,
- sum(if(dt='$do_date',payment_count,0 )) payment_count,
- sum(if(dt='$do_date',payment_num,0 )) payment_num,
- sum(if(dt='$do_date',payment_amount,0 )) payment_amount,
- sum(if(dt='$do_date',refund_count,0 )) refund_count,
- sum(if(dt='$do_date',refund_num,0 )) refund_num,
- sum(if(dt='$do_date',refund_amount,0 )) refund_amount,
- sum(if(dt='$do_date',cart_count,0 )) cart_count,
- sum(if(dt='$do_date',cart_num,0 )) cart_num,
- sum(if(dt='$do_date',favor_count,0 )) favor_count,
- sum(if(dt='$do_date',appraise_good_count,0 )) appraise_good_count,
- sum(if(dt='$do_date',appraise_mid_count,0 ) ) appraise_mid_count ,
- sum(if(dt='$do_date',appraise_bad_count,0 )) appraise_bad_count,
- sum(if(dt='$do_date',appraise_default_count,0 )) appraise_default_count,
- sum( order_count ) order_count30 ,
- sum( order_num ) order_num30,
- sum(order_amount ) order_amount30,
- sum(payment_count ) payment_count30,
- sum(payment_num ) payment_num30,
- sum(payment_amount ) payment_amount30,
- sum(refund_count ) refund_count30,
- sum(refund_num ) refund_num30,
- sum(refund_amount ) refund_amount30,
- sum(cart_count ) cart_count30,
- sum(cart_num ) cart_num30,
- sum(favor_count ) favor_count30,
- sum(appraise_good_count ) appraise_good_count30,
- sum(appraise_mid_count ) appraise_mid_count30,
- sum(appraise_bad_count ) appraise_bad_count30,
- sum(appraise_default_count ) appraise_default_count30
- from ${APP}.dws_sku_action_daycount
- where dt>=date_add ( '$do_date',-30)
- group by sku_id
- ),
- sku_topic
- as
- (
- select
- sku_id,
- spu_id,
- order_last_30d_count,
- order_last_30d_num,
- order_last_30d_amount,
- order_count,
- order_num,
- order_amount ,
- payment_last_30d_count,
- payment_last_30d_num,
- payment_last_30d_amount,
- payment_count,
- payment_num,
- payment_amount,
- refund_last_30d_count,
- refund_last_30d_num,
- refund_last_30d_amount ,
- refund_count ,
- refund_num ,
- refund_amount ,
- cart_last_30d_count ,
- cart_last_30d_num ,
- cart_count ,
- cart_num ,
- favor_last_30d_count ,
- favor_count ,
- appraise_last_30d_good_count ,
- appraise_last_30d_mid_count ,
- appraise_last_30d_bad_count ,
- appraise_last_30d_default_count ,
- appraise_good_count ,
- appraise_mid_count ,
- appraise_bad_count ,
- appraise_default_count
- from ${APP}.dwt_sku_topic
- )
-
-
- insert overwrite table ${APP}.dwt_sku_topic
- select
- nvl(sku_act.sku_id,sku_topic.sku_id) ,
- sku_info.spu_id,
- nvl (sku_act.order_count30,0) ,
- nvl (sku_act.order_num30,0) ,
- nvl (sku_act.order_amount30,0) ,
- nvl(sku_topic.order_count,0)+ nvl (sku_act.order_count,0) ,
- nvl(sku_topic.order_num,0)+ nvl (sku_act.order_num,0) ,
- nvl(sku_topic.order_amount,0)+ nvl (sku_act.order_amount,0),
- nvl (sku_act.payment_count30,0),
- nvl (sku_act.payment_num30,0),
- nvl (sku_act.payment_amount30,0),
- nvl(sku_topic.payment_count,0)+ nvl (sku_act.payment_count,0) ,
- nvl(sku_topic.payment_num,0)+ nvl (sku_act.payment_count,0) ,
- nvl(sku_topic.payment_amount,0)+ nvl (sku_act.payment_count,0) ,
- nvl (refund_count30,0),
- nvl (sku_act.refund_num30,0),
- nvl (sku_act.refund_amount30,0),
- nvl(sku_topic.refund_count,0)+ nvl (sku_act.refund_count,0),
- nvl(sku_topic.refund_num,0)+ nvl (sku_act.refund_num,0),
- nvl(sku_topic.refund_amount,0)+ nvl (sku_act.refund_amount,0),
- nvl(sku_act.cart_count30,0) ,
- nvl(sku_act.cart_num30,0) ,
- nvl(sku_topic.cart_count ,0)+ nvl (sku_act.cart_count,0),
- nvl( sku_topic.cart_num ,0)+ nvl (sku_act.cart_num,0),
- nvl(sku_act.favor_count30 ,0) ,
- nvl (sku_topic.favor_count ,0)+ nvl (sku_act.favor_count,0),
- nvl (sku_act.appraise_good_count30 ,0) ,
- nvl (sku_act.appraise_mid_count30 ,0) ,
- nvl (sku_act.appraise_bad_count30 ,0) ,
- nvl (sku_act.appraise_default_count30 ,0) ,
- nvl (sku_topic.appraise_good_count ,0)+ nvl
- (sku_act.appraise_good_count,0) ,
- nvl (sku_topic.appraise_mid_count ,0)+ nvl
- (sku_act.appraise_mid_count,0) ,
- nvl (sku_topic.appraise_bad_count ,0)+ nvl
- (sku_act.appraise_bad_count,0) ,
- nvl (sku_topic.appraise_default_count ,0)+ nvl
- (sku_act.appraise_default_count,0)
- from sku_act
- full outer join sku_topic
- on sku_act.sku_id =sku_topic.sku_id
- left join
- (select * from ${APP}.dwd_dim_sku_info where dt='$do_date') sku_info
- on nvl(sku_topic.sku_id,sku_act.sku_id)= sku_info.id;
- "
- $hive -e "$sql"
复制代码
2)增加脚本执行权限
复制代码
3)执行脚本导入数据
复制代码
4)查看导入数据
- select * from dwt_uv_topic limit 5;
- select * from dwt_user_topic limit 5;
- select * from dwt_sku_topic limit 5;
复制代码
获取更多资源:
领取100本书+1T资源
http://www.aboutyun.com/forum.php?mod=viewthread&tid=26480
大数据5个项目视频
http://www.aboutyun.com/forum.php?mod=viewthread&tid=25235
名企资源、名企面试题、最新BAT面试题、专题面试题等资源汇总
https://www.aboutyun.com/forum.php?mod=viewthread&tid=27732
名企资源、名企面试题、最新BAT面试题、专题面试题等资源汇总
https://www.aboutyun.com/forum.php?mod=viewthread&tid=27732
原文链接
https://blog.csdn.net/qq_43733123/article/details/105943032
|