问题:
求每个app下,一天内截止到当前时间点的累计访问量
分析:
sum(pv) over(distribute by apptypeid sort by dateline asc) 根据app分组,时间点正序,得到的就是截止到当前时间点的累加访问量,即累计访问量
案例:
spark-sql> with test1 as
> (select 100024 as apptypeid,'00:00' as dateline,10 as pv
> union all
> select 100024 as apptypeid,'02:00' as dateline,10 as pv
> union all
> select 100024 as apptypeid,'04:00' as dateline,10 as pv
> union all
> select 100024 as apptypeid,'06:00' as dateline,15 as pv
> union all
> select 100024 as apptypeid,'08:00' as dateline,12 as pv
> union all
> select 100024 as apptypeid,'10:00' as dateline,10 as pv
> union all
> select 100024 as apptypeid,'12:00' as dateline,15 as pv
> union all
> select 100024 as apptypeid,'14:00' as dateline,12 as pv
> union all
> select 100024 as apptypeid,'16:00' as dateline,10 as pv
> union all
> select 100024 as apptypeid,'18:00' as dateline,15 as pv
> union all
> select 100024 as apptypeid,'20:00' as dateline,12 as pv
> union all
> select 100024 as apptypeid,'22:00' as dateline,10 as pv
> union all
> select 100024 as apptypeid,'24:00' as dateline,15 as pv)
> select
> apptypeid,
> dateline,
> pv,
> sum(pv) over(distribute by apptypeid sort by dateline asc) as total_pv
> from test1;
apptypeid dateline pv total_pv
100024 00:00 10 10
100024 02:00 10 20
100024 04:00 10 30
100024 06:00 15 45
100024 08:00 12 57
100024 10:00 10 67
100024 12:00 15 82
100024 14:00 12 94
100024 16:00 10 104
100024 18:00 15 119
100024 20:00 12 131
100024 22:00 10 141
100024 24:00 15 156
Time taken: 22.881 seconds, Fetched 13 row(s)
总结: 分析函数sum() over()应用于数据累计的应用场景 例如:员工每个月的累计收入情况、公司每个月的累计营业额等等
|