[mw_shl_code=sql,true]SELECT
  t1.id, 1 + 2 + t1.value AS v
FROM t1, t2
WHERE
  t1.id = t2.id AND
  t2.id < 1000[/mw_shl_code]
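First, the SQL is translated into a logical execution plan. The SELECT list maps to a Project operator, the join between t1 and t2 maps to a Join, and the WHERE clause maps to a Filter. The logical plan is shown in the figure below: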
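To make the mapping concrete, here is a minimal Python sketch that builds the same tree by hand and prints it top-down. The node classes (`Scan`, `Join`, `Filter`, `Project`) and the `explain` helper are hypothetical stand-ins that only model the shape of the plan. They are not Flink or Calcite classes. Note that Project sits at the root because SELECT is applied last.

```python
from dataclasses import dataclass
from typing import Tuple

# Hypothetical, simplified logical plan nodes (not Flink/Calcite classes).
@dataclass(frozen=True)
class Scan:
    table: str

@dataclass(frozen=True)
class Join:
    left: object
    right: object

@dataclass(frozen=True)
class Filter:
    condition: str
    child: object

@dataclass(frozen=True)
class Project:
    exprs: Tuple[str, ...]
    child: object

def explain(node, depth=0):
    """Render the plan top-down, one operator per line, indented by depth."""
    pad = "  " * depth
    if isinstance(node, Scan):
        return [f"{pad}Scan({node.table})"]
    if isinstance(node, Join):
        return [f"{pad}Join"] + explain(node.left, depth + 1) + explain(node.right, depth + 1)
    if isinstance(node, Filter):
        return [f"{pad}Filter({node.condition})"] + explain(node.child, depth + 1)
    if isinstance(node, Project):
        return [f"{pad}Project({', '.join(node.exprs)})"] + explain(node.child, depth + 1)
    raise TypeError(f"unknown node: {node!r}")

# FROM t1, t2 -> Join; WHERE -> Filter; SELECT -> Project (outermost).
plan = Project(
    ("t1.id", "1+2+t1.value AS v"),
    Filter("t1.id = t2.id AND t2.id < 1000",
           Join(Scan("t1"), Scan("t2"))),
)
print("\n".join(explain(plan)))
```

The optimizer then transforms this tree. For example, it can fold the constant `1+2` and push parts of the filter down toward the scans.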
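For example, the query below first performs a word count, producing an intermediate table with rows Row(word, cnt). It then counts how many times each frequency cnt occurs.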
[mw_shl_code=sql,true]
SELECT cnt, COUNT(cnt) AS freq
FROM (
  SELECT word, COUNT(*) AS cnt
  FROM words
  GROUP BY word
)
GROUP BY cnt[/mw_shl_code]
Suppose the source receives the words hello, word, hello, in that order.
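This input matters because the third word changes the inner count for hello from 1 to 2. As a result, the outer aggregation must withdraw the earlier (hello, 1) row before it adds (hello, 2). Flink calls this retraction. The sketch below is a single-process Python simulation, not Flink code. The hypothetical `run` function compares the retracting update model against a naive one that only sends new rows downstream.

```python
from collections import Counter

def run(words, retract=True):
    """Simulate the two-level GROUP BY over a stream of words.

    Inner level: word -> cnt. Outer level: cnt -> freq (number of words
    having that cnt). With retract=True, each inner update sends
    (-old row, +new row) downstream; with retract=False only +new is sent.
    """
    word_cnt = Counter()  # state of the inner aggregation
    freq = Counter()      # state of the outer aggregation
    for w in words:
        old = word_cnt[w]
        word_cnt[w] += 1
        new = word_cnt[w]
        if retract and old > 0:
            freq[old] -= 1        # retraction: withdraw (w, old)
            if freq[old] == 0:
                del freq[old]
        freq[new] += 1            # accumulate: add (w, new)
    return dict(freq)

words = ["hello", "word", "hello"]
print(run(words, retract=True))   # {1: 1, 2: 1}
print(run(words, retract=False))  # {1: 2, 2: 1}
```

With retraction, the result correctly says that one word occurred once (word) and one word occurred twice (hello). Without retraction, the stale (hello, 1) row is still counted, so cnt = 1 appears twice.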
6.4 Top-N optimization
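In real-time computation, globally sorting the data is very expensive. Computing a Top-N is comparatively easy: for append-only input, only the current N best rows of each group need to be kept, and any row that ranks below them can be discarded.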
The following Flink SQL computes the top N rows (here N = 3) for each category:
[mw_shl_code=sql,true]SELECT *
FROM (
  SELECT *, -- you can also get shopId or other information from this
    ROW_NUMBER() OVER (PARTITION BY category ORDER BY sales DESC) AS rowNum
  FROM shop_sales
)
WHERE rowNum <= 3[/mw_shl_code]
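The reason Top-N is cheap can be sketched with a bounded min-heap per category. Each heap holds at most N rows, so the state does not grow with the input. The `StreamingTopN` class below is a hypothetical illustration, not Flink's Rank operator. It assumes append-only input and does not model the update messages a real streaming operator would emit when the top N changes.

```python
import heapq
from collections import defaultdict

class StreamingTopN:
    """Keep only the N highest-sales rows per category (append-only input)."""

    def __init__(self, n):
        self.n = n
        # category -> min-heap of (sales, shop_id); the root is the smallest
        # row in the current top N, i.e. the first candidate for eviction.
        self.heaps = defaultdict(list)

    def add(self, category, shop_id, sales):
        heap = self.heaps[category]
        if len(heap) < self.n:
            heapq.heappush(heap, (sales, shop_id))
        elif sales > heap[0][0]:
            heapq.heapreplace(heap, (sales, shop_id))  # evict current minimum
        # otherwise the row can never enter this category's top N: drop it

    def top(self, category):
        """Rows of one category as (rowNum, shop_id, sales), sales DESC."""
        ranked = sorted(self.heaps[category], reverse=True)
        return [(row_num, shop_id, sales)
                for row_num, (sales, shop_id) in enumerate(ranked, start=1)]

topn = StreamingTopN(3)
for cat, shop, sales in [("book", "s1", 10), ("book", "s2", 50), ("book", "s3", 30),
                         ("book", "s4", 40), ("food", "s5", 5), ("book", "s6", 20)]:
    topn.add(cat, shop, sales)
print(topn.top("book"))  # [(1, 's2', 50), (2, 's4', 40), (3, 's3', 30)]
```

Each insert costs O(log N) and each category keeps at most N rows. A global sort, by contrast, must retain every row.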
Under the hood, the planner rewrites the underlying execution plan, replacing the OverAggregate operator with a Rank operator.
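The rewrite is a pattern match on the plan tree. The following Python sketch uses hypothetical node classes, not Flink's planner classes, and only checks the node it is given. It shows a filter `rowNum <= N` sitting on top of a ROW_NUMBER OverAggregate collapsing into a single Rank node that carries N. Flink's actual rule operates on its own relational nodes and handles more cases.

```python
from dataclasses import dataclass

# Hypothetical plan nodes for illustration only (not Flink planner classes).
@dataclass(frozen=True)
class Scan:
    table: str

@dataclass(frozen=True)
class OverAggregate:   # ROW_NUMBER() OVER (PARTITION BY ... ORDER BY ...)
    partition_by: str
    order_by: str
    child: object

@dataclass(frozen=True)
class Filter:          # models only the predicate "rowNum <= max_row_num"
    max_row_num: int
    child: object

@dataclass(frozen=True)
class Rank:            # fused operator that keeps the top N per partition
    partition_by: str
    order_by: str
    n: int
    child: object

def rewrite_top_n(node):
    """Replace Filter(rowNum <= N) over OverAggregate with one Rank node."""
    if isinstance(node, Filter) and isinstance(node.child, OverAggregate):
        over = node.child
        return Rank(over.partition_by, over.order_by, node.max_row_num, over.child)
    return node  # no match: leave the node unchanged

plan = Filter(3, OverAggregate("category", "sales DESC", Scan("shop_sales")))
print(rewrite_top_n(plan))
# Rank(partition_by='category', order_by='sales DESC', n=3, child=Scan(table='shop_sales'))
```

Because Rank knows N, it can use bounded per-partition state instead of numbering every row and filtering afterwards.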