select 1 as id, 'a1' as name, 10 as price, 1000 as ts, '2021-03-21' as dt
) as s0
on t0.id = s0.id
when not matched and s0.id % 2 = 1 then insert *
7.2 Select
Query the data in the Hudi table
select * from test_hudi_table
The query result is as follows; you can see that the Hudi table now contains one record.
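Based on the values inserted by the merge above, the data columns of that record should look roughly like this (an illustrative sketch; Hudi metadata columns such as _hoodie_commit_time are omitted):
id  name  price  ts    dt
1   a1    10     1000  2021-03-21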
7.3 Merge Into Update
Use the following SQL to update the data
merge into test_hudi_table as t0
using (
select 1 as id, 'a1' as name, 12 as price, 1001 as ts, '2021-03-21' as dt
) as s0
on t0.id = s0.id
when matched and s0.id % 2 = 1 then update set *
7.4 Select
Query the Hudi table
select * from test_hudi_table
The query result is as follows; you can see that the record in the Hudi table has been updated.
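Since the merge matched on id = 1, the record should now carry the new source values, roughly as follows (illustrative; metadata columns omitted):
id  name  price  ts    dt
1   a1    12     1001  2021-03-21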
7.6 Merge Into Delete
Use the following SQL to delete the data
merge into test_hudi_table t0
using (
select 1 as s_id, 'a2' as s_name, 15 as s_price, 1001 as s_ts, '2021-03-21' as dt
) s0
on t0.id = s0.s_id
when matched and s0.s_ts = 1001 then delete
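Re-running the earlier query verifies the delete:
select * from test_hudi_table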
The query result is as follows; you can see that the Hudi table no longer contains any data.
8. Drop Table
Use the following command to drop the Hudi table
drop table test_hudi_table;
Use show tables to check whether the table still exists
show tables;
You can see that the table no longer exists.
9. Summary
The examples above briefly demonstrate inserting, updating, and deleting Hudi table data with Spark SQL. Operating Hudi tables through SQL is very convenient and lowers the barrier to adopting Hudi. Going forward, the Hudi and Spark SQL integration will continue to round out its syntax, aiming to align with Snowflake and BigQuery, for example multi-table inserts (INSERT ALL WHEN condition1 INTO t1 WHEN condition2 INTO t2), schema changes, and Hudi table services such as CALL Cleaner and CALL Clustering.
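For reference, the Snowflake-style conditional multi-table insert mentioned above looks roughly like this (an illustrative sketch only; this syntax is not yet available in Hudi's Spark SQL integration, and the target tables t1 and t2 are hypothetical):
insert all
  when price > 10 then into t1
  when price <= 10 then into t2
select id, name, price, ts, dt from test_hudi_table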