看到「SQL Has Problems. We Can Fix Them: Pipe Syntax In SQL」這個研究投稿,PDF 檔案在 1004848.pdf 這邊。
Google 提出了 GoogleSQL 改善本來 SQL 的可讀性問題,另外也對 SQL optimizer 更有幫助。
直接拿 PDF 裡面的例子來說明,把本來是這樣的 SQL:
SELECT c_count, COUNT(*) AS custdist FROM ( SELECT c_custkey, COUNT(o_orderkey) c_count FROM customer LEFT OUTER JOIN orders ON c_custkey = o_custkey AND o_comment NOT LIKE '%unusual%packages%' GROUP BY c_custkey ) AS c_orders GROUP BY c_count ORDER BY custdist DESC, c_count DESC;
變成這樣的:
FROM customer |> LEFT OUTER JOIN orders ON c_custkey = o_custkey AND o_comment NOT LIKE '%unusual%packages%' |> AGGREGATE COUNT(o_orderkey) c_count GROUP BY c_custkey |> AGGREGATE COUNT(*) AS custdist GROUP BY c_count |> ORDER BY custdist DESC, c_count DESC;
可以看到這個語法除了變好讀以外,也指示了 SQL optimizer 怎麼去過濾與組合資料。
依照 Google 的說明,GoogleSQL 已經在許多 Google 的系統上實作了:
We’ve implemented pipe syntax in GoogleSQL, the SQL dialect and implementation shared across all1 SQL systems at Google including F1, BigQuery, Spanner and Procella, and the open-source release, ZetaSQL. GoogleSQL is a shared, reusable component, enabling many systems to share the same SQL dialect. This shared component allowed implementing pipe syntax in one place and then enabling it across many products.
裡面提到的 ZetaSQL 可以在 pipe-syntax.md 這邊看到。
我記得其他家也有類似的東西,PRQL 也是在解決類似的問題,不過語法走向不太一樣。
可以看看怎麼收斂...