byte – Gea-Suan Lin's BLOG

Pony ORM

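Something I saw on Simon Willison's blog: "Python’s “Disappointing” Superpowers", which points to the original article of the same title, about Python tooling.

Although the title says "disappointing", it is meant ironically. The article covers a number of unusual tools, and Pony ORM is the one I found most interesting, because its query syntax is very Pythonic: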

select(c for c in Customer if sum(c.orders.total_price) > 1000)

It can also be written as a lambda:

Customer.select(lambda c: sum(c.orders.total_price) > 1000)

This generates the corresponding SQL:

SELECT "c"."id"
FROM "customer" "c"
  LEFT JOIN "order" "order-1"
    ON "c"."id" = "order-1"."customer"
GROUP BY "c"."id"
HAVING coalesce(SUM("order-1"."total_price"), 0) > 1000

The reason this works without raising an error is that Pony inspects and decompiles the bytecode directly, then generates the corresponding SQL query:

A normal understanding of generator expressions suggests that the select function is consuming a generator. But that couldn’t explain the behaviour here. Instead, it actually introspects the frame object of the calling code, then decompiles the byte code of the generator expression object it finds, and builds a Query based on the AST objects.
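The mechanism described in the quote can be illustrated with the standard library alone. The following is a minimal sketch and not Pony's actual implementation: it only shows that a generator expression's compiled code is reachable through its `gi_code` attribute and can be inspected with `dis` without ever running the generator. The `Customer` tuple and the `describe_generator` helper are hypothetical names made up for this example.

```python
import dis
from collections import namedtuple

# Hypothetical stand-in for an ORM entity; not part of Pony.
Customer = namedtuple("Customer", ["name", "total_price"])


def describe_generator(gen):
    """Inspect a generator expression's code object without iterating it."""
    code = gen.gi_code  # compiled body of the generator expression
    return {
        "names": code.co_names,    # attribute/global names it references
        "consts": code.co_consts,  # literal constants it uses
        "opnames": [ins.opname for ins in dis.get_instructions(code)],
    }


customers = [Customer("alice", 500), Customer("bob", 1500)]
gen = (c for c in customers if c.total_price > 1000)

info = describe_generator(gen)
print(info["names"])
print(info["consts"])

# The inspection did not consume the generator; it can still be iterated.
remaining = [c.name for c in gen]
print(remaining)
```

A real decompiler would walk the instruction list to rebuild the expression tree; here the names and constants are enough to see that the filter condition is recoverable from the bytecode.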

This design is what gives it such freedom in syntax.

Looking around, there are also some integrations, such as "Integration with flask" for Flask and "Integration with FastAPI" for FastAPI.

For now I will probably just keep an eye on it, though: my main Python framework is still Django, which has its own ORM...

CloudFront supports TLS 1.3

I saw the AWS announcement that CloudFront now supports TLS 1.3: "Amazon CloudFront announces support for TLSv1.3 for viewer connections".

It is enabled automatically by default:

TLSv1.3 is available today and enabled by default across all Amazon CloudFront security policies options. No additional changes are required to your CloudFront configuration to benefit from the security and performance improvements of TLSv1.3 for your viewer connections.

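For users, the biggest difference is probably the improved time to first byte (mainly because the handshake is shorter). AWS also mentions the improvement it saw in internal tests in the US region: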

In our own internal tests in the US region as an example, first byte latency for new negotiated connections saw reductions of up to 33% for TLSv1.3 compared to previous versions of TLS.

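Regions with higher latency should see a large improvement as well, since a full TLS 1.3 handshake needs one round trip fewer than earlier versions and each round trip costs more there...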
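To confirm that a given distribution actually negotiates TLS 1.3 with your client, something like the following could be used. This is a rough sketch using Python's standard `ssl` module; the helper names are my own, and it assumes a Python build linked against an OpenSSL version with TLS 1.3 support.

```python
import socket
import ssl


def make_tls13_context():
    """Build a client context that refuses anything older than TLS 1.3."""
    ctx = ssl.create_default_context()
    ctx.minimum_version = ssl.TLSVersion.TLSv1_3
    return ctx


def negotiated_tls_version(host, port=443, timeout=5.0):
    """Connect to host and return the negotiated protocol, e.g. 'TLSv1.3'."""
    ctx = make_tls13_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            return tls.version()
```

Calling `negotiated_tls_version("dxxxxxxxx.cloudfront.net")` (a placeholder hostname, substitute your own distribution) should return the protocol string, or raise `ssl.SSLError` if the server cannot speak TLS 1.3.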

JSON Canonicalization

This article is nominally about signing JSON objects, but it is really a discussion of the background and consequences of JSON canonicalization: "How (not) to sign a JSON object".

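When working with JSON data, "deciding whether two JSON objects are the same" is not a simple problem. One idea is to find a mechanism that converts all semantically identical JSON objects into the same (byte) string representation, which is what JSON canonicalization is. Once you can guarantee that semantically identical objects canonicalize to the same string, you can sign the string itself.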

XML actually went through the same story (yeah, there is always someone who wants to layer signatures on top of a data format), in the form of "XML Signature".

Unfortunately, on the XML side quite a few standards adopted XML Signature. For example, back when I implemented SSO for Google Apps (now called G Suite), I had to integrate with SAML...

Back to JSON canonicalization: the variations that come to mind right away are whitespace and the order of keys within an object, as in these two:

{"a":1,"b":2}

{
  "b": 2,
  "a": 1
}

Unfortunately, Unicode joins the mess too: the following, written with escape sequences, means the same thing as the ones above:

{
  "\u0062": 2,
  "\u0061": 1
}

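There are also other landmines you would not normally think of. If the complexity makes you decide to use a library, that library has to handle all of these cases too, and it may well have bugs...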
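The three variants above can be checked quickly with Python's `json` module. This is only a naive sketch (parse, then re-serialize with sorted keys and no whitespace), not a real canonicalization scheme; `naive_canonical` is a name I made up for illustration.

```python
import json

compact = '{"a":1,"b":2}'
reordered = '{\n  "b": 2,\n  "a": 1\n}'
escaped = '{\n  "\\u0062": 2,\n  "\\u0061": 1\n}'


def naive_canonical(text):
    """Parse JSON, then re-serialize with sorted keys and no whitespace."""
    obj = json.loads(text)
    return json.dumps(
        obj, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")


# All three inputs collapse to a single byte string.
forms = {naive_canonical(t) for t in (compact, reordered, escaped)}
print(forms)
```

It handles these three cases, but it already leaves questions open: `{"a":1}` and `{"a":1.0}` come out as different bytes, so a real scheme has to decide how numbers are normalized, on top of duplicate keys and Unicode normalization.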

That is why the author ends the article by asking everyone to stop making this mess XDDD

Maybe you don’t need request signing? A bearer token header is fine, or HMAC(k, timestamp) if you’re feeling fancy, or mTLS if you really care.

Canonicalization is fiendishly difficult.

Add a signature on the outside of the request body, make sure the request body is complete, and don’t worry about “signing what is said versus what is meant” – it’s OK to sign the exact byte sequence.
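The "sign the exact byte sequence" advice could look roughly like this. It is a minimal sketch that assumes HMAC-SHA256 with a shared secret as the signature; the key and the function names are hypothetical, and how the signature travels alongside the request (for example in a header) is left out.

```python
import hashlib
import hmac


def sign_body(key, body):
    """Return a hex HMAC-SHA256 over the raw request body bytes."""
    return hmac.new(key, body, hashlib.sha256).hexdigest()


def verify_body(key, body, signature):
    """Recompute the HMAC over the received bytes and compare safely."""
    return hmac.compare_digest(sign_body(key, body), signature)


key = b"example-shared-secret"  # hypothetical key, for illustration only
body = b'{"b": 2, "a": 1}'      # bytes exactly as sent on the wire
signature = sign_body(key, body)

print(verify_body(key, body, signature))              # same bytes
print(verify_body(key, b'{"a":1,"b":2}', signature))  # re-serialized bytes
```

The receiver verifies the bytes it received before parsing them, so no canonicalization is needed; a body that was re-serialized along the way simply fails verification.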

Optimizing nginx's TLS Time to First Byte (TTTFB)

The article "Optimizing NGINX TLS Time To First Byte (TTTFB)" discusses how to make nginx's TLS Time to First Byte (TTTFB) as short as possible.

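The article uses two methods. The first is to patch the nginx source code to reduce the TLS record size. I find this rather risky, especially since it is unclear what side effects the author's change might have... (Note that nginx passes NGX_SSL_BUFSIZE directly to BIO_set_write_buffer_size, which suggests there may be other places that do the same?)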

The second method is to enable TLS False Start, which the mainstream browsers have gradually come to support.

The author's test results in the original article show that the time is reduced considerably.

For now, I am hoping that the tests in this article get the patch merged upstream before I use it, so that more people will have audited it...