Java 21 的 ZGC 在 Netflix 的效果

Hacker News 上看到連結「Bending pause times to your will with Generational ZGC (netflixtechblog.com)」,發現這篇還沒整理:「Bending pause times to your will with Generational ZGC」,裡面講的東西都有圖有數字 (i.e. Y 軸),作者是 Danny Thomas

在這之前他們就已經知道 GC pause 是延遲的重要來源,會導致 timeout & retry:

In both our GRPC and DGS Framework services, GC pauses are a significant source of tail latencies.

That’s particularly true of our GRPC clients and servers, where request cancellations due to timeouts interact with reliability features such as retries, hedging and fallbacks.

第一張圖拉出來的資料是 error rate,白色是上個禮拜的資料,紫色是這個禮拜的資料,而從 G1GC 切到 ZGC 是在 2023/11/16 發生的:

可以看到很明顯的 error rate 改變:尖峰從 2k 下降到大約 0.3k,大約是原來的 1/6 到 1/7 的下降。

第二張圖是 GC 的時間:

可以看到 G1GC 還是偶而會撞到 2 秒,發生時平均值也都還是會 >100ms,切到 ZGC 後直接降到個位數 ms 等級了。

第三張圖是 memory overhead 的部分:

從圖上可以看到上週與本週的對比,導入 ZGC 後記憶體的使用量下降了,不過文裡面倒是沒解釋這點,反而提到 ZGC 比起 G1GC 有個固定的 3% overhead:

ZGC has a fixed overhead 3% of the heap size, requiring more native memory than G1. Except in a couple of cases, there’s been no need to lower the maximum heap size to allow for more headroom, and those were services with greater than average native memory needs.

第四張則是 Huge Pages 的差異,這邊要注意這張圖的 Y 軸不是從 0 計算:

可以看到在開 Huge Pages 後,在 RPS (request per second) 不變的情況下 CPU 使用率是有下降的,大約從 50% 降到 45% 左右,不過這張圖的時間跨度有點少,應該是要拉長一點的圖... 不過既然被提出來了,就假設 Netflix 內看起來應該是有這個趨勢,只是抓圖的時候懶了點?

整體算是大成功?

Python 的 asyncio.create_task() 的設計地雷

今天的 Hacker News Daily 上看到「The Heisenbug lurking in your async code」這篇,HN 的討論則是在「A Heisenbug lurking in async Python (textualize.io)」這。

設計上面 asyncio.create_task() 傳回的物件只有被 weak reference 到,而不是一般的 reference,所以會導致 Python 在 GC 時就真的被收走了:

Important: Save a reference to the result of this function, to avoid a task disappearing mid-execution. The event loop only keeps weak references to tasks. A task that isn’t referenced elsewhere may get garbage collected at any time, even before it’s done. For reliable “fire-and-forget” background tasks, gather them in a collection:

在前一段有提到可以用 asyncio.TaskGroup.create_task() 來做,這也是官方建議的解法,不過這個是 3.11 才新增的功能:

Note: asyncio.TaskGroup.create_task() is a newer alternative that allows for convenient waiting for a group of related tasks.

是個容易忘記然後中雷的東西,畢竟有個功能性接近的 threading,是可以抱持著 fire-and-forget 的心態在用,但這邊不是 threading XD

從 Mozilla 官網下載的 Firefox 帶有追蹤用的標籤

前天看到「Each Firefox download has a unique identifier」這篇報導,就順手貼到 Hacker News 上面了:「Each Firefox download has a unique identifier (ghacks.net)」。

簡單的說就是 Mozilla 在 Firefox 的 binary 裡面加上 download token,後續就可以追蹤使用者:「[meta] Support download token」。

依照報導所提到的,每次下載 binary 都會有不同的 token:

在「Attached file dltoken_data_review.md — Details」裡面有回答更多細節,像是跟 Google Analytics 綁定:

5) List all proposed measurements and indicate the category of data collection for each measurement, using the [Firefox data collection categories](https://wiki.mozilla.org/Firefox/Data_Collection) found on the Mozilla wiki.   

<table>
  <tr>
    <td>Measurement Description</td>
    <td>Data Collection Category</td>
    <td>Tracking Bug #</td>
  </tr>
  <tr>
    <td>A download token that uniquely corresponds to a Google Analytics ID</td>
    <td>Category 4 "Highly sensitive or clearly identifiable personal data"</td>
    <td>Bug 1677497</td>
  </tr>
</table>

我自己重製不出來 (都是被導去 CloudFront),但留言區裡面的 Yuliya 透過 Tor 有重製出來:

I have tried some TOR exit nodes:

Name: Firefox Setup 98.0.1_germany.exe
Size: 55528896 bytes (52 MiB)
SHA256: 2d8164d547d8a0b02f2677c05e21a027dc625c0c1375fd34667b7d039746d400
SHA1: 71302acbee6895b84cf0dfae99050926f2db59ef

Name: Firefox Setup 98.0.1_austria.exe
Size: 55528896 bytes (52 MiB)
SHA256: a139a45dd5737ab981068ca2596b7fdfde15e5d4bc8541e0a2f07a65defd3e4e
SHA1: 28630a0aababa162ca9e7cbca51e50b76b9c3cff

I have labeled the file for the corresponding country of the exit node.

如果不願意換到 Chromium-based 的方案,目前在討論裡看到的替代方案是 LibreWolf,昨天裝起來後發現還行,應該也可以測試看看...

Facebook 推出了 Hermes,為了 React Native 而生的 JS Engine

Facebook 提供了一個對 React Native 最佳化的 JS engine:「Hermes: An open source JavaScript engine optimized for mobile apps, starting with React Native」。

裡面有提到兩個比較重要的的部份是 No JIT 與 Garbage collector strategy,針對行動裝置的特性而設計:避免 JIT 產生的 overhead,以及降低記憶體使用量。

官方給的改善主要也都是偏這兩塊:

不過沒有提到 CPU usage 會上升多少,只是帶過去:

Notably, our primary metrics are relatively insensitive to the engine’s CPU usage when executing JavaScript code.

對於 Facebook 也許是可以接受的數量,但對於其他人就沒概念了... 要入坑的人自己衡量這部份的風險 XD

即將出版的 Xdebug 2.6 能觀察 PHP 的 GC 情況了

在「» Feature: Garbage Collection Statistics」這邊看到 Xdebug 2.6 將能夠收集 PHP 的 GC (garbage collection) 行為了:

Xdebug's built-in garbage collection statistics profiler allows you to find out when the PHP internal garbage collector triggers, how many variables it was able to clean up, how long it took, and how how much memory was actually freed.

這樣 profiling 看的東西就更準確了...

Go 1.9 的 GC 改善

Update:被提醒後仔細看了一下,是 1.8 預設生效 (但保留選項切回來 debug),如果沒問題的話 1.9 把舊的方式拔乾淨:

Assuming things go smoothly, we will remove stack re-scanning support when the tree opens for Go 1.9 development.

標題就不改了... 以下原文。

在「Sub-millisecond GC pauses」這邊看到的。Golang 想辦法將 GC 造成的影響降低:「Proposal: Eliminate STW stack re-scanning」。

目標是解決最大的 GC pause 來源:

As of Go 1.7, the one remaining source of unbounded and potentially non-trivial stop-the-world (STW) time is stack re-scanning.

然後拿新的解法來戰,目前初步的測試看起來可以降到 50µs (== 0.05ms):

We propose to eliminate the need for stack re-scanning by switching to a hybrid write barrier that combines a Yuasa-style deletion write barrier [Yuasa '90] and a Dijkstra-style insertion write barrier [Dijkstra '78]. Preliminary experiments show that this can reduce worst-case STW time to under 50µs, and this approach may make it practical to eliminate STW mark termination altogether.

在「runtime: eliminate stack rescanning · Issue #17503 · golang/go」這邊可以看到進度,現在已經在 master branch 上了,看起來會在 1.9 的時候被放出來... 不過 worst case 的時間上修了 XDDD

The high level summary is that this reduces worst-case STW time to about 100 µs and typical 95%ile STW time to 50 µs (assuming, of course, that the OS doesn't get in the way and that the system isn't otherwise overloaded).

但看起來應該還是很大的效能改善,尤其是 CPU bound 的應用?

紐約公共圖書館放出十八萬張數位高畫質的數位資料

紐約公共圖書館這次放出了十八萬張數位資料,包括歷史照片、地圖以及信件:「The New York Public Library Lets You Download 180,000 Images in High Resolution: Historic Photographs, Maps, Letters & More」,圖書館官方的公告在「Free for All: NYPL Enhances Public Domain Collections For Sharing and Reuse」這邊:

The release of more than 180,000 digitized items represents both a simplification and an enhancement of digital access to a trove of unique and rare materials: a removal of administration fees and processes from public domain content, and also improvements to interfaces — popular and technical — to the digital assets themselves.

除了可以在「NYPL Digital Collections」這邊搜尋下載外,還有 API 可以用:「The New York Public Library Digital Collections API」,在 GitHub 上也有工具可以使用:「Digital Collections Public Domain Item Data and Tools」。

而且這 18 萬張資料是完全的開放,不需要事先取得館方授權:

No permission required, no hoops to jump through: just go forth and reuse!

將 public domain 的文物數位化,傳遞與保存變的更便利... (也讓做研究的人更容易取得資料)

Go 1.5 的進展

Andrew Gerrand 在「The State of Go - Where we are in May 2015」這份投影片裡面提到了不少 1.5 的改變與改善,預定在今年八月釋出。

首先是全部都改用 Go 寫,不再需要 C 語言的協助了:

The gc tool chain has been converted from C to Go.

而效能上的改善最大的是 GC 的部份:

另外是對行動平台的發展:

Go 1.5 provides support for Android and experimental support for iOS.

這樣變得頗有趣的,自家的 Android 有打算換掉 Java 嗎?

Ruby 2.2.0:放假前的新版本

Ruby 2.2.0 在假期前釋出:「Ruby 2.2.0 Released」。

在「Notable Changes since 2.1」的部份有提到新的 GC algorithm (Incremental GC) 以及讓 Symbol 可以被 GC (Symbol GC),而且提到了 Rails 5.0 會受益於此:

Recent developments mentioned on the Rails blog suggest that Rails 5.0 will take advantage of Incremental GC as well as Symbol GC.

在 Rails 這邊的說明:

Rails 5.0 will target Ruby 2.2+ exclusively. There are a bunch of optimizations coming in Ruby 2.2 that are going to be very nice, but most importantly for Rails, symbols are going to be garbage collected.

各種 Garbage Collection 的視覺化圖片

在「Visualizing Garbage Collection Algorithms」這篇文章裡介紹了大量的 Garbage Collection 視覺化後的樣子 (GIF 動畫),拿來複習 GC algorithm...

像是這是沒有 GC 的情況 (Cleanup At The End: aka No GC):

很有趣的動畫效果...