The design landmine in Python's asyncio.create_task()

I saw "The Heisenbug lurking in your async code" on today's Hacker News Daily; the HN discussion is at "A Heisenbug lurking in async Python".

By design, the event loop holds only a weak reference to the object returned by asyncio.create_task(), not a regular reference, so the task really can be collected when Python's GC runs:

Important: Save a reference to the result of this function, to avoid a task disappearing mid-execution. The event loop only keeps weak references to tasks. A task that isn’t referenced elsewhere may get garbage collected at any time, even before it’s done. For reliable “fire-and-forget” background tasks, gather them in a collection:
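
The collection-based pattern that the quoted note points to can be sketched roughly as follows. This is a minimal sketch, not code from the original post: the names `background_tasks`, `work`, and `results` are illustrative, and the final `gather` is only there so the demo does not exit before the tasks finish.

```python
import asyncio

background_tasks = set()  # strong references that keep the tasks alive
results = []

async def work(n):
    # Hypothetical stand-in for a real background job.
    await asyncio.sleep(0.01)
    results.append(n)

async def main():
    for i in range(3):
        task = asyncio.create_task(work(i))
        # Hold a strong reference so the task cannot be collected mid-execution.
        background_tasks.add(task)
        # Drop the reference once the task is done so the set does not grow forever.
        task.add_done_callback(background_tasks.discard)
    # Demo only: wait for the tasks so the event loop does not shut down early.
    await asyncio.gather(*list(background_tasks))

asyncio.run(main())
```

A long-running service would normally skip the final `gather` and simply let the set hold the tasks until each one completes.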

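The preceding paragraph in the docs mentions that asyncio.TaskGroup.create_task() can be used instead, which is also the officially suggested approach, although it was only added in 3.11: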

Note: asyncio.TaskGroup.create_task() is a newer alternative that allows for convenient waiting for a group of related tasks.

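It's an easy thing to forget and then get bitten by. After all, threading is functionally similar and can be used with a fire-and-forget mindset, but this is not threading XD

Firefox downloaded from Mozilla's official site carries a tracking tag

The day before yesterday I saw the article "Each Firefox download has a unique identifier" and posted it to Hacker News while I was at it: "Each Firefox download has a unique identifier".

Put simply, Mozilla adds a download token to the Firefox binary, which can then be used to track users: "[meta] Support download token".

According to the article, every download of the binary gets a different token:

The "Attached file — Details" page answers more of the details, such as the tie-in with Google Analytics: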

5) List all proposed measurements and indicate the category of data collection for each measurement, using the Firefox data collection categories found on the Mozilla wiki.

Measurement Description: A download token that uniquely corresponds to a Google Analytics ID
Data Collection Category: Category 4 "Highly sensitive or clearly identifiable personal data"
Tracking Bug #: Bug 1677497

I couldn't reproduce it myself (I was always redirected to CloudFront), but Yuliya in the comments managed to reproduce it through Tor:

I have tried some TOR exit nodes:

Name: Firefox Setup 98.0.1_germany.exe
Size: 55528896 bytes (52 MiB)
SHA256: 2d8164d547d8a0b02f2677c05e21a027dc625c0c1375fd34667b7d039746d400
SHA1: 71302acbee6895b84cf0dfae99050926f2db59ef

Name: Firefox Setup 98.0.1_austria.exe
Size: 55528896 bytes (52 MiB)
SHA256: a139a45dd5737ab981068ca2596b7fdfde15e5d4bc8541e0a2f07a65defd3e4e
SHA1: 28630a0aababa162ca9e7cbca51e50b76b9c3cff

I have labeled the file for the corresponding country of the exit node.
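
A comparison like the one quoted above can be repeated locally by hashing each downloaded installer. The following is a minimal sketch using the standard hashlib module; the helper name `file_digests` and the stand-in files are hypothetical, and the demo writes two small files of equal size that differ only in a few embedded bytes rather than downloading real installers.

```python
import hashlib
import os
import tempfile

def file_digests(path, chunk_size=1 << 20):
    """Return (size in bytes, SHA-256 hex digest, SHA-1 hex digest) for a file."""
    sha256 = hashlib.sha256()
    sha1 = hashlib.sha1()
    size = 0
    with open(path, "rb") as f:
        # Read in chunks so a large installer does not have to fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            sha256.update(chunk)
            sha1.update(chunk)
            size += len(chunk)
    return size, sha256.hexdigest(), sha1.hexdigest()

# Stand-in files: same length, different embedded "token" bytes.
with tempfile.TemporaryDirectory() as d:
    path_a = os.path.join(d, "installer_a.bin")
    path_b = os.path.join(d, "installer_b.bin")
    with open(path_a, "wb") as f:
        f.write(b"\0" * 1024 + b"token-AAAA")
    with open(path_b, "wb") as f:
        f.write(b"\0" * 1024 + b"token-BBBB")
    digests_a = file_digests(path_a)
    digests_b = file_digests(path_b)

print("same size:", digests_a[0] == digests_b[0])
print("same SHA-256:", digests_a[1] == digests_b[1])
```

Identical sizes with different digests, as in the quoted listing, are what you would expect if only a short embedded value changes between downloads.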

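If you don't want to switch to a Chromium-based option, the alternative I've seen in the discussion so far is LibreWolf. I installed it yesterday and it seems fine, so it should be worth testing as well...

Facebook releases Hermes, a JS engine built for React Native

Facebook has released a JS engine optimized for React Native: "Hermes: An open source JavaScript engine optimized for mobile apps, starting with React Native".

Two of the more important points it mentions are No JIT and the Garbage collector strategy, both designed around the characteristics of mobile devices: avoiding the overhead a JIT produces, and reducing memory usage.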


However, it doesn't say how much CPU usage goes up; it only glosses over it:

Notably, our primary metrics are relatively insensitive to the engine’s CPU usage when executing JavaScript code.

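The amount may be acceptable for Facebook, but everyone else has nothing to go on... Anyone planning to adopt it should weigh that risk themselves XD

The upcoming Xdebug 2.6 can observe PHP's GC behavior

In "» Feature: Garbage Collection Statistics" I saw that Xdebug 2.6 will be able to collect data on PHP's GC (garbage collection) behavior: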

Xdebug's built-in garbage collection statistics profiler allows you to find out when the PHP internal garbage collector triggers, how many variables it was able to clean up, how long it took, and how how much memory was actually freed.

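That makes what you see when profiling more accurate...

GC improvements in Go 1.9

Update: after being prompted I took a closer look. It is enabled by default in 1.8 (with an option kept to switch back for debugging), and if there are no problems the old mechanism gets removed completely in 1.9: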

Assuming things go smoothly, we will remove stack re-scanning support when the tree opens for Go 1.9 development.

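I'll leave the title as it is... The original post follows.

I saw this in "Sub-millisecond GC pauses". Golang is working to reduce the impact caused by GC: "Proposal: Eliminate STW stack re-scanning".

The goal is to remove the largest source of GC pauses: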

As of Go 1.7, the one remaining source of unbounded and potentially non-trivial stop-the-world (STW) time is stack re-scanning.

They then tackle it with a new approach; preliminary testing so far suggests it can be brought down to 50µs (== 0.05ms):

We propose to eliminate the need for stack re-scanning by switching to a hybrid write barrier that combines a Yuasa-style deletion write barrier [Yuasa '90] and a Dijkstra-style insertion write barrier [Dijkstra '78]. Preliminary experiments show that this can reduce worst-case STW time to under 50µs, and this approach may make it practical to eliminate STW mark termination altogether.
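
The combination described in the quote can be modeled roughly as follows. This is a toy Python model, not the Go runtime's implementation: the `Obj` class, the color constants, and the `write_pointer` helper are hypothetical, and the condition on the stack color is my assumption about how the two barriers are combined.

```python
WHITE, GREY, BLACK = "white", "grey", "black"

class Obj:
    """A toy heap object with a tri-color mark and named pointer fields."""
    def __init__(self, name):
        self.name = name
        self.color = WHITE
        self.fields = {}

def shade(obj):
    """Turn a white object grey so the collector will scan it later."""
    if obj is not None and obj.color == WHITE:
        obj.color = GREY

def write_pointer(holder, field, new, stack_is_grey):
    # Yuasa-style deletion barrier: shade the pointer that is being overwritten.
    shade(holder.fields.get(field))
    # Dijkstra-style insertion barrier, applied only while the writer's
    # stack is still grey (not yet scanned). Assumed condition.
    if stack_is_grey:
        shade(new)
    holder.fields[field] = new

# Case 1: the writer's stack has not been scanned yet.
holder = Obj("holder")
old, new = Obj("old"), Obj("new")
holder.fields["x"] = old
write_pointer(holder, "x", new, stack_is_grey=True)

# Case 2: the writer's stack has already been scanned.
holder2 = Obj("holder2")
old2, new2 = Obj("old2"), Obj("new2")
holder2.fields["x"] = old2
write_pointer(holder2, "x", new2, stack_is_grey=False)

print(old.color, new.color, old2.color, new2.color)
```

In this model the overwritten object is always shaded, while the newly stored one is shaded only when the writing stack is still unscanned, which is the property that would let the collector avoid scanning stacks a second time.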

Progress can be followed in "runtime: eliminate stack rescanning · Issue #17503 · golang/go". It is already on the master branch and looks like it will ship with 1.9... though the worst-case time has been revised upward XDDD

The high level summary is that this reduces worst-case STW time to about 100 µs and typical 95%ile STW time to 50 µs (assuming, of course, that the OS doesn't get in the way and that the system isn't otherwise overloaded).

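But it still looks like a big performance improvement, especially for CPU-bound applications?


The New York Public Library has released 180,000 digitized items this time, including historical photographs, maps, and letters: "The New York Public Library Lets You Download 180,000 Images in High Resolution: Historic Photographs, Maps, Letters & More". The library's official announcement is at "Free for All: NYPL Enhances Public Domain Collections For Sharing and Reuse":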

The release of more than 180,000 digitized items represents both a simplification and an enhancement of digital access to a trove of unique and rare materials: a removal of administration fees and processes from public domain content, and also improvements to interfaces — popular and technical — to the digital assets themselves.

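Besides searching and downloading at "NYPL Digital Collections", there is also an API available: "The New York Public Library Digital Collections API", and tools are available on GitHub as well: "Digital Collections Public Domain Item Data and Tools".

And these 180,000 items are completely open, with no need to obtain permission from the library beforehand: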

No permission required, no hoops to jump through: just go forth and reuse!

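Digitizing public domain artifacts makes passing them on and preserving them more convenient... (and also makes it easier for researchers to get at the material)

Progress on Go 1.5

In the slides "The State of Go - Where we are in May 2015", Andrew Gerrand covers many of the changes and improvements in 1.5, which is scheduled for release in August of this year.

First, everything is now written in Go, and C is no longer needed: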

The gc tool chain has been converted from C to Go.

The biggest performance improvement is in the GC:


Go 1.5 provides support for Android and experimental support for iOS.

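This is getting rather interesting. Are they planning to replace Java on their own Android?

Ruby 2.2.0: a new release before the holidays

Ruby 2.2.0 was released before the holidays: "Ruby 2.2.0 Released".

The "Notable Changes since 2.1" section mentions a new GC algorithm (Incremental GC) and making Symbols garbage-collectable (Symbol GC), and notes that Rails 5.0 will benefit from both: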

Recent developments mentioned on the Rails blog suggest that Rails 5.0 will take advantage of Incremental GC as well as Symbol GC.

The explanation on the Rails side:

Rails 5.0 will target Ruby 2.2+ exclusively. There are a bunch of optimizations coming in Ruby 2.2 that are going to be very nice, but most importantly for Rails, symbols are going to be garbage collected.

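Visualizations of various Garbage Collection algorithms

The article "Visualizing Garbage Collection Algorithms" shows what a large number of garbage collection algorithms look like when visualized (as GIF animations), which is handy for reviewing GC algorithms...

For example, this is what it looks like with no GC (Cleanup At The End: aka No GC):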


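Datomic and RethinkDB...

Baron Schwartz (Chief Performance Architect at Percona) wrote "Immutability, MVCC, and garbage collection", which roasts Datomic and RethinkDB hard (oh, and briefly mentions CouchDB too).

It discusses the append-only B-tree data structure, its advantages, and the problems it runs into. (And those problems are all fatal...)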
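
To get a feel for why append-only trees accumulate garbage, here is a simplified stand-in, not code from the article: it uses a plain binary search tree instead of a B-tree and an in-memory list instead of a file, and all names (`log`, `append_node`, `insert`, `live_nodes`) are hypothetical. Every insert appends fresh copies of the nodes on the path from the root to the changed leaf, so the older copies become unreachable.

```python
log = []  # the append-only "file": nodes are only ever appended, never changed

def append_node(key, left, right):
    """Append a node and return its id (its position in the log)."""
    log.append((key, left, right))
    return len(log) - 1

def insert(root, key):
    """Return the id of a new root; existing nodes are never modified."""
    if root is None:
        return append_node(key, None, None)
    k, left, right = log[root]
    if key < k:
        return append_node(k, insert(left, key), right)
    if key > k:
        return append_node(k, left, insert(right, key))
    return root  # key already present, nothing to write

def live_nodes(root):
    """Count the nodes still reachable from the given root."""
    if root is None:
        return 0
    _, left, right = log[root]
    return 1 + live_nodes(left) + live_nodes(right)

root = None
for key in [50, 30, 70, 20, 40, 60, 80]:
    root = insert(root, key)

total = len(log)
live = live_nodes(root)
garbage = total - live
print("written:", total, "live:", live, "garbage:", garbage)
```

With these seven keys the log holds 17 nodes while only 7 are reachable from the latest root, so more than half of what was written is already garbage that some compaction pass would have to reclaim.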

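In the comments below, people from Datomic showed up to fight back, but I couldn't be bothered to read them... XD

RDBMSs have been developed by so many people for so long that a silver bullet is unlikely to suddenly appear and solve every problem... (would that mean everyone before was an idiot?)

When a new RDBMS comes out with a beautiful official site, built by a commercial company with funding behind it, and claims to solve a lot of problems, you can mostly skip it for now... XD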