Ingo Molnár 提出讓 Linux Kernel 編譯速度提昇的 mega patch

Hacker News 首頁上看到「Massive ~2.3k Patch Series Would Improve Linux Build Times 50~80% & Fix "Dependency Hell"」這個,對應到 mailing list 上的信件是「* [PATCH 0000/2297] [ANNOUNCE, RFC] "Fast Kernel Headers" Tree -v1: Eliminate the Linux kernel's "Dependency Hell"」這個,看到「0000/2297」這個 prefix XDDD

他主要是想要改善 Linux Kernel 的 compile 時間 (從 project 的名稱「Fast Kernel Headers」可以看到),只是沒想到會縮短這麼多。另外一方面也順便處理了 dependency hell 的問題 (改善維護性)。

測試出來的結果相當驚人,從 231.34 +- 0.60 secs (15.5 builds/hour) 到 129.97 +- 0.51 secs (27.7 builds/hour),以編譯次數來看的話是 78% 的改善。如果以 CPU time 來看的話,從 11,474,982.05 msec cpu-clock 降到 7,100,730.37 msec cpu-clock,也是以編譯次數來算的話,有 61.6% 的改善...

這是花了一年多的時間嘗試才達成的目標,嘗試不同的方法,前幾次雖然都有改善,但改善幅度太小,變動卻太大,他覺得不值得丟出來,直到第三次才達成這樣的目標...

第一次:

When I started this project, late 2020, I expected there to be maybe 50-100 patches. I did a few crude measurements that suggested that about 20% build speed improvement could be gained by reducing header dependencies, without having a substantial runtime effect on the kernel. Seemed substantial enough to justify 50-100 commits.

第二次:

But as the number of patches increased, I saw only limited performance increases. By mid-2021 I got to over 500 commits in this tree and had to throw away my second attempt (!), the first two approaches simply didn't scale, weren't maintainable and barely offered a 4% build speedup, not worth the churn of 500 patches and not worth even announcing.

第三次:

With the third attempt I introduced the per_task() machinery which brought the necessary flexibility to reduce dependencies drastically, and it was a type-clean approach that improved maintainability. But even at 1,000 commits I barely got to a 10% build speed improvement. Again this was not something I felt comfortable pushing upstream, or even announcing. :-/

然後基於第三次的成果覺得有望,意外的發現後續的速度改善比想像中的多非常多:

But the numbers were pretty clear: 20% performance gains were very much possible. So I kept developing this tree, and most of the speedups started arriving after over 1,500 commits, in the fall of 2021. I was very surprised when it went beyond 20% speedup and more, then arrived at the current 78% with my reference config. There's a clear super-linear improvement property of kernel build overhead, once the number of dependencies is reduced to the bare minimum.

這次的 patch 雖然超大包,但看起來對於 compile 時間改善非常多,應該會有不少討論... 消息還蠻新的 (台灣時間今天早上五點的信),晚點可以看一下其他大老出來回什麼...

Linux Kernel 裡的 RNG 從 SHA-1 換成 BLAKE2s

Hacker News Daily 上看到的消息,Linux Kernel 裡的 RNG,裡面用到的 SHA-1 演算法換成 BLAKE2s 了:

SHA-1 已知的問題是個隱患,不過換成 BLAKE2s 應該是 maintainer 的偏好,Jason Donenfeld 在 WireGuard 裡面也是用 BLAKE2s...

SCTP over UDP

Red Hat 的 blog 上看到 SCTP over UDP 的方式,讓不支援 SCTP 的 NAT 裝置也能使用 SCTP:「An easier way to go: SCTP over UDP in the Linux kernel」。

這個方式也是個標準,在 RFC 6951:「UDP Encapsulation of Stream Control Transmission Protocol (SCTP) Packets for End-Host to End-Host Communication」。

Linux kernel 則是在 5.11 之後才支援,不過翻了一下 5.11 是一般版,在 2021 年二月釋出,在五月底就已經 EoL:

Stream Control Transmission Protocol over User Datagram Protocol (SCTP over UDP, also known as UDP encapsulation of SCTP) is a feature defined in RFC6951 and implemented in the Linux kernel space since 5.11.0.

上一個 LTS 版本是 5.10,下一個不知道是什麼時候了...

Linux Kernel 與明尼蘇達大學之間的攻防

Linux Kernel Community 與明尼蘇達大學 (UMN) 之間的事件差不多告一段落了,整理一下裡面比較重要的事件。隔壁棚 Basecamp 的事情還在燒,讓子彈多飛一點時間,等該跑出來的內部資訊都跑出來以後再來整理...

Linux Kernel 這件事情各家媒體都有整理出來,這邊拉 ZDNet 的文章來看:

講一下我的感想,因為 UMN 可以從這次事件證明了 Linux Kernel Community 沒有足夠的能力抵禦這類惡意攻擊,而且 Linux Kernel Community 也沒有打算解決這件事情,如果要比喻的話,很像台灣常看到的「解決發現問題的人」。

只要流程沒有改善,幾乎可以預測出之後會有政府資助的方式塞 buggy patch 進去埋洞。

作為 Linux 作業系統使用者,看起來沒什麼可以改變的,只能從架構面上設計出來安全界線,讓被攻進來時有一些防線防止直接打穿到底...

kernel.org 推動 git push 的簽名

LWN 上看到 kernel.org 解釋並且推動 git push 的電子簽名:「Signed pushes for kernel.org」。

在「Signed git pushes」這篇裡面解釋了除了 git commit 本身的簽名外,另外 git push 的簽名也有重要性。

而在「Introducing the kernel.org git transparency log」這邊則是公開了對應的 git transparency log。

這個方法讓 kernel.org 上的 git repository 會更透明,加上 kernel.org 上的 git repository 會被很多人 mirror,在真的出問題的時候,大家手上也都有資料可以交叉比對...

Linus 狂幹 Intel 的 AVX-512

這幾天蠻熱鬧的消息,Linus 幹翻 Intel 丟出來的 AVX-512:「Alder Lake and AVX-512」。

在維基百科的「Advanced Vector Extensions」這邊有提到,因為 AVX-512 執行時會消耗產生更多的熱量,所以得壓低 Turbo Boost 執行:

Since AVX instructions are wider and generate more heat, Intel processors have provisions to reduce the Turbo Boost frequency limit when such instructions are being executed. The throttling is divided into three levels:

  • L0 (100%): The normal turbo boost limit.
  • L1 (~85%): The "AVX boost" limit. Soft-triggered by 256-bit "heavy" (floating-point unit: FP math and integer multiplication) instructions. Hard-triggered by "light" (all other) 512-bit instructions.
  • L2 (~60%): The "AVX-512 boost" limit. Soft-triggered by 512-bit heavy instructions.

本來 AVX 與 AVX-2 只會在某些重量級的指令時會壓 15%,現在在 AVX-512 則是變成常態,而且有些會降到 40%,對於同時在跑的應用會受到很大的影響,所以 Linus 也直接放話要用他的權限擋這件事情 (我把動詞讀錯了):

I want my power limits to be reached with regular integer code, not with some AVX512 power virus that takes away top frequency (because people ended up using it for memcpy!) and takes away cores (because those useless garbage units take up space).

在後面的討論串「Alder Lake and AVX-512」這邊 Linus 有提到更細,像是他對於 MMX/SSE/AVX/AVX2 的想法,以及為什麼他這麼厭惡 AVX-512。

AMD 的繼續看戲 XDDD

Linux Kernel 5.7 釋出...

在「The New Features Of The Linux 5.7 Kernel: Tiger Lake Graphics Stable, New exFAT, Zstd F2FS, Performance」這邊有列出重點來。

其中把過熱保護機制也一起考慮進來,這樣可以避免過熱被強制降速而反而變非常慢:

Thermal pressure tracking for systems that are thermally overloaded for better task placement on CPU cores running hot.

另外一個是把 exFAT 驅動換成由 Samsung 維護的版本,照其他文章的說明,這個版本比較穩定...

The new exFAT file-system driver that replaces the exFAT driver in the staging area that had been around for a few releases. This new exFAT driver is in much better shape and actively maintained by Samsung.

主要還是過熱保護那段還蠻值得期待,不然就是要硬上水冷壓,避免遇到溫度牆...

WireGuard 的 OpenBSD porting

在「WireGuard patchset for OpenBSD」這邊看到有人試著把 WireGuard 放入 OpenBSD 的消息。

整包 patchset 包括了 kernel 與 userland 的實做,可以在 mailing list 上「WireGuard patchset for OpenBSD」這邊可以看到,整串討論可以在「'WireGuard patchset for OpenBSD' thread - MARC」這邊看到,目前看起來還在 code review 的階段,有看到討論提到應該用 OpenBSD 內已經實做的 Chacha20-Poly1305,所以可能還會需要一些時間...

看起來慢慢的在滲進每個作業系統中,蠻有希望在幾年後成為業界標準...

WireGuard 1.0.0 的釋出

在「[ANNOUNCE] WireGuard 1.0.0 for Linux 5.6 Released」這邊看到的消息,看起來 WireGuard 1.0.0 會回過頭來 backport 到幾個重要的版本:

We'll also continue to maintain our wireguard-linux-compat [2] backports repo for older kernels. On the backports front, WireGuard was backported to Ubuntu 20.04 (via wireguard-linux-compat) [4] and Debian Buster (via a real backport to 5.5.y) [5]. I'm also maintaining real backports, not via the compat layer, to 5.4.y [6] and 5.5.y [7], and we'll see where those wind up; 5.4.y is an LTS release.

包括 DebianUbuntu 的新版,以及 5.4.x 的 LTS 版本,讓使用起來更方便一些...

TCP Congestion Control Algorithm 的選擇

先前 Ubuntu 桌機用 BBR 跑了一陣子,但有遇到一些問題 (可以參考「Dropbox 測試 BBRv2 的結果」這篇),所以暫時換成 Westwood,但還是陸陸續續會看一下各種研究。

剛剛在「[tor-relays] TCP CCA for Tor Relays (and especially Bridges)」這邊看到一個經驗談:

Here are my completely unscientific scribbles of how all the various algorithms behaved. The scenario is uploading for a minute or so, observing the speed in MB/sec visually, then recording how it appeared to change during that minute (and then repeating this a couple of times to be certain).

tcp_bic.ko       -- 6...5...4
tcp_highspeed.ko -- 2
tcp_htcp.ko      -- 1.5...3...2
tcp_hybla.ko     -- 3...2...1
tcp_illinois.ko  -- 6...7...10
tcp_lp.ko        -- 2...1
tcp_scalable.ko  -- 5...4...3
tcp_vegas.ko     -- 2.5
tcp_veno.ko      -- 2.5
tcp_westwood.ko  -- <1
tcp_yeah.ko      -- 2...5...6

上面是「目視法」觀察到的速度 (MB/sec),看了一下維基百科上 TCP-Illinois 的說明,看起來設計的目的是提供給頻寬大、latency 高的情境下:

It is especially targeted at high-speed, long-distance networks.

來跑跑看好了...