FreeBSD 14.0 Released

The announcement for FreeBSD 14.0-RELEASE is out: 「FreeBSD 14.0-RELEASE Announcement」, and the more complete release notes are in 「FreeBSD 14.0-RELEASE Release Notes」.

Starting with the officially listed highlights, the first big item is that the GENERIC kernel now supports 1024 cores:

FreeBSD supports up to 1024 cores on the amd64 and arm64 platforms.

A look at the commit log shows the limit went from 256 to 1024.

On the x86-64 side, the largest "home" part at the moment is probably the AMD Threadripper PRO 7995WX (96 cores), so the old limit of 256 would still hold up, but the commit log mentions that the change is mainly in anticipation of even more brutal machines showing up over the next few years.

The other side is servers: Intel has 8-socket configurations (see 「Intel Xeon Sapphire Rapids to Scale to 4 and 8 Sockets」), and filling all eight sockets with 8490H parts already gives 480 cores.

ARM can apparently stack cores as well, but I'm not familiar with that side...

Another highlight mentioned is that the default congestion control for TCP has been changed to CUBIC:

The default congestion control mechanism for TCP is now CUBIC.

Digging through the commit log shows the switch was from NewReno to CUBIC, which makes it the same as the Linux kernel's default.
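For reference, checking and switching the algorithm on FreeBSD is just a couple of sysctls (names as I remember them; double-check on your own machine):

# show the current and the available congestion control algorithms
sysctl net.inet.tcp.cc.algorithm
sysctl net.inet.tcp.cc.available

# switch the running system back to NewReno if needed
sysctl net.inet.tcp.cc.algorithm=newreno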

The next important item, mentioned in the release notes, is that FreeBSD 15.0 will drop support for 32-bit platforms except armv7, which means the armv6 used by the first-generation Raspberry Pi is also being retired:

FreeBSD 15.0 is not expected to include support for 32-bit platforms other than armv7. The armv6, i386, and powerpc platforms are deprecated and will be removed. 64-bit systems will still be able to run older 32-bit binaries.

Then there are a few things I found interesting while flipping through the notes myself.

First, non-root chroot:

The chroot facility supports unprivileged operation, and the chroot(8) program has a -n option to enable its use. a40cf4175c90 (Sponsored by EPSRC)
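A minimal sketch of what that looks like (the sysctl name is from memory and the sandbox path is just a placeholder):

# allow unprivileged chroot first (double-check the knob name on 14.0)
sysctl security.bsd.unprivileged_chroot=1

# then, as a regular user, use the new -n flag
chroot -n /home/user/sandbox /bin/sh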

Then FIDO/U2F support in OpenSSH has been switched on:

The use of FIDO/U2F hardware authenticators has been enabled in ssh, using the new public key types ecdsa-sk and ed25519-sk, along with corresponding certificate types. FIDO/U2F support is described in https://www.openssh.com/txt/release-8.2. e9a994639b2a (Sponsored by The FreeBSD Foundation)
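Generating a hardware-backed key works the same way as on any other OpenSSH 8.2+ system, for example:

# create a key backed by a FIDO/U2F token (the device has to be plugged in and touched)
ssh-keygen -t ed25519-sk -f ~/.ssh/id_ed25519_sk

# then use it like any other key
ssh -i ~/.ssh/id_ed25519_sk user@host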

ASLR is now enabled by default:

Address Space Layout Randomization (ASLR) is enabled for 64-bit executables by default. It can be disabled as needed if applications fail unexpectedly, for example with segmentation faults. To disable for a single invocation, use the proccontrol(1) command: proccontrol -m aslr -s disable command. To disable ASLR for all invocations of a binary, use the elfctl(1) command: elfctl -e +noaslr file. Problems should be reported via the problem reporting system, https://bugs.freebsd.org, or posting to the freebsd-stable@FreeBSD.org mailing list. b014e0f15bc7 (Sponsored by Stormshield)

The WireGuard support that got raked over the coals a while back has also been brought back (see 「The WireGuard problems on FreeBSD & pfSense」):

The kernel wg(4) WireGuard driver has been reintegrated; it provides Virtual Private Network (VPN) interfaces using the WireGuard protocol. 744bfb213144 (Sponsored by Rubicon Communications, LLC ("Netgate") and The FreeBSD Foundation)
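A rough sketch of bringing an interface up, assuming the wg(8) utility from wireguard-tools is installed from ports, with the config path and tunnel address as placeholders (commands from memory, not verified on 14.0):

# create the kernel wg(4) interface and feed it a wireguard-tools config
ifconfig wg0 create
wg setconf wg0 /usr/local/etc/wireguard/wg0.conf
ifconfig wg0 inet 10.6.0.2/24 up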

Then there is the Netflix-sponsored kTLS gaining TLS 1.3 support:

KTLS (the kernel TLS implementation) has added receive offload support for TLS 1.3. Receive offload is now supported for TLS 1.1 through 1.3; send offload is supported for TLS 1.0 through 1.3. 05a1d0f5d7ac (Sponsored by Netflix)

And the /bin/csh that has been root's default shell on FreeBSD for ages has been changed to /bin/sh:

The default shell for the root user is now sh(1), which has many new features for interactive use. d410b585b6f0
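If you want the old behaviour back, it should just be the usual one-liner (from memory):

# put root back on csh
pw usermod root -s /bin/csh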

The default MTA is now dma (DragonFly Mail Agent); from the name plus a quick look at the manpage, it was indeed ported over from DragonFly BSD:

The default mail transport agent (MTA) is now the Dragonfly Mail Agent (dma(8)) rather than sendmail(8). Configuration of the MTA is done via mailer.conf(5). sendmail(8) and its configuration remain available. a67b925ff3e5
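The switch between MTAs goes through mailer.conf(5); the default on 14.0 should map the sendmail entry points to dma, along these lines (paths from memory):

# /etc/mail/mailer.conf
sendmail        /usr/libexec/dma
send-mail       /usr/libexec/dma
mailq           /usr/libexec/dma
newaliases      /usr/libexec/dma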

portsnap has been removed; the recommendation now is simply to pull the ports tree with git. It has served its purpose and can retire:

The portsnap(8) utility has been removed. Users are encouraged to fetch the ports tree by using pkg install git and then git clone https://git.FreeBSD.org/ports.git /usr/ports. df53ae0fdd98
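After the initial clone, keeping the tree up to date is just a plain git pull:

# update the ports tree
git -C /usr/ports pull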

And mergemaster has been replaced by etcupdate:

mergemaster(8) has been deprecated. Its replacement is etcupdate(8). 398b12691b4f (Sponsored by The FreeBSD Foundation)

There is also tarfs support now, and it can use zstd:

The tarfs(5) file system has been added, which is backed by POSIX tar archives optionally compressed with zstd(1). 69d94f4c7608 (Sponsored by Juniper Networks, Inc.) (Sponsored by Klara, Inc.)
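Mounting should go through the normal mount(8) path, something like this (the archive path is just a placeholder):

# mount a zstd-compressed tar archive via tarfs
mount -t tarfs /path/to/image.tar.zst /mnt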

It has been a long time since I last read through FreeBSD release notes...

Tor 0.4.7.7 supports congestion control

Tor now has congestion control in the protocol for the first time: 「Congestion Control Arrives in Tor 0.4.7-stable!」.

The new feature brings performance improvements:

Tor has released 0.4.7.7, the first stable Tor release with support for congestion control. Congestion control will eliminate the speed limit of current Tor, as well as reduce latency by minimizing queue lengths at relays. It will result in significant performance improvements in Tor, as well as increased utilization of our network capacity.

The reason they cannot simply rely on packet loss and let the TCP network stack detect congestion directly is that doing so would create a side channel:

Crucially, we rejected mechanisms to provide congestion control by allowing packet drops, due to the ability to introduce end-to-end side channels in the packet drop pattern.

So Tor has to implement its own congestion control algorithm; the one chosen is Tor-Vegas, which builds on Vegas, and in the experiments the exit nodes in Germany and Hong Kong show a big jump in throughput:

Also, since 0.4.7.7 has now been out for a week, you can see the Advertised Bandwidth (roughly the bandwidth as observed by the Tor network) starting to grow:

Another important point is the plan for UDP support, which looks a lot more feasible after this improvement:

The astute reader will note that we rejected datagram transports. However, this does not mean that Tor will never carry UDP traffic. On the contrary, congestion control deployment means that queue delay and latency will be much more stable and predictable. This will enable us to carry UDP without packet drops in the network, and only drop UDP at the edges, when the congestion window becomes full. We are hopeful that this new behavior will match what existing UDP protocols expect, allowing their use over Tor.

Choosing a TCP Congestion Control Algorithm

I ran BBR on my Ubuntu desktop for a while, but hit some problems (see the 「Dropbox's BBRv2 test results」 post), so I temporarily switched to Westwood, while still keeping an eye on the various studies.
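For the record, switching algorithms on Linux is just a sysctl; to make Westwood stick across reboots I do something like this (the conf file name is arbitrary):

# check what the kernel currently uses and what is available
sysctl net.ipv4.tcp_congestion_control
sysctl net.ipv4.tcp_available_congestion_control

# make the choice persistent
echo 'net.ipv4.tcp_congestion_control = westwood' | sudo tee /etc/sysctl.d/99-tcp-cc.conf
sudo sysctl -p /etc/sysctl.d/99-tcp-cc.conf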

I just came across an anecdotal report in 「[tor-relays] TCP CCA for Tor Relays (and especially Bridges)」:

Here are my completely unscientific scribbles of how all the various algorithms behaved. The scenario is uploading for a minute or so, observing the speed in MB/sec visually, then recording how it appeared to change during that minute (and then repeating this a couple of times to be certain).

tcp_bic.ko       -- 6...5...4
tcp_highspeed.ko -- 2
tcp_htcp.ko      -- 1.5...3...2
tcp_hybla.ko     -- 3...2...1
tcp_illinois.ko  -- 6...7...10
tcp_lp.ko        -- 2...1
tcp_scalable.ko  -- 5...4...3
tcp_vegas.ko     -- 2.5
tcp_veno.ko      -- 2.5
tcp_westwood.ko  -- <1
tcp_yeah.ko      -- 2...5...6

The numbers above are speeds (MB/sec) observed by eyeballing. Looking at the Wikipedia entry for TCP-Illinois, it appears to be designed for high-bandwidth, high-latency situations:

It is especially targeted at high-speed, long-distance networks.

Might as well give it a run...
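Trying one of these out is just loading the module and flipping the sysctl for the current boot:

# load TCP-Illinois and use it until the next reboot
sudo modprobe tcp_illinois
sudo sysctl net.ipv4.tcp_congestion_control=illinois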

Dropbox's BBRv2 test results

BBRv1 has quite a few problems, and BBRv2 improves on some of them (it is still in a testing phase; there is some documentation at 「TCP BBR v2 Alpha/Preview Release」). Dropbox jumped in to test it and published the results: 「Evaluating BBRv2 on the Dropbox Edge Network」.

Spoiler alert: BBRv2 is slower than BBRv1 but that’s a good thing.

The chart at the start of the article already shows that BBRv2 is slower, but it explains that this is an improvement in the right direction.

I have run into BBRv1's problems myself: with BBRv1 on my Ubuntu desktop, uploading a large amount of data (over a single connection) would make the PPPoE health check fail and drop the line, and the Windows 7 VM, which also runs PPPoE in bridge mode, showed disconnect-and-reconnect messages as well, so I had to switch away...

The problem mentioned above is that BBRv1 causes excessive packet loss; besides the issue I ran into, this also hurts other loss-based TCP congestion algorithms badly (i.e. it is unfair):

Other tradeoffs were quite conceptual: BBRv1’s unfairness towards loss-based congestion controls (e.g. CUBIC, Compound), RTT-unfairness between BBRv1 flows, and (almost) total disregard for the packet loss:

Another improvement is that BBRv2 adds an ECN mechanism, which gives a clearer picture of congestion.

Overall it should be quite a bit better; I wonder whether, once it is officially released, it will simply replace BBRv1 in the Linux kernel, or whether the two will be kept side by side.

CloudFront's BBR performance improvements

I found this while looking for information on TCP congestion algorithms; it seems Amazon CloudFront has had BBR in place for a while: 「TCP BBR Congestion Control with Amazon CloudFront」.

BBR is being studied more and more, though, and people are starting to notice how domineering the algorithm is: it does not coexist peacefully with other TCP congestion algorithms. But it is like an arms race: once the neighbors upgrade, you have to upgrade too; complaining does not help, you just get wiped out...

The case against HTTP/3 (QUIC)

This article collects the arguments against HTTP/3 (QUIC), covering pretty much all the common concerns: 「QUIC and HTTP/3 : Too big to fail?!」.

Most of them are really problems caused by the use of UDP:

  • Because of UDP, firewalls may not have the port open, and there is the problem of waiting for a timeout before falling back to TCP.
  • Because of UDP, a lot of the work moves into userland, so CPU usage is much higher than TLS 1.2/1.3 over TCP.
  • UDP brings the security problem of amplification attacks, plus the bandwidth cost of the corresponding workarounds.
  • Since UDP has to manage congestion itself, it essentially redoes a TCP congestion algorithm on top of UDP, and because it is a redo, fairness when competing with TCP for resources has to be considered.

The article is basically a summary of the common reservations about HTTP/3; if there is progress later, it can be pulled out again as a checklist to see which items have improved...

Yet another TCP BBR test result

TCP BBR is a TCP congestion control algorithm published by Google; it is a mechanism that improves TCP congestion handling purely on the sender (server) side. It has been included in the Linux kernel since 4.9.

Spotify has a lot of data to push to clients (audio files, for example), which is exactly the kind of traffic TCP BBR improves, and their real-world tests showed very good numbers: 「Smoother Streaming with BBR」.

The data Spotify published does not mention the platform, so first a quick look at their audio quality settings, i.e. the 「Audio settings」 page.

On Desktop it is 160kbps/320kbps Ogg (Standard/HQ). On the Web Player it is 128kbps/256kbps AAC (Standard/HQ).

Mobile is more complicated: on iOS it is 96kbps/160kbps/256kbps Ogg (Normal/High/Extreme), plus an Automatic adaptive setting. On Android it is 24kbps HE-AACv2 (Low), 96kbps/160kbps/320kbps Ogg (Normal/High/Very high), and Automatic.

Finally, Chromecast is 128kbps/256kbps (Standard/Premium).

In the tests, stutter (playback not keeping up) dropped by 6%~10%, and download bandwidth increased by 5%~7% at the median (with bigger gains of 10%~15% for the slower download cohorts):

Taking daily averages, stutter decreased 6-10% for the BBR group. Bandwidth increased by 10-15% for the slower download cohorts, and by 5-7% for the median. There was no difference in latency between groups.

The per-region differences also show a clear improvement:

During the test they also happened to hit an outbound connectivity problem at the Peru site, and unexpectedly found that BBR still ran stably under those network conditions:

In Peru, the non-BBR group saw a 400-500% increase in stutter. In the BBR group, stutter only increased 30-50%.

In this scenario, the BBR group had 4x bandwidth for slower downloads (the 10th percentile), 2x higher median bandwidth, and 5x less stutter!

On Ubuntu 18.04 you can enable BBR directly; on Ubuntu 16.04 you can follow the approach in 「Testing TCP BBR performance with speedtest-cli on Ubuntu 16.04」 to upgrade the kernel first and then enable BBR.
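On 18.04 the kernel already ships the module, so enabling it is the usual pair of sysctls, with fq as the commonly recommended companion qdisc (the conf file name here is arbitrary):

# enable BBR together with the fq qdisc
echo 'net.core.default_qdisc = fq' | sudo tee -a /etc/sysctl.d/99-bbr.conf
echo 'net.ipv4.tcp_congestion_control = bbr' | sudo tee -a /etc/sysctl.d/99-bbr.conf
sudo sysctl --system

# confirm
sysctl net.ipv4.tcp_congestion_control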

Google's BBR: Congestion-Based Congestion Control

Google has come up with a new approach to TCP congestion control; it is a purely sender-side algorithm that the existing internet can switch to directly: 「[net-next,14/14] tcp_bbr: add BBR congestion control」.

With long-lived TCP connections becoming more and more common (HTTP/2, for example), TCP connection optimization can be computed with a statistical model, and that is the idea behind BBR:

In a nutshell, BBR creates an explicit model of the network pipe by sequentially probing the bottleneck bandwidth and RTT. On the arrival of each ACK, BBR derives the current delivery rate of the last round trip, and feeds it through a windowed max-filter to estimate the bottleneck bandwidth. Conversely it uses a windowed min-filter to estimate the round trip propagation delay. The max-filtered bandwidth and min-filtered RTT estimates form BBR's model of the network pipe.

But isn't QUIC also making progress? (See 「Google Chrome 52 enables the faster QUIC (jokingly called TCP/2) by default」.)

QUIC feels like the more thorough solution, though the 443/udp firewall problem is indeed something that will take time to work out...