FreeBSD 4 到 FreeBSD 5/6/7/8 年代的事情...

在「FreeBSD 4 Bug may be present in Playstation 4/5 (wololo.net)」這邊看到一堆考古的討論... 原文是在講 FreeBSD 4 的 bug 可能會是 PS4PS5 的 jailbreak 切入點,不過討論裡面倒是看到了很多考古...

當年 FreeBSD 4.11 在 kernel 裡還是 single thread,然後是對 multi-core architecture 的理念不合分家了,另外一派變成 Dragonfly BSD,兩邊都花了不少時間實作多核心的架構。

印象中 FreeBSD 剛推出 5.x 的時候很不穩定,然後一路到 7.x 才算整個穩定下來,查了一下維基百科上的版本日期資訊:

After almost three years of development, the first 5.0-RELEASE in January 2003 was widely anticipated, featuring support for advanced multiprocessor and application threading, and for the UltraSPARC and IA-64 platforms.

FreeBSD 7.0 was released on 27 February 2008.

是個懷舊的討論串...

FreeBSD 14.0 對比於 13.2 有顯著的效能提升

Hacker News 上看到「FreeBSD 14.0 Delivering Great Performance Uplift (phoronix.com)」這篇,原文在「FreeBSD 14.0 Is Delivering Great Performance Uplift & Running Well In Early Tests」這邊。

測試平台是 AMD EPYC™ 8534P (64 cores & 128 threads),是個今年九月才推出的 CPU,另外底層 filesystem 是跑 ZFS

翻了一輪測試的資訊,幾乎是所有的項目都有提升 (少數幾樣有些微退步),但以 Phoronix 的測試計算,整體計算起來有 18% 的提升,對於 OS 升級帶來的提升算是蠻巨大的:

Across the span of five dozen benchmarks carried out on this AMD EPYC 8534P server of FreeBSD 13.2 vs. FreeBSD 14.0, the newly-released FreeBSD 14 was on average 18% faster than its predecessor. Not bad for a simple OS upgrade. I've been seeing very healthy gains on other x86_64 servers tested so far while due to hardware availability haven't yet tried any AArch64 servers.

依照他的說明,後續會有跟其他 Linux distribution 的比較,到時候可以回來再看看:

I'll be running more FreeBSD 14.0 server benchmarks shortly along with following that up by looking at the FreeBSD 14.0 performance against the latest leading Linux distributions. In any event I'm quite happy thus far with the performance and experience in my FreeBSD 14.0 testing.

回頭看報告裡面比較特別的部分,一個是 OpenSSL 的部分有下滑一些,這點應該跟版本更新有關,在 FreeBSD 14.0 的 Release Notes 裡面有提到大版本升級:

OpenSSL has been upgraded to version 3.0.12. This is a major upgrade from version 1.1.1, which has reached its end of life. Many components of the base system use a backward-compatible API, but will be migrated later. aa7957345732 930cec16d9ee b077aed33b7b (Sponsored by The FreeBSD Foundation)

另外一點是在 Page 4 裡面,可以看到 PostgreSQL 16 的效能提升非常明顯,無論是 TPS 還是 latency 都有非常巨大的改善。

FreeBSD 14.0 釋出

FreeBSD 14.0-RELEASE 的公告也出來了:「FreeBSD 14.0-RELEASE Announcement」,比較完整的 release notes 在「FreeBSD 14.0-RELEASE Release Notes」。

先從官方列的 highlight 來看,首先比較重要的是 GENERIC kernel 支援 1024 cores:

FreeBSD supports up to 1024 cores on the amd64 and arm64 platforms.

看了一下 commit log 是從 256 變成 1024

先就 x86-64 這邊來看,目前「家用」最多的應該是 AMD7995WX (96 cores),舊版的 256 限制應該也還能撐住,但看 commit log 有提到,主要是預期這幾年應該會有更暴力的機器出現。

另外一塊是伺服器端,Intel 這邊有 8 sockets 的版本 (參考「Intel Xeon Sapphire Rapids to Scale to 4 and 8 Sockets」),如果都是接 8490H 的話就是 480 cores 了。

ARM 的話好像也可以堆,但不熟...

另外一個提到的重點是 TCP 預設的 congestion control 改成 CUBIC

The default congestion control mechanism for TCP is now CUBIC.

翻 commit log 可以看到是從 NewReno 換成 CUBIC 的,這樣就跟 Linux kernel 預設值一樣了。

再來比較重要的是在 release notes 裡面提到的,FreeBSD 15.0 將會拔光 32-bit 環境的支援,只留 armv7,這代表 Raspberry Pi 第一代的 armv6 也被淘汰掉了:

FreeBSD 15.0 is not expected to include support for 32-bit platforms other than armv7. The armv6, i386, and powerpc platforms are deprecated and will be removed. 64-bit systems will still be able to run older 32-bit binaries.

然後有些我自己翻覺得還蠻有趣的。

首先是看到 non-root 的 chroot

The chroot facility supports unprivileged operation, and the chroot(8) program has a -n option to enable its use. a40cf4175c90 (Sponsored by EPSRC)

然後把 OpenSSH 內對 FIDO/U2F 的支援開起來了:

The use of FIDO/U2F hardware authenticators has been enabled in ssh, using the new public key types ecdsa-sk and ed25519-sk, along with corresponding certificate types. FIDO/U2F support is described in https://www.openssh.com/txt/release-8.2. e9a994639b2a (Sponsored by The FreeBSD Foundation)

ASLR 預設開啟:

Address Space Layout Randomization (ASLR) is enabled for 64-bit executables by default. It can be disabled as needed if applications fail unexpectedly, for example with segmentation faults. To disable for a single invocation, use the proccontrol(1) command: proccontrol -m aslr -s disable command. To disable ASLR for all invocations of a binary, use the elfctl(1) command: elfctl -e +noaslr file. Problems should be reported via the problem reporting system, https://bugs.freebsd.org, or posting to the freebsd-stable@FreeBSD.org mailing list. b014e0f15bc7 (Sponsored by Stormshield)

然後先前被罵臭頭的 WireGuard 支援也放回來了:(「FreeBSD & pfSense 上的 WireGuard 問題」)

The kernel wg(4) WireGuard driver has been reintegrated; it provides Virtual Private Network (VPN) interfaces using the WireGuard protocol. 744bfb213144 (Sponsored by Rubicon Communications, LLC ("Netgate") and The FreeBSD Foundation)

然後看到 Netflix 贊助的 kTLS 支援 TLS 1.3:

KTLS (the kernel TLS implementation) has added receive offload support for TLS 1.3. Receive offload is now supported for TLS 1.1 through 1.3; send offload is supported for TLS 1.0 through 1.3. 05a1d0f5d7ac (Sponsored by Netflix)

然後 FreeBSD 長久以來 root 預設用的 /bin/csh 改成 /bin/sh 了:

The default shell for the root user is now sh(1), which has many new features for interactive use. d410b585b6f0

預設的 MTA 變成 dma (Dragonfly Mail Agent),看名字加上翻了一下 manpage,確認是從 Dragonfly BSD 移植過來的:

The default mail transport agent (MTA) is now the Dragonfly Mail Agent (dma(8)) rather than sendmail(8). Configuration of the MTA is done via mailer.conf(5). sendmail(8) and its configuration remain available. a67b925ff3e5

然後 portsnap 被拔掉了,現在就建議直接用 git 拉了,算是功成身退了:

The portsnap(8) utility has been removed. Users are encouraged to fetch the ports tree by using pkg install git and then git clone https://git.FreeBSD.org/ports.git /usr/ports. df53ae0fdd98

而 mergemaster 也被換成 etcupdate 了:

mergemaster(8) has been deprecated. Its replacement is etcupdate(8). 398b12691b4f (Sponsored by The FreeBSD Foundation)

然後支援 tarfs,而且可以用 zstd

The tarfs(5) file system has been added, which is backed by POSIX tar archives optionally compressed with zstd(1). 69d94f4c7608 (Sponsored by Juniper Networks, Inc.) (Sponsored by Klara, Inc.)

好久沒看 FreeBSD 的 release notes...

跑 ldd 有可能會執行裡面的程式碼

Daily Lobsters 上看到「ldd(1) and untrusted binaries」這篇,這次的重點在 ldd 的 manpage ldd(1) 裡提到可能會執行裡面的程式碼,所以不適合拿來處理 untrusted binary:

Be aware, however, that in some circumstances, some versions of ldd may attempt to obtain the dependency information by directly executing the program. Thus, you should never employ ldd on an untrusted executable, since this may result in the execution of arbitrary code.

另外在原文裡面的 comment 有人提到 macOS 上面沒有 ldd,而是用其他工具給出類似的資訊,看起來是避開了這種實作方式:

macOS and other Darwin-based systems, which use Mach-O rather than ELF, and have an 4.x/SVR4-inspired dynamic linking mechanism (not surprising, given that the person who did a lot of the work on the 4.x system left Sun to go to NeXT), but don't have an "ldd" program. Instead, there's "otool -L", which produces output such as [...]

FreeBSD 上的 ldd(1) manpage 上沒有提到安全問題,但從他的實作描述看起來也不太妙:

ldd lists the dependencies of an executable by setting rtld(1) environment variables and running the executable in a child process.

回到原來主題,Linux manpage 裡面提到的 objdump 跟 ldd 的功能還是差蠻多的啊?不知道合理的替代品到底是什麼...

Netflix 單機 800Gbps 伺服器所使用的最佳化技巧

Hacker News 上看到 Netflix 的人丟出來的投影片,試著了解 Netflix 的 Open Connect Appliances 裡與 FreeBSD 相關的最佳化技巧對於效能的影響:「The “other” FreeBSD optimizations used by Netflix to serve video at 800Gb/s from a single server」。

看起來這邊的分析是先基於 400Gbps 的版本,可以跑到 375Gbps (53% CPU),接著在上面拔掉各種最佳化的設定,看看會掉多少流量。這邊可以參考先前在「Netflix 在單機服務 400Gbps 的影音流量」提到的資料。

投影片上的第一章是 sendfile 與 kTLS 相關的最佳化,這邊可以看出來都是重要的項目,隨便關掉一個就會掉很多 capacity:

  • Disable kTLS (and async sendfile) + nginx aio:40Gbps (100% CPU)
  • Disable kTLS (and async sendfile) + nginx thread pools:90Gbps (90% CPU)
  • Disable sendfile (but use kTLS):75Gbps (80% CPU)
  • Disable sendfile (but use NIC kTLS):95Gbps (80% CPU)
  • Enable Sendfile & kTLS, but disable ISA-L crypto:180Gbps (80% CPU)
  • Enable Sendfile & kTLS:240Gbps (80% CPU)

第二章是 virtual memory,UMA VM Page Cache 這邊看起來最明顯,SF_NOCACHE 也是個重要的項目:

  • Disable UMA VM Page Cache:60Gbps (95% CPU)
  • Disable VM Batch Queues:280Gbps (95% CPU)
  • Disable SF_NOCACHE:120Gbps (55% CPU)

另外第二章特別提到了一個之前沒有用到的 optimization,是把 arm64 上面的 4KB Pages 變成 16KB Pages,這帶動了些許的效能提昇,並且降低了 CPU 使用率:

345Gb/s @ 80% CPU -> 368Gb/s @ 66% CPU

第三章是 network stack,看起來 TSO 帶來的效益也是很高:

  • Disable TCP Large Receive Offload:330Gbps (65% CPU)
  • Disable RSS accelerated LRO:365Gbps (70% CPU)
  • TSO Disabled:180Gbps (85% CPU)
  • Disable TSO and LRO:170Gbps (85% CPU)

最後面則是有提到從 400Gbps 到 800Gbps 還多做了那些事情,最後是達到 731Gbps。

用的機器是 Dell PowerEdge R7525,這是一台 2U 的機器啊...

FreeBSD 的 Firecracker 支援

Colin PercivalFreeBSD 能夠支援 Firecrack:「Announcing the FreeBSD/Firecracker platform」,成為 LinuxOSv 以外的第三個支援的作業系統。對應的 patch 在「amd64: Add FIRECRACKER kernel configuration」這邊可以看到。

接下來是反過來,要送一些 patch 進去 Firecracker,讓他支援 FreeBSD:

Now that FreeBSD supported Firecracker, there was one more thing to do: Make Firecracker support FreeBSD.

看起來是基於之前在 2020 年的 patch (但當時沒被整進去) 再修改:

Alejandro Jimenez contributed patches two years ago, but they were never merged. Some of his code ended up in the linux-loader project (which Firecracker uses); but I spent a few weeks digging through his thousand lines of changes to figure out which went into linux-loader, which still applied cleanly to Firecracker, and which I had to rewrite from scratch — a task made more difficult by the fact that Firecracker is written in Rust, and I had never used Rust before! Nevertheless, I was eventually successful, and opened a PR with updated patches which I hope to see merged into mainline Firecracker in the upcoming weeks.

看起來兩邊都有 patch 要做才能支援,目前看起來 Firecracker 這邊沒動作了,大概是沒什麼動力...

FreeBSD 的 Amazon EC2 Image 打算自動使用本機空間當作 Swap

Twitter 上看到 Colin Percival 說計畫將 FreeBSD EC2 image (AMI) 自動偵測並使用 ephemeral disk 的空間當作 swap:

就算是使用 EBSgp2 或是 gp3,甚至是其他 VPS,我也很習慣開一點點的 swap 空間來用 (通常是用 file swap 的方式開 512MB,無論記憶體有多大),這算是我自己的 best practice 了,這可以把一些完全沒用到的 daemon 塞進 swap。

不過對於已經把 ephemeral disk 規劃拿來用的人可能會不太開心,需要去改設定...

FreeBSD 把 root 預設的 shell 改成 /bin/sh

Hacker News 首頁上看到 FreeBSD 把 root 的 shell 從 /bin/csh 換成 /bin/sh 了:「sh(1): make it the default shell for the root user」,Hacker News 上的討論也可以看一看:「FreeBSD switches the default root shell from csh to sh (freebsd.org)」。

依照說明是把 FreeBSD 的 /bin/sh 加上了不少東西,所以算是堪用了:

In the recent history sh(1) has gain the missing features for it to
become a usable interractive shell:
- command completion
- persistent history support
- improvements on the default bindings in emacs mode
- improvements in the vi mode (repect $EDITOR)
- print a newline when exiting via ^D
- default prompt and improvements on how PS1 can be configured
- and more.

This changes also simplifies making tiny freebsd images with only sh(1)
as a shell

以前也是用 csh 系列的 (像是 tcsh),後來開始用 Linux 就往 BashZsh 定居了...

Netflix 在單機服務 400Gbps 的影音流量

Hacker News 首頁上看到 NetflixEuroBSDCon 2021 上發表的投影片:「Serving Netflix Video at 400Gb/s on FreeBSD」,對應的討論則是在「Serving Netflix Video at 400Gb/s [pdf] (freebsd.org)」這邊可以翻到,投影片的作者有在上面回答一些問題。

投影片在講的應該就是 Netflix 的 Open Connect

主要是因為 Open Connect 的伺服器是放到各家 ISP 機房,在單一 IP 且單一伺服器的限制下,要想辦法壓榨出最高的效能。

硬體是 AMDEPYC,在先前的版本可以達到 240Gbps,經過分析與嘗試解決了一堆問題後,最後是在原來的 AMD 機器上跑到了 380Gbps (另外有測 ARM 以及 Intel 的數字),然後之後機房有可能會有 800Gbps 的標準,他們又要繼續煩惱...

有看到 Mellanox ConnectX-6 Dx (CX6-DX) 這個東西,看起來很有趣啊,有 200Gbps 的能力,而且可以把 TLS 的事情推到卡上面處理... 然後這家公司被 Nvidia 買走了。

另外當然也會有人問為什麼不用 Linux,作者在討論串裡面也有回答一些,有興趣的可以自己去搜一下。

FreeBSD & pfSense 上的 WireGuard 問題

FreeBSDpfSense 上的 WireGuard 實做問題前幾天被 Ars 的人整理出來:「Buffer overruns, license violations, and bad code: FreeBSD 13’s close call」,文章有點長度,但整個劇情有種在看八點檔的感覺... 在 Hacker News 上的「Buffer overruns, license violations, and bad code: FreeBSD 13’s close call (arstechnica.com)」與後續的「In-kernel WireGuard is on its way to FreeBSD and the pfSense router (arstechnica.com)」都值得翻翻。

目前 FreeBSD 與 pfSense 都已經把現有的 WireGuard 實做拿掉,主要原因是實做本身的品質很差 (以及安全性問題),而且沒有將 WireGuard 的功能實做完整。

文章裡面有提到兩個組織面上的問題 (還有攻擊個人的部份,這邊就不提了),第一個是 FreeBSD 的 review process,看起來比較像是系統面的問題,目前還是缺乏人力。

另外一個是 Netgate 本身 (pfSense 後面的公司),本來沒有太多背景知識,但這次事件發生後跑去翻了一下資料,發現原來之前就有一些「記錄」了,像是註冊競爭對手的產品 OPNsense 沒有註冊的網域 opnsense.com,然後導去 pfSense 家的事情,最後使得 Deciso (OPNsense 後面的公司) 到 WIPO 上搶回來:「WIPO Domain Name Decision: D2017-1828」。

可以花些時間來看看替代方案...