FreeBSD 14.0 Released

The FreeBSD 14.0-RELEASE announcement is out: 「FreeBSD 14.0-RELEASE Announcement」. The more complete release notes are in 「FreeBSD 14.0-RELEASE Release Notes」.

Starting with the official highlights, the first important one is that the GENERIC kernel now supports 1024 cores:

FreeBSD supports up to 1024 cores on the amd64 and arm64 platforms.

According to the commit log, the limit was raised from 256 to 1024.

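Looking at x86-64 first, the largest "consumer" part right now is probably the AMD 7995WX (96 cores), so the old limit of 256 would still have held up. The commit log notes, though, that the change mainly anticipates much bigger machines showing up in the next few years.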

The other side is servers. Intel has 8-socket configurations (see 「Intel Xeon Sapphire Rapids to Scale to 4 and 8 Sockets」); populate all eight sockets with the 8490H (60 cores each) and you get 480 cores.
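To make the arithmetic concrete, here is a small Python sketch that compares the two machines mentioned above against the old and new limits. The 96-core and 480-core totals come from the text; the figure of 60 cores per 8490H socket and the helper name fits are assumptions of this sketch.

```python
# Limits taken from the text: 256 before, 1024 in FreeBSD 14.0.
OLD_LIMIT = 256
NEW_LIMIT = 1024

# Total core counts. The 60-cores-per-socket figure for the 8490H is an
# assumption of this sketch; the text only gives the 480-core total.
machines = {
    "AMD 7995WX, 1 socket": 1 * 96,
    "Intel 8490H, 8 sockets": 8 * 60,
}


def fits(cores: int, limit: int) -> bool:
    """Return True when a machine's core count is within the given limit."""
    return cores <= limit


for name, cores in machines.items():
    print(f"{name}: {cores} cores, "
          f"old limit ok={fits(cores, OLD_LIMIT)}, "
          f"new limit ok={fits(cores, NEW_LIMIT)}")
```

The 8-socket box is the case that no longer fits under 256, which is consistent with the commit log's reasoning about bigger machines.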

ARM can apparently be scaled up in a similar way, but I'm not familiar with it...

Another highlight is that the default TCP congestion control is now CUBIC:

The default congestion control mechanism for TCP is now CUBIC.

The commit log shows the switch was from NewReno to CUBIC, which matches the Linux kernel default.
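For a feel of what CUBIC does differently, here is a minimal Python sketch of the window growth function as I understand it from RFC 8312. It is not FreeBSD's implementation: it leaves out the TCP-friendly region, slow start, and everything else a real stack does, and the function names are mine.

```python
# Constants as given in RFC 8312; treat this as an illustration only.
C = 0.4      # scaling constant
BETA = 0.7   # multiplicative decrease factor applied after a loss


def cubic_k(w_max: float) -> float:
    """Time (seconds) the curve needs to climb back to w_max after a loss."""
    return (w_max * (1 - BETA) / C) ** (1 / 3)


def cubic_window(t: float, w_max: float) -> float:
    """Congestion window (in segments) t seconds after the last loss."""
    k = cubic_k(w_max)
    return C * (t - k) ** 3 + w_max


w_max = 100.0
for t in (0.0, cubic_k(w_max), cubic_k(w_max) + 2.0):
    print(f"t={t:.2f}s window={cubic_window(t, w_max):.1f}")
```

The curve starts at BETA * w_max right after a loss, flattens out around the old maximum, and then probes beyond it, whereas NewReno grows linearly in congestion avoidance.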

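The next important item is from the release notes: FreeBSD 15.0 will drop support for 32-bit platforms, keeping only armv7. That means armv6, used by the first-generation Raspberry Pi, is being retired too: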

FreeBSD 15.0 is not expected to include support for 32-bit platforms other than armv7. The armv6, i386, and powerpc platforms are deprecated and will be removed. 64-bit systems will still be able to run older 32-bit binaries.

Then there are a few items I found interesting while browsing the notes myself.

First, non-root chroot:

The chroot facility supports unprivileged operation, and the chroot(8) program has a -n option to enable its use. a40cf4175c90 (Sponsored by EPSRC)

FIDO/U2F support in OpenSSH has been turned on:

The use of FIDO/U2F hardware authenticators has been enabled in ssh, using the new public key types ecdsa-sk and ed25519-sk, along with corresponding certificate types. FIDO/U2F support is described in https://www.openssh.com/txt/release-8.2. e9a994639b2a (Sponsored by The FreeBSD Foundation)

ASLR is enabled by default:

Address Space Layout Randomization (ASLR) is enabled for 64-bit executables by default. It can be disabled as needed if applications fail unexpectedly, for example with segmentation faults. To disable for a single invocation, use the proccontrol(1) command: proccontrol -m aslr -s disable command. To disable ASLR for all invocations of a binary, use the elfctl(1) command: elfctl -e +noaslr file. Problems should be reported via the problem reporting system, https://bugs.freebsd.org, or posting to the freebsd-stable@FreeBSD.org mailing list. b014e0f15bc7 (Sponsored by Stormshield)

The WireGuard support that drew heavy criticism earlier is back as well (see 「FreeBSD & pfSense 上的 WireGuard 問題」):

The kernel wg(4) WireGuard driver has been reintegrated; it provides Virtual Private Network (VPN) interfaces using the WireGuard protocol. 744bfb213144 (Sponsored by Rubicon Communications, LLC ("Netgate") and The FreeBSD Foundation)

The Netflix-sponsored kTLS work now covers TLS 1.3:

KTLS (the kernel TLS implementation) has added receive offload support for TLS 1.3. Receive offload is now supported for TLS 1.1 through 1.3; send offload is supported for TLS 1.0 through 1.3. 05a1d0f5d7ac (Sponsored by Netflix)

The root user's default shell, which has been /bin/csh on FreeBSD for a very long time, is now /bin/sh:

The default shell for the root user is now sh(1), which has many new features for interactive use. d410b585b6f0

The default MTA is now dma (Dragonfly Mail Agent). Going by the name and a quick look at the manpage, it was indeed ported from DragonFly BSD:

The default mail transport agent (MTA) is now the Dragonfly Mail Agent (dma(8)) rather than sendmail(8). Configuration of the MTA is done via mailer.conf(5). sendmail(8) and its configuration remain available. a67b925ff3e5

portsnap has been removed, and the recommendation now is to pull the tree with git directly. It has done its job and can retire:

The portsnap(8) utility has been removed. Users are encouraged to fetch the ports tree by using pkg install git and then git clone https://git.FreeBSD.org/ports.git /usr/ports. df53ae0fdd98

mergemaster is likewise being replaced by etcupdate:

mergemaster(8) has been deprecated. Its replacement is etcupdate(8). 398b12691b4f (Sponsored by The FreeBSD Foundation)

tarfs is now supported, and it can use zstd:

The tarfs(5) file system has been added, which is backed by POSIX tar archives optionally compressed with zstd(1). 69d94f4c7608 (Sponsored by Juniper Networks, Inc.) (Sponsored by Klara, Inc.)

It has been a long time since I last read FreeBSD release notes...

Firefox Announces Move From Mercurial to Git

Firefox has announced that it is moving from Mercurial to Git: 「Firefox Development Is Moving From Mercurial To Git」.

Currently both Mercurial and Git are supported, and the reason given is that they do not want to keep maintaining two systems:

For a long time Firefox Desktop development has supported both Mercurial and
Git users. This dual SCM requirement places a significant burden on teams which
are already stretched thin in parts. We have made the decision to move Firefox
development to Git.

How the decision was actually reached is unclear, though, which is a long-standing Mozilla problem...

Post Mortem on Cloudflare's API and Dashboard Outage a Few Days Ago

A few days ago Cloudflare's API and dashboard were down for more than a day, and unusually Matthew Prince (CEO) wrote the post mortem himself: 「Post Mortem on Cloudflare Control Plane and Analytics Outage」. There is also plenty of discussion on Hacker News: 「Post Mortem on Cloudflare Control Plane and Analytics Outage (cloudflare.com)」. These are my own notes after reading it.

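The listing at 「Matthew Prince - The Cloudflare Blog」 shows that Matthew Prince's previous post was the PR piece 「Cloudflare’s 2023 Annual Founders’ Letter」 on 2023/09/27, which has translations in several languages, such as the Traditional Chinese 「Cloudflare 2023 年度創始人來信」. The one before that, on 2022/12/11, was also a PR piece, 「Welcome to Cloudflare’s Impact Week」 (with its Traditional Chinese version 「歡迎來到 Cloudflare 的 Impact Week」).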

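This is one of the rare long outages Cloudflare has had since its IPO, so for once Matthew Prince stepped up to take the heat himself. To rebuild trust, and given his seniority, the post reveals quite a few architectural details, which makes it a chance to peek into how Cloudflare is built.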

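First, roughly where the official post puts its emphasis: it spends a great deal of space on the data center provider Flexential mishandling the power problem at PDX-04 (Cloudflare's own name for the facility). Note that this is Cloudflare's view of Flexential's failings; so far Flexential has not published its own explanation. That emphasis plays down Cloudflare's own design problems, which quite a few people on Hacker News pointed out.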

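The whole incident started when the entire power system at Flexential's PDX-04 went offline. That is a textbook data center failure, like the fire at the Chief Telecom (是方) data center in February 2013 (see iThome's write-up) or the OVH data center fire in 2021 (「去年 OVH 機房大火的部份情形最近被揭露」), and it is something any architecture design has to plan for.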

So Matthew Prince first explains Cloudflare's HA approach, which is to lease three data centers in Hillsboro, Oregon and build a low-latency network among them:

Cloudflare's control plane and analytics systems run primarily on servers in three data centers around Hillsboro, Oregon. The three data centers are independent of one another, each have multiple utility power feeds, and each have multiple redundant and independent network connections.

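But anyone who sees this design will immediately look it up: the city covers only 66.9 km², a bit less than a quarter of Taipei City (about 271.8 km²). Measuring the longest straight-line distance within the city gives roughly 12 km?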

Uh, wouldn't a single earthquake (isn't this right in the San Andreas Fault region?) or a single nuclear strike flatten the most critical part of Cloudflare?

The analytics systems are one thing: if all of Hillsboro goes down and they degrade gracefully, I can understand that design trade-off. But the control plane looks less reassuring?

Let's set the all-eggs-in-one-basket problem aside for now, although it really is quite...

The post then mentions that some important products depended on services that had no HA:

Unfortunately, we discovered that a subset of services that were supposed to be on the high availability cluster had dependencies on services exclusively running in PDX-04.

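It doesn't say here which services those dependencies were on, but elsewhere the post mentions that some foundational services had no cross-data-center HA and ran only in PDX-04, including Kafka and ClickHouse: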

In particular, two critical services that process logs and power our analytics — Kafka and ClickHouse — were only available in PDX-04 but had services that depended on them that were running in the high availability cluster.

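This one is rather surprising. I have deployed and maintained Kafka myself, and its architecture lends itself to cross-data-center designs. It is also very basic infrastructure, yet it had no cross-data-center HA?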

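As for ClickHouse, I have only studied it and never put production-scale load on it, but from what the documentation describes it should at least be able to do a shared-everything architecture. Yet it too had no cross-data-center HA?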

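These infrastructure gaps meant that, even though only a single data center (PDX-04) went down, critical infrastructure was gone (presumably the Kafka and ClickHouse mentioned above). With Flexential also unable to give a restoration time, Cloudflare decided to run the disaster recovery SOP directly (the scenario in which none of the three Hillsboro data centers come back).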

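This also shows why recovery took a while: from the decision to fail over to the European DR site until the cutover was done took more than four hours: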

Because more services were offline than we expected, and because Flexential could not give us a time for restoration of our services, we made the call at 13:40 UTC to fail over to Cloudflare's disaster recovery sites located in Europe.

By 17:57 UTC, the services that had been successfully moved to the disaster recovery site were stable and most customers were no longer directly impacted.
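The gap between the two timestamps quoted above is easy to check with a few lines of Python. This is just a sketch: the helper hhmm is mine, and it assumes both times fall on the same UTC day, which is how I read the post.

```python
from datetime import timedelta


def hhmm(text: str) -> timedelta:
    """Turn an 'HH:MM' time of day into an offset from midnight UTC."""
    hours, minutes = text.split(":")
    return timedelta(hours=int(hours), minutes=int(minutes))


decision = hhmm("13:40")   # call made to fail over to the DR sites
stable = hhmm("17:57")     # moved services reported stable

failover = stable - decision
print(failover)  # 4:17:00
```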

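Since Kafka and ClickHouse ran in only one of the Hillsboro data centers, it is unclear whether the European DR site had them set up in normal times. Perhaps a good part of those four-plus hours went into bringing up Kafka and ClickHouse at the European DR site? (This can only be a guess.)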

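Back to Flexential: while restoring power they found that the breakers on Cloudflare's circuits had failed, and power did not come back until ten hours later. Because everyone had been working all day, Matthew Prince decided to let people rest first and switch back from the European DR site to Hillsboro the next morning, which stretched the recovery out further: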

At 12:48 UTC, Flexential was able to get the generators restarted. [...] When Flexential attempted to power back up Cloudflare's circuits, the circuit breakers were discovered to be faulty.

Flexential replaced our failed circuit breakers, restored both utility feeds, and confirmed clean power at 22:48 UTC. Our team was all-hands-on-deck and had worked all day on the emergency, so I made the call that most of us should get some rest and start the move back to PDX-04 in the morning. That decision delayed our full recovery, but I believe made it less likely that we’d compound this situation with additional mistakes.

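Note that the report follows the convention of using UTC (a separate topic in itself; there have been other HN threads about it...). Hillsboro is on the US West Coast, which was still on daylight time (UTC-7) when this happened in early November 2023, so subtract seven hours: 22:48 UTC was around 3:48 PM local time, and it was later still for team members on the East Coast and in Europe.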

For now it looks like Cloudflare's design and processes have a lot of room for improvement? We'll see whether any more gossip surfaces later.

TCP Standard Consolidated Into RFC 9293

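Came across 「RFC 9293: Transmission Control Protocol (TCP)」, which mainly consolidates the documents previously scattered across various RFCs (starting with RFC 793) into a single one, and also folds in known errata: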

This document specifies the Transmission Control Protocol (TCP). TCP is an important transport-layer protocol in the Internet protocol stack, and it has continuously evolved over decades of use and growth of the Internet. Over this time, a number of changes have been made to TCP as it was specified in RFC 793, though these have only been documented in a piecemeal fashion. This document collects and brings those changes together with the protocol specification from RFC 793. This document obsoletes RFC 793, as well as RFCs 879, 2873, 6093, 6429, 6528, and 6691 that updated parts of RFC 793. It updates RFCs 1011 and 1122, and it should be considered as a replacement for the portions of those documents dealing with TCP requirements. It also updates RFC 5961 by adding a small clarification in reset handling while in the SYN-RECEIVED state. The TCP header control bits from RFC 793 have also been updated based on RFC 3168.

And it obsoletes a truckload of RFCs XD

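Digging around, I found that HTTP/1.1 went through something similar in 2014, except in the opposite direction, being split apart: 「HTTP/1.1 的更新」. That time RFC 2616 was retired and split into RFC 7230 through RFC 7235.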

Then this year HTTP got reworked once more: HTTP/1.1 was merged back into a single document, but some of the concepts in it were split out: 「HTTP 標準的翻新」.

Of these, RFC 9110 defines HTTP Semantics, RFC 9111 defines HTTP Caching, RFC 9112 and RFC 9113 define HTTP/1.1 and HTTP/2 respectively, and RFC 9114 was assigned to HTTP/3.

Documents that keep getting refactored and growing new features...

Git 「Disaster Recovery」

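「Dangit, Git!?!」 has been getting attention over the past few days, and a Simplified Chinese version is available too. I recall seeing it before (the Internet Archive has a version from 4 Sep 2017), but a search of Hacker News shows it had not come up there...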

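It covers many ways to deal with accidentally putting the wrong thing into Git, though a few cases still seem to be missing. One example is accidentally committing credentials and needing to scrub them: git rebase -i HASH, followed by a series of manual conflict fixes and git rebase --continue, and finally the forbidden move git push --force...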

I would also recommend the book 「為你自己學 Git」, which covers similar scenarios:

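Chapter 7: Rewriting History
7.1 Scenario: Editing a commit message
7.2 Scenario: Squashing multiple commits into one
7.3 Scenario: Splitting one commit into several
7.4 Scenario: Adding new commits between existing ones
7.5 Scenario: Deleting certain commits or reordering them
7.6 What is the difference between Reset, Revert, and Rebase?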

The book also has a web version, at gitbook.tw.

Dolt, a MySQL Server for Local Development and Testing

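Came across the 「Dolt is Git for Data!」 project: a MySQL server that runs locally and lets you version-control the data in it, which looks well suited to local development and testing.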

Downloading it shows there are only a few files (this is the linux-amd64 build), and the relationship to Git is visible as well:

$ tree
.
├── bin
│   ├── dolt
│   ├── git-dolt
│   └── git-dolt-smudge
└── LICENSES

Running bin/dolt sql-server -P 3307 -u root -p passw0rd starts a MySQL-compatible server on port 3307, and then mysql -h 127.0.0.1 --port 3307 -u root -p lets you log in by entering the password passw0rd:

$ mysql -h 127.0.0.1 --port 3307 -u root -p
Enter password:
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 1
Server version: 5.7.9-Vitess

The Server version line shows that the project uses Vitess's implementation of the MySQL interface.

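I also tested it a bit: changes made over a connection (such as CREATE DATABASE, CREATE TABLE, and the CUD part of CRUD) are not written back to disk. I tried different settings and nothing changed that, so it is probably by design.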

It should work well for running test cases locally, being closer to MySQL than SQLite's :memory:, though in CI you can probably just bring up a real MySQL...
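For comparison, this is the kind of throwaway SQLite in-memory database I have in mind, sketched with Python's standard sqlite3 module. The table name users and the function name are made up for illustration, and this says nothing about how Dolt works internally.

```python
import sqlite3


def tables_after_reconnect() -> int:
    """Create a table in one :memory: database, then count tables in a new one."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute("INSERT INTO users (name) VALUES ('alice')")
    conn.commit()
    conn.close()

    # Every connection to ":memory:" gets its own fresh, empty database,
    # so nothing from the first connection is visible here.
    conn = sqlite3.connect(":memory:")
    count = conn.execute(
        "SELECT count(*) FROM sqlite_master WHERE type = 'table'"
    ).fetchone()[0]
    conn.close()
    return count


print(tables_after_reconnect())  # 0
```

That is convenient for isolated test cases, but the SQL dialect is SQLite's rather than MySQL's, which is where a MySQL-compatible server has the edge.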

FreeBSD Is Moving From Subversion to Git

Saw on #bsdchat that FreeBSD now offers a Git repository. Looking into it, the switch appears to be happening right now, and I found the customary HEADS UP: 「HEADS UP: FreeBSD changing from Subversion to Git this weekend」.

The FreeBSD project will be moving it's source repo from subversion to git starting this this weekend. The docs repo was moved 2 weeks ago. The ports repo will move at the end of March, 2021 due to timing issues.

The src tree was moved from CVS to Subversion first, around 2008: 「FreeBSD src 部份由 CVS 轉換到 Subversion」.

Then the ports tree followed in 2012: 「FreeBSD ports 將從 CVS 轉移到 Subversion 上...」.

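I haven't used FreeBSD in a long time (the closest system I've touched recently is probably pfSense), but congratulations to them on finally making the switch. The gap in momentum between the two is just too large...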

Choosing a TCP Congestion Control Algorithm

I ran BBR on my Ubuntu desktop for a while but hit some problems (see 「Dropbox 測試 BBRv2 的結果」), so I switched to Westwood for the time being, though I still keep an eye on various studies.

I just came across a field report in 「[tor-relays] TCP CCA for Tor Relays (and especially Bridges)」:

Here are my completely unscientific scribbles of how all the various algorithms behaved. The scenario is uploading for a minute or so, observing the speed in MB/sec visually, then recording how it appeared to change during that minute (and then repeating this a couple of times to be certain).

tcp_bic.ko       -- 6...5...4
tcp_highspeed.ko -- 2
tcp_htcp.ko      -- 1.5...3...2
tcp_hybla.ko     -- 3...2...1
tcp_illinois.ko  -- 6...7...10
tcp_lp.ko        -- 2...1
tcp_scalable.ko  -- 5...4...3
tcp_vegas.ko     -- 2.5
tcp_veno.ko      -- 2.5
tcp_westwood.ko  -- <1
tcp_yeah.ko      -- 2...5...6

The numbers above are speeds (MB/sec) observed "by eye". According to the Wikipedia entry on TCP-Illinois, it appears to be designed for high-bandwidth, high-latency scenarios:

It is especially targeted at high-speed, long-distance networks.
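As a rough way to read the list above, here is a small Python sketch that picks out the algorithm with the highest final reading and the ones that ended higher than they started. The numbers are copied from the quoted post; tcp_westwood.ko is left out because "<1" is not a precise value, and ranking by the last reading is my own choice rather than anything the original poster did.

```python
# Observed speeds in MB/sec over roughly a minute, copied from the post.
# westwood ("<1") is omitted because it has no exact number.
observations = {
    "bic": [6, 5, 4],
    "highspeed": [2],
    "htcp": [1.5, 3, 2],
    "hybla": [3, 2, 1],
    "illinois": [6, 7, 10],
    "lp": [2, 1],
    "scalable": [5, 4, 3],
    "vegas": [2.5],
    "veno": [2.5],
    "yeah": [2, 5, 6],
}

# Algorithm with the highest final reading.
best = max(observations, key=lambda name: observations[name][-1])

# Algorithms whose last reading was higher than their first.
improved = sorted(
    name for name, speeds in observations.items() if speeds[-1] > speeds[0]
)

print(best)
print(improved)
```

Given how unscientific the original measurements are, this is only a hint about which algorithms are worth trying first.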

Let me give it a try...

Bitbucket Drops Mercurial

Bitbucket is dropping support for Mercurial: 「Sunsetting Mercurial support in Bitbucket」.

There are two dates: no new repositories can be created from February next year, and everything is shut off in June next year:

February 1, 2020: users will no longer be able to create new Mercurial repositories
June 1, 2020: users will not be able to use Mercurial features in Bitbucket or via its API and all Mercurial repositories will be removed.

The wiki on the Mercurial site has been updated as well: 「Mercurial Hosting」. Those who don't want to move to Git can look for alternatives in that list.