OS – Page 7 – Gea-Suan Lin's BLOG

FreeBSD's Amazon EC2 image plans to automatically use local instance storage as swap

On Twitter I saw Colin Percival say that the FreeBSD EC2 image (AMI) is planned to automatically detect ephemeral disks and use their space as swap:

EC2 ephemeral ("instance store") disks are great for swap space, and in future FreeBSD releases they'll be used for that automatically.

Next up: Use the rest of the space for ZFS L2ARC. pic.twitter.com/Dqs1Wg1y5M

— Colin Percival (@cperciva) May 15, 2022

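Even on EBS gp2 or gp3, or on other VPS providers, I habitually set up a small amount of swap (usually 512MB as a swap file, no matter how much memory the machine has). This is my own best practice: it lets completely idle daemons get pushed out to swap.

People who have already planned another use for their ephemeral disks may not be happy about this, though, since they will have to change their configuration...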

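Ubuntu 22.04 LTS released

Okay, DevOps engineers and SREs everywhere are probably about to get busy again. The once-every-two-years event has arrived and Ubuntu 22.04 LTS is out: "Canonical Ubuntu 22.04 LTS is released".

There are a few notable items in it, such as Ubuntu Desktop officially supporting the Raspberry Pi 4 platform this time: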

For innovators on Raspberry Pi, Ubuntu 22.04 LTS marks the first LTS release with Ubuntu Desktop support on the Raspberry Pi 4.

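As usual I'll let it sit for a month before taking a look; after about three months most of the common problems have usually been fixed. My desktop runs Xubuntu, which swaps in Xfce, so I'll wait a bit longer...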

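A tool for watching who accesses the clipboard (under X11)

This is from a Hacker News discussion I saw two months ago, where someone wanted to know who is accessing the clipboard under X11: "Who keeps an eye on clipboard access? (ovalerio.net)". The original article is "Who keeps an eye on clipboard access?", and the program the author wrote in Python is at "clipboard-watcher".

It immediately reminded me of the mechanism iOS introduced in 2020: "iOS 14 clipboard notifications are annoying, but developer adoption of a new API will improve the experience".

Running it on X11, though, you find that it produces rather a lot of output. For example, it spews messages nonstop while I cut and paste when writing a post in WordPress in the browser. It would be nicer if it supported a whitelist of programs. I have already removed the read side of the clipboard API in my browser outright (websites can still write to it), so I don't care about the cases where the browser writes to the clipboard.

The program also hangs sometimes (especially when the clipboard holds an image), which I'd count as a bug...

Someone in the Hacker News thread also brought up an interesting design: they would like programs that do not have focus to be restricted from touching the clipboard: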

By far the worst offense I've seen in clipboard privacy on the Linux desktop is RedHat's virt-manager. It sends your clipboard AND selection content to all virtual machines, even when they are not focused, with no indication that it's happening, and with no GUI option to turn it off. This is at odds with the common practice of running untrusted code in virtual machines.

This idea seems pretty good. In theory the clipboard should only be touched when the user is actually interacting with a program...

Pointer tagging

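I saw "Pointer Tagging for x86 Systems (lwn.net)" on Hacker News. It is about the idea that, since current 64-bit environments cannot yet offer a fully addressable 64-bit space, the upper bits of a pointer can be repurposed for other uses.

I worked out the numbers first, in steps of 8 bits: the classic 32-bit address space is 4GB and 40 bits is 1TB, and machines already exist at both of these sizes (the u-12tb1.112xlarge offered by AWS has 12TB).

Next, 48 bits reaches 256TB. I'm not sure whether any single machine can do that today (I recall IBM being fond of this sort of thing?). 56 bits reaches 64PB, and the full 64 bits is 16EB.

I really hadn't noticed this before...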
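To double-check the numbers above and make the "spare upper bits" idea concrete, here is a small Python sketch. It is only an illustration: the 48-bit address width and the helper names `address_space`, `tag_pointer` and `untag_pointer` are my own assumptions, and real hardware schemes such as the ones the LWN article discusses have their own rules about which bits may be used.

```python
UNITS = ["B", "KB", "MB", "GB", "TB", "PB", "EB"]

def address_space(bits: int) -> str:
    """Size of a byte-addressable space that is `bits` wide, in binary units."""
    size = 1 << bits
    unit = 0
    while size >= 1024 and unit < len(UNITS) - 1:
        size //= 1024
        unit += 1
    return f"{size}{UNITS[unit]}"

# Assumption for illustration: only the low 48 bits carry the address,
# leaving the top 16 bits of a 64-bit pointer free for a tag.
ADDRESS_BITS = 48
ADDRESS_MASK = (1 << ADDRESS_BITS) - 1
TAG_LIMIT = 1 << (64 - ADDRESS_BITS)

def tag_pointer(addr: int, tag: int) -> int:
    """Store `tag` in the unused upper bits of `addr`."""
    if addr & ~ADDRESS_MASK:
        raise ValueError("address does not fit in the assumed address width")
    if not 0 <= tag < TAG_LIMIT:
        raise ValueError("tag does not fit in the spare upper bits")
    return (tag << ADDRESS_BITS) | addr

def untag_pointer(pointer: int) -> tuple:
    """Split a tagged pointer back into (tag, address)."""
    return pointer >> ADDRESS_BITS, pointer & ADDRESS_MASK

for bits in (32, 40, 48, 56, 64):
    print(bits, address_space(bits))
```

The point of the mask is that software has to strip the tag before using the value as an address, unless the hardware is told to ignore those bits.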

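Problems encountered in Linux's plan to merge /dev/random and /dev/urandom

I saw "Problems emerge for a unified /dev/*random (lwn.net)" on Hacker News. The original article is "Problems emerge for a unified /dev/*random" (paid content, but it can be read directly through the link on Hacker News).

The two devices named in the title need some background, which the Wikipedia article "/dev/random" covers. Both are CSPRNGs, and the main difference is that /dev/urandom normally does not block: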

The /dev/urandom device typically was never a blocking device, even if the pseudorandom number generator seed was not fully initialized with entropy since boot.

/dev/random, on the other hand, is not guaranteed to be non-blocking and may stall when there is not enough entropy:

/dev/random typically blocked if there was less entropy available than requested; more recently (see below, different OS's differ) it usually blocks at startup until sufficient entropy has been gathered, then unblocks permanently.
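A small Python sketch of the blocking behaviour described in these two quotes. On Linux, `os.getrandom()` wraps the `getrandom(2)` system call, and passing `os.GRND_NONBLOCK` makes it raise `BlockingIOError` instead of waiting when the kernel RNG has not been seeded yet. The helper name `try_random_bytes` and the fallback to `os.urandom()` on systems without `os.getrandom` are my own choices for illustration.

```python
import os
from typing import Optional

def try_random_bytes(n: int) -> Optional[bytes]:
    """Return n random bytes, or None if the kernel RNG is not seeded yet."""
    if not hasattr(os, "getrandom"):
        # Not Linux (or an old Python): fall back to the portable call.
        return os.urandom(n)
    try:
        # Without GRND_NONBLOCK this call would wait for the pool to be
        # initialized, much like a boot hanging at "Saving random seed:".
        return os.getrandom(n, os.GRND_NONBLOCK)
    except BlockingIOError:
        return None

data = try_random_bytes(16)
print("not seeded yet" if data is None else data.hex())
```

On a machine that has been up for a while this should just print 16 random bytes; the `None` branch only matters very early in boot or in an entropy-starved VM.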

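As an aside, because this is a change to a crypto-related design and to a kernel-level interface, both security and compatibility matter a great deal. Much of the Hacker News discussion does not care much about either, so you will see plenty of "very interesting" ideas being thrown around there XDDD

Back to the article: Jason A. Donenfeld (one of the RNG maintainers in the Linux kernel, though these days he is better known as the creator of WireGuard) has been steadily improving this part of the Linux kernel, and this time he planned to replace /dev/urandom with /dev/random outright: "Uniting the Linux random-number devices".

After the change went in, though, Google's Guenter Roeck complained that it blew up under QEMU: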

This patch (or a later version of it) made it into mainline and causes a large number of qemu boot test failures for various architectures (arm, m68k, microblaze, sparc32, xtensa are the ones I observed). Common denominator is that boot hangs at "Saving random seed:". A sample bisect log is attached. Reverting this patch fixes the problem.

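He found the offending commit with git bisect, and from the message where boot hangs you can roughly guess that there is not enough entropy inside the virtual machine.

The two of them, plus Linus, exchanged quite a few messages on the mailing list: "Re: [PATCH v1] random: block in /dev/urandom", including code that tries to "feed" entropy into /dev/urandom...

There will probably be further attempts, but in the short term it looks like the two devices will stay separate...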

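Ideas for running GUI programs inside Docker

In yesterday's Hacker News Daily I saw the article "Running GUI apps within Docker containers", which wants to package programs into Docker containers and offers a few ideas. There is also some discussion worth reading at "Running GUI apps within Docker containers (trickster.dev)".

Note that this mainly targets X11-style environments (so effectively Linux). The article uses Firefox as its example, though in practice you would probably use the approach to run other things...

The GUI part appears to work mainly by using VNC + x11vnc to reach the host's X11 environment, which requires xhost to grant access so that programs inside the container can control X11 (his example simply runs xhost +, which turns off X access control entirely, so that's bold).

noVNC, mentioned later in the article, puts VNC onto HTML5 so it can be operated from a browser; I'm less interested in that.

In the discussion someone also went all out and handed a pile of privileges to the container: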

docker run -it --rm -e DISPLAY --net=host -v $XAUTHORITY:/root/.Xauthority -v /tmp/.X11-unix:/tmp/.X11-unix debian:11-slim
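To read the command above option by option, here is a Python sketch that assembles the same argument list with a comment on each piece. The function name `x11_docker_argv` and the fallback to `~/.Xauthority` when `XAUTHORITY` is unset are my own assumptions; the sketch only builds the command line and does not run Docker.

```python
import os
import shlex

def x11_docker_argv(image: str = "debian:11-slim", env=None) -> list:
    """Build the docker run command quoted above as an argument list."""
    env = os.environ if env is None else env
    # Assumption: fall back to ~/.Xauthority when XAUTHORITY is not set.
    xauthority = env.get("XAUTHORITY", os.path.expanduser("~/.Xauthority"))
    return [
        "docker", "run", "-it", "--rm",
        "-e", "DISPLAY",                          # pass the host's DISPLAY through
        "--net=host",                             # share the host network namespace
        "-v", f"{xauthority}:/root/.Xauthority",  # X authentication cookie
        "-v", "/tmp/.X11-unix:/tmp/.X11-unix",    # X server's Unix socket directory
        image,
    ]

print(shlex.join(x11_docker_argv()))
```

Keeping it as a list (rather than one string) makes it easy to drop an option and see what breaks, which is useful given how much host access this hands to the container.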

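Overall, though, it does offer some ideas... call it another approach besides Flatpak.

The story of Dirty Pipe, the recent Linux kernel security issue, is fascinating...

On Hacker News I saw "The Dirty Pipe Vulnerability", a security issue in the Linux kernel. The related Hacker News discussion is at "The Dirty Pipe Vulnerability (cm4all.com)".

This time the bug is in splice(). I'll start with the code he wrote to reproduce it. The first program is left running as user1: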

#include <unistd.h>
int main(int argc, char **argv) {
  for (;;) write(1, "AAAAA", 5);
}
// ./writer >foo

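Then the second program is left running as well (it can run as a different user, user2, who has none of user1's privileges and only needs to be able to read foo):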

#define _GNU_SOURCE
#include <unistd.h>
#include <fcntl.h>
int main(int argc, char **argv) {
  for (;;) {
    splice(0, 0, 1, 0, 2, 0);
    write(1, "BBBBB", 5);
  }
}
// ./splicer <foo |cat >/dev/null

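In theory the string BBBBB should never show up in foo, yet it gets through... Using git bisect, he also confirmed that the problem appeared with the commit "pipe: merge anon_pipe_buf*_ops".

The road to finding the problem was rather long, though. It started with a support ticket at a web hosting service, saying that a downloaded access log was corrupt and could not be decompressed: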

It all started a year ago with a support ticket about corrupt files. A customer complained that the access logs they downloaded could not be decompressed. And indeed, there was a corrupt log file on one of the log servers; it could be decompressed, but gzip reported a CRC error.

He fixed it by hand and closed the ticket:

I fixed the file’s CRC manually, closed the ticket, and soon forgot about the problem.
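The article does not say how the CRC was fixed by hand, so the following is only a rough Python sketch of one way it could be done. A gzip member ends with an 8-byte trailer holding the CRC32 of the uncompressed data and its length, so if the compressed data itself is intact, the trailer can be recomputed. The helper `repair_gzip_crc` is hypothetical, and it assumes a single-member file whose header has no optional fields (a plain 10-byte header).

```python
import gzip
import struct
import zlib

def repair_gzip_crc(blob: bytes) -> bytes:
    """Recompute the CRC32/length trailer of a single-member gzip blob."""
    if blob[:2] != b"\x1f\x8b" or blob[3] != 0:
        raise ValueError("expected a gzip member without optional header fields")
    # Inflate the raw deflate stream between the 10-byte header and the
    # 8-byte trailer, skipping gzip's own CRC check.
    inflater = zlib.decompressobj(wbits=-zlib.MAX_WBITS)
    payload = inflater.decompress(blob[10:-8]) + inflater.flush()
    trailer = struct.pack("<II", zlib.crc32(payload), len(payload) & 0xFFFFFFFF)
    return blob[:-8] + trailer

# Simulate the symptom: correct contents, wrong CRC at the end of the file.
original = b"GET /index.html 200\n" * 1000
good = gzip.compress(original)
bad = good[:-8] + bytes([good[-8] ^ 0xFF]) + good[-7:]

try:
    gzip.decompress(bad)
except OSError as exc:          # gzip.BadGzipFile is a subclass of OSError
    print("corrupt:", exc)

print(gzip.decompress(repair_gzip_crc(bad)) == original)
```

Of course this only papers over the symptom, which is exactly why the problem kept coming back.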

A few months later it happened again, and after several more support tickets he had some "data" to look at:

Months later, this happened again and yet again. Every time, the file’s contents looked correct, only the CRC at the end of the file was wrong. Now, with several corrupt files, I was able to dig deeper and found a surprising kind of corruption. A pattern emerged.

Because it did not happen very often, and because the reasoning kept running into a dead end, he could not spend much time on it:

None of this made sense, but new support tickets kept coming in (at a very slow rate). There was some systematic problem, but I just couldn’t get a grip on it. That gave me a lot of frustration, but I was busy with other tasks, and I kept pushing this file corruption problem to the back of my queue.

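Later he really did put time into it. He scanned the whole hard disk for corrupt files, hoping more patterns would emerge, and found a regularity: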

External pressure brought this problem back into my consciousness. I scanned the whole hard disk for corrupt files (which took two days), hoping for more patterns to emerge. And indeed, there was a pattern:

there were 37 corrupt files within the past 3 months

they occurred on 22 unique days

18 of those days have 1 corruption

1 day has 2 corruptions (2021-11-21)

1 day has 7 corruptions (2021-11-30)

1 day has 6 corruptions (2021-12-31)

1 day has 4 corruptions (2022-01-31)

The last day of each month is clearly the one which most corruptions occur.
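As a quick sanity check of the quoted tallies, here is a short Python sketch. It uses only the numbers in the list above; the dates of the 18 single-corruption days are not given, so the month-end count covers the four named days only.

```python
import calendar
from datetime import date

# From the quoted list: 18 days with one corrupt file each, plus four named days.
single_corruption_days = 18
named_days = {
    date(2021, 11, 21): 2,
    date(2021, 11, 30): 7,
    date(2021, 12, 31): 6,
    date(2022, 1, 31): 4,
}

def is_last_day_of_month(d: date) -> bool:
    """True if d is the final calendar day of its month."""
    return d.day == calendar.monthrange(d.year, d.month)[1]

total_files = single_corruption_days + sum(named_days.values())
total_days = single_corruption_days + len(named_days)
month_end_files = sum(n for d, n in named_days.items() if is_last_day_of_month(d))

print(total_files, total_days, month_end_files)
```

The totals match the 37 files and 22 days in the quote, and three of the four multi-corruption days fall on the last day of a month.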

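He then tried writing all sorts of reproducers, and the version that finally worked is the one shown at the start. He then realized that this bug could be a security vulnerability and reported it. From the first support ticket to the final fix took almost a year, although the fix on the Linux kernel side came quickly: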

2021-04-29: first support ticket about file corruption

2022-02-19: file corruption problem identified as Linux kernel bug, which turned out to be an exploitable vulnerability

2022-02-20: bug report, exploit and patch sent to the Linux kernel security team

2022-02-21: bug reproduced on Google Pixel 6; bug report sent to the Android Security Team

2022-02-21: patch sent to LKML (without vulnerability details) as suggested by Linus Torvalds, Willy Tarreau and Al Viro

2022-02-23: Linux stable releases with my bug fix (5.16.11, 5.15.25, 5.10.102)

2022-02-24: Google merges my bug fix into the Android kernel

2022-02-28: notified the linux-distros mailing list

2022-03-07: public disclosure
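The gaps in this timeline are easy to work out with Python's `datetime`. The dates are copied from the list above and the variable names are mine.

```python
from datetime import date

first_ticket = date(2021, 4, 29)   # first support ticket
identified = date(2022, 2, 19)     # identified as a Linux kernel bug
reported = date(2022, 2, 20)       # report, exploit and patch sent
stable_fix = date(2022, 2, 23)     # stable releases with the fix
disclosure = date(2022, 3, 7)      # public disclosure

days_to_identify = (identified - first_ticket).days
days_report_to_fix = (stable_fix - reported).days
days_to_disclosure = (disclosure - first_ticket).days

print(days_to_identify, days_report_to_fix, days_to_disclosure)
```

That is 296 days from the first ticket to identifying the kernel bug, but only 3 days from the report to fixed stable releases, with 312 days in total up to public disclosure.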

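The whole story is quite a ride XD

Raspberry Pi 4 will be able to install its OS over a wired network

In "Raspberry Pi 4 to support Network install to a blank MicroSD card" I saw that the Raspberry Pi 4 will be able to install its OS over a wired network: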

The Raspberry Pi 4 will soon be able to install Raspberry Pi OS without the need for external hardware to flash the image.

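Until now you had to flash an SD card on another machine and then insert it to boot. Soon you will be able to install directly over a wired network, which makes the steps a little simpler... The article also mentions that only the RPi4, CM4 and Raspberry Pi 400 models will be supported; earlier models still need a bootable SD card prepared on another machine: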

The Raspberry Pi Foundation simply changed the bootloader code to enable the Network install feature, and yes, it will only work with Raspberry Pi 4, CM4, and Raspberry Pi 400 keyboard PC, but not Raspberry Pi 3 and earlier models.

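Performance difference between 64-bit and 32-bit Raspberry Pi OS

A few days ago I wrote "Raspberry Pi OS 64-bit officially released", and Phoronix has now tested the 64-bit and 32-bit systems against each other using the official release. The result of every test is in "Raspberry Pi OS 32-bit vs. 64-bit Performance".

The test hardware is a Raspberry Pi 400, which is basically the 4GB version of the Raspberry Pi 4 plus peripherals: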

Using a Raspberry Pi 400 keyboard computer with 4GB of RAM, I ran some fresh benchmarks of Raspberry Pi OS in its default 32-bit build and then again with the new 64-bit build.

Results first: across Phoronix's 33 tests, 64-bit beat 32-bit in every one, and by a clear margin:

Across the few dozen different workloads tested, switching Raspberry Pi OS 11 for the 64-bit version improved the performance on average by about 48%. See all the 32-bit vs. 64-bit Raspberry Pi benchmarks over on OpenBenchmarking.org.

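This was already known back when the 64-bit OS was still in beta, so it is not very surprising. The explanation given at the time was the difference in instruction sets: the instruction set provided by aarch64 is much more efficient than armv6. The 2016 article "64-bit ARM (Aarch64) Instructions Boost Performance by 15 to 30% Compared to 32-bit ARM (Aarch32) Instructions" explains this.

So now that the official release is out, the recommendation is basically to install the 64-bit OS whenever the hardware supports it...

Raspberry Pi OS 64-bit officially released

On Twitter I saw a former colleague post the official Raspberry Pi announcement of the 64-bit version of Raspberry Pi OS: "Raspberry Pi OS (64-bit)".

I had already been running it for a while during the beta. According to the official notes, only the Raspberry Pi 3, the Raspberry Pi Zero 2 and later models support the 64-bit OS.

I run it on a Raspberry Pi 3, mainly because most packages that support ARM these days are built for arm64, so switching to the 64-bit OS means more things can be installed.

The announcement also mentions that the 64-bit version of Chromium does not have Widevine support yet, so it cannot play Netflix or Disney+. You need to install the 32-bit version to watch them: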

The 64-bit version of Chromium, installed by default, has no version of the WidevineCDM library and therefore, it is not possible to play streaming media such as Netflix or Disney+.

It should arrive before long, though...