Ubuntu's Phased Updates

On Ubuntu 22.04 you will often run apt upgrade and have the system tell you that some packages are not going to be upgraded:

$ sudo apt upgrade
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
Calculating upgrade... Done
The following packages have been kept back:
  python3-distupgrade ubuntu-release-upgrader-core ubuntu-release-upgrader-gtk
0 upgraded, 0 newly installed, 0 to remove and 3 not upgraded.

In the past, when this happened and you were sure you wanted the upgrade, you would just escalate to dist-upgrade (apt full-upgrade), but this time even that doesn't budge:

$ sudo apt full-upgrade
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
Calculating upgrade... Done
The following packages have been kept back:
  python3-distupgrade ubuntu-release-upgrader-core ubuntu-release-upgrader-gtk
0 upgraded, 0 newly installed, 0 to remove and 3 not upgraded.

This is likely Phased Updates at work: a mechanism for the final rollout stage, so that an update is pushed to only a fraction of machines at a time instead of 100% at once.

apt policy shows that the current phasing percentage is 20% (this is what it looks like after I had already upgraded):

$ apt policy python3-distupgrade
python3-distupgrade:
  Installed: 1:22.04.18
  Candidate: 1:22.04.18
  Version table:
 *** 1:22.04.18 500 (phased 20%)
        500 http://tw.archive.ubuntu.com/ubuntu jammy-updates/main amd64 Packages
        500 http://tw.archive.ubuntu.com/ubuntu jammy-updates/main i386 Packages
        100 /var/lib/dpkg/status
     1:22.04.10 500
        500 http://tw.archive.ubuntu.com/ubuntu jammy/main amd64 Packages
        500 http://tw.archive.ubuntu.com/ubuntu jammy/main i386 Packages

For LTS users, APT's support for this feature starts with Ubuntu 22.04; before that, only the desktop Update Manager supported it, so you were unlikely to run into it:

Up to Focal (20.04), Update Manager is the only package manager that supports phased updates (reference). Any other update mechanism installs all updates regardless of the Phased-Update-Percentage.

(By the way, the wiki page has a "User stories" section that lays out the purpose of this feature in user-story format.)

So most people can simply ignore it, while anyone willing to help test can override the Phased Updates percentage through configuration (a sketch follows below)...
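If you always want phased updates right away, APT has a knob for this; a minimal sketch, assuming the APT::Get::Always-Include-Phased-Updates option in the APT that ships with Ubuntu 22.04:

$ # Opt this machine in to all phased updates (delete the file to revert)
$ echo 'APT::Get::Always-Include-Phased-Updates "true";' | \
    sudo tee /etc/apt/apt.conf.d/99-always-include-phased-updates
$ sudo apt upgrade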

The Problem with Security Cameras over Wi-Fi

Saw this via "Wi-Fi jamming to knock out cameras suspected in nine Minnesota burglaries (tomshardware.com)"; the original report is "Wi-Fi jamming to knock out cameras suspected in nine Minnesota burglaries -- smart security systems vulnerable as tech becomes cheaper and easier to acquire".

The technique is to saturate the wireless network's bandwidth (or emit interference), so that even if a security camera captures something, nothing gets recorded.

Isn't this a known issue? The long-standing advice has been to run security cameras over wired connections (and on an isolated VLAN); wireless cameras are mainly for casual use, not for serious security...

I also vaguely remember seeing EMP-hardened models, but those were military-grade...

The nginx Fork: freenginx

Saw on Hacker News that Maxim Dounin has decided to split off into freenginx: "Freenginx: Core Nginx developer announces fork (nginx.org)". The original announcement is on the mailing list, "announcing freenginx.org", which gives the reason for the fork:

Unfortunately, some new non-technical management at F5 recently decided that they know better how to run open source projects. In particular, they decided to interfere with security policy nginx uses for years, ignoring both the policy and developers’ position.

There is more detail on the freenginx mailing list, in 2024-February/000007.html:

The most recent "security advisory" was released despite the fact that the particular bug in the experimental HTTP/3 code is expected to be fixed as a normal bug as per the existing security policy, and all the developers, including me, agree on this.

And, while the particular action isn't exactly very bad, the approach in general is quite problematic.

The security advisory in question is "[nginx-announce] nginx security advisory (CVE-2024-24989, CVE-2024-24990)", which covers a feature that is not enabled by default:

Two security issues were identified in nginx HTTP/3 implementation,
which might allow an attacker that uses a specially crafted QUIC session
to cause a worker process crash (CVE-2024-24989, CVE-2024-24990) or
might have potential other impact (CVE-2024-24990).

The issues affect nginx compiled with the ngx_http_v3_module (not
compiled by default) if the "quic" option of the "listen" directive
is used in a configuration file.

The issue affects nginx 1.25.0 - 1.25.3.
The issue is fixed in nginx 1.25.4.

id=39373804 has some information worth reading about the current makeup of the nginx team; nginx appears to have just three core devs (Insights/Contributors only shows two, because the GitHub mirror appears to be synced from Mercurial and Sergey Kandaurov has no GitHub account):

Worth noting that there are only two active "core" devs, Maxim Dounin (the OP) and Roman Arutyunyan. Maxim is the biggest contributor that is still active. Maxim and Roman account for basically 99% of current development.

So this is a pretty impactful fork. It's not like one of 8 core devs or something. This is 50% of the team.

Edit: Just noticed Sergey Kandaurov isn't listed on GitHub "contributors" because he doesn't have a GitHub account (my bad). So it's more like 33% of the team. Previous releases have been tagged by Maxim, but the latest (today's 1.25.4) was tagged by Sergey

For now this is a one-sided account, so let the bullet fly a while longer... we'll see whether F5 responds, and what F5's side of the story is (if they respond at all).

ZLUDA: Running CUDA Programs Directly on Intel/AMD GPUs

I previously wrote about "ZLUDA, which ran CUDA programs directly on Intel integrated graphics"; since then things took a big turn: AMD stepped in to fund the project, and it now supports AMD GPUs: "AMD Quietly Funded A Drop-In CUDA Implementation Built On ROCm: It's Now Open-Source". The project is at vosen/ZLUDA on GitHub, and the AMD GPU support landed in the giant commit 1b9ba2b2333746c5e2b05a2bf24fa6ec3828dcdf:

Nobody expects the Red Team

Too many changes to list, but broadly:
* Remove Intel GPU support from the compiler
* Add AMD GPU support to the compiler
* Remove Intel GPU host code
* Add AMD GPU host code
* More device instructions. From 40 to 68
* More host functions. From 48 to 184
* Add proof of concept implementation of OptiX framework
* Add minimal support of cuDNN, cuBLAS, cuSPARSE, cuFFT, NCCL, NVML
* Improve ZLUDA launcher for Windows

The twists along the way make for a story that's hard to summarize... The author originally worked at Intel; after evaluating it for a while, Intel decided it had no future. AMD then reached out and funded the project, but eventually also concluded there was no business case, and under the contract with AMD, if AMD found no case for it, the author could open source it:

Why is this project suddenly back after 3 years? What happened to Intel GPU support?

In 2021 I was contacted by Intel about the development of ZLUDA. I was an Intel employee at the time. While we were building a case for ZLUDA internally, I was asked for a far-reaching discretion: not to advertise the fact that Intel was evaluating ZLUDA and definitely not to make any commits to the public ZLUDA repo. After some deliberation, Intel decided that there is no business case for running CUDA applications on Intel GPUs.

Shortly thereafter I got in contact with AMD and in early 2022 I have left Intel and signed a ZLUDA development contract with AMD. Once again I was asked for a far-reaching discretion: not to advertise the fact that AMD is evaluating ZLUDA and definitely not to make any commits to the public ZLUDA repo. After two years of development and some deliberation, AMD decided that there is no business case for running CUDA applications on AMD GPUs.

One of the terms of my contract with AMD was that if AMD did not find it fit for further development, I could release it. Which brings us to today.

This is actually pretty understandable: CUDA is Nvidia's ecosystem after all. Unless you overtake them and then define a pile of your own proprietary extensions (like Microsoft's playbook with IE back in the day), you're just doing the legwork to prop up someone else's platform.

Phoronix got access to the software a few days before the open-sourcing, and after several days of testing gave it a "pretty decent" verdict:

Andrzej Janik reached out and provided access to the new ZLUDA implementation for AMD ROCm to allow me to test it out and benchmark it in advance of today's planned public announcement. I've been testing it out for a few days and it's been a positive experience: CUDA-enabled software indeed running atop ROCm and without any changes. Even proprietary renderers and the like working with this "CUDA on Radeon" implementation.

Also, to avoid information leaks from benchmark software that auto-reports results back to servers, ZLUDA deliberately reported the device name as "Graphics Device"; now that it is open-sourced, it will switch back to the real name:

In my screenshots and for the past two years of development the exposed device name for Radeon GPUs via CUDA has just been "Graphics Device" rather than the actual AMD Radeon graphics adapter with ROCm. The reason for this has been due to CUDA benchmarks auto-reporting results and other software that may have automated telemetry, to avoid leaking the fact of Radeon GPU use under CUDA, it's been set to the generic "Graphics Device" string. I'm told as part of today's open-sourcing of this ZLUDA on Radeon code that the change will be in place to expose the actual Radeon graphics card string rather than the generic "Graphics Device" concealer.

The author's own tests show large variation across benchmarks, but by the author's method of calculation, overall performance is roughly on par with the OpenCL version.

Phoronix also ran comparisons against Nvidia... using Blender, which natively supports both Nvidia and AMD cards, and the results were jaw-dropping: going through ZLUDA's translation was faster than AMD's native support, which raises some interesting optimization questions (this was the BMW27 scene; the Classroom scene showed the same thing).

Even so, CUDA on AMD GPUs probably still won't take off: AMD will push for native support in each framework, and most developers build on top of frameworks rather than writing CUDA from scratch...

Slow Upload Speeds from Windows Inside VirtualBox

My desktop has two NICs, wired respectively to my own HiNet line and to the community network (which also goes out through HiNet, but that's another story).

The desktop's default route goes over my own HiNet line, but I want the VM to use the community network, so I bridged the VM to that NIC and let DHCP pick up a private IP from the router (a command-line sketch follows below).
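For reference, the same bridged setup can be done with VBoxManage; a minimal sketch, where the VM name "win10" and the host interface enp4s0 are made-up placeholders:

$ # VM must be powered off; bridge its first NIC onto the community-network interface
$ VBoxManage modifyvm "win10" --nic1 bridged --bridgeadapter1 enp4s0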

I hadn't noticed for a long time, but a few days ago sending photos over Line was really slow (it had happened before, I just kept forgetting to dig into it). After spending some time chasing it, I found that uploads from the Windows 10 VM were slow, which showed up clearly in Speedtest results.

Conclusion first: after cross-testing many combinations, the problem turned out to be fixed by changing Large Send Offload (IPv4) (i.e. LSO) on the NIC from Enabled to Disabled.

Back to the debugging: tests from my laptop and from the host showed no problem, so it looked like something inside the VM, but it wasn't clear what, since the connection wasn't actually dropping...

Since downloads were fine and only uploads were stuck, my first thought was an MTU-related issue, so I looked up the command to lower it to 1400 and tested; same result...

Next I switched the VM's network to NAT, and upload speeds were normal again...

Then I wanted to try a different virtual NIC type, but got stuck on missing drivers.

I was about to pull out tcpdump, but figured I'd first look through the NIC settings in Windows 10, and there was LSO... so I tried turning it off (old instincts from FreeBSD and Linux days, I guess?).

Turning it off fixed it immediately; I toggled it on and off twice more to confirm the parameter was what mattered, so I'm confident this workaround is effective...
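For the record, the same toggle can be scripted from an elevated PowerShell inside the guest instead of clicking through the adapter properties; a minimal sketch, where the adapter name "Ethernet" is an assumption (list yours with Get-NetAdapter):

PS> # Disable Large Send Offload for IPv4 on the virtual NIC
PS> Get-NetAdapter
PS> Disable-NetAdapterLso -Name "Ethernet" -IPv4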

After tracking it down myself, I also found "Virtualbox 7.0.12 slow upload speed in any Guest OS", which describes a similar problem and the same workaround.

A decade-plus later, LSO is still...

Debian's 64-bit time_t Transition

Saw the Debian mailing list discussion via "Debian: 64-bit time_t transition in progress": "64-bit time_t transition in progress"; there is also an official wiki page, "64-bit time".

Because a 32-bit time_t technically cannot represent times past 2038/01/19, Debian has committed to the ABI-breaking change from 32-bit to 64-bit time_t, and now has to work out how the 32-bit platforms will cope with it:

The goal of this transition is to ensure that 32-bit architectures in trixie (whether they are currently release architectures, or out of archive, etc) will be capable of handling current and future timestamps referring to times beyond 2038.
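The 2038/01/19 cutoff is just the signed 32-bit range: 2^31 - 1 = 2147483647 seconds after the Unix epoch, which GNU date can show directly:

$ date -u -d @2147483647
Tue Jan 19 03:14:07 UTC 2038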

That's still a bit more than 13 years away from 2038/01/19...

Cloud Egress Costs

Saw this via "Cloud Egress Costs (getdeploying.com)"; the original article is "Cloud Egress Costs". It is mainly a table for quickly comparing egress pricing across clouds, and it is not limited to VPS providers: all kinds of services are included, such as storage and CDN offerings...

Backblaze's bandwidth pricing is interesting: each month you get free egress equal to three times your stored data, so with, say, 2 TB stored you get 6 TB of free egress per month. Also, if I remember correctly, since Cloudflare is a Backblaze partner, traffic between the two isn't billed, so if your data can be public you can serve it out that way. If you do have to pay for regular egress, it's US$0.01/GB (which works out to US$10/TB).

The three VPS providers that usually get compared together (Linode, DigitalOcean, Vultr) all charge US$10/TB as well.

What I hadn't noticed before: OVH Cloud and Scaleway apparently charge nothing for egress? And Hetzner charges, but very little? Sounds worth trying out sometime to see how the quality holds up...

Dropping the Spamhaus DNSBLs

Just saw "If you query Spamhaus Projects' legacy DNSBLs via DigitalOcean move to the free Data Query Service"; the whole thing feels increasingly off, and after looking into the current situation I've decided to stop using Spamhaus.

Fewer and fewer people run their own mail servers these days, but I still keep a few domains on my own Postfix, mainly because reading mail with Mutt on the command line is convenient, and also to keep at least one mailbox outside the control of the big providers.

Unless you deploy from a packaged appliance, running your own mail server involves a fair bit of setup: on the MTA side you typically use DNSBLs to reject IP addresses known to send spam.

The idea behind a DNSBL is simple: compose a hostname from the IPv4 address and look it up via DNS; the answer tells you whether that IPv4 address is on the list. In other words, it uses the DNS protocol as an API, as a database query.

For example, to check 188.235.18.134 (picked from "Worst /24 blocks based on total spam count") against SpamCop's list, I reverse the IPv4 address to get 134.18.235.188, append SpamCop's designated zone bl.spamcop.net to get 134.18.235.188.bl.spamcop.net, and the lookup finds it:

134.18.235.188.bl.spamcop.net has address 127.0.0.2

For 168.95.1.1, the same construction gives 1.1.95.168.bl.spamcop.net, which shows it is not on SpamCop's list:

Host 1.1.95.168.bl.spamcop.net not found: 3(NXDOMAIN)

Using DNS this way has nice properties: DNS resolvers and libraries cache naturally, so MTAs like Postfix don't need to implement their own cache layer, and even with large mail volumes (legitimate or spam) you don't put heavy load on the list provider.
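The whole lookup fits in one line of shell; a minimal sketch reusing the example address above:

$ # Reverse the octets, append SpamCop's zone, and resolve
$ IP=188.235.18.134
$ host "$(echo "$IP" | awk -F. '{print $4"."$3"."$2"."$1}').bl.spamcop.net"
134.18.235.188.bl.spamcop.net has address 127.0.0.2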

Back to Spamhaus: their announced reason for blocking DigitalOcean is strange. Because DigitalOcean runs its own mirror, Spamhaus can't see each user's query volume, so they want users to register with Spamhaus and query through their own your_DQS_key.zen.dq.spamhaus.net instead.

With a unique key in every query, Spamhaus gets very clean tracking data, and combined with what's in their Privacy Policy:

We may have to share your personal data with the parties set out below for the purposes set out in the table in paragraph 4.
[...]
– Third parties to whom we may choose to sell, transfer, or merge parts of our business or our assets. Alternatively, we may seek to acquire other businesses or merge with them. If a change happens to our business, then the new owners may use your personal data in the same way as set out in this privacy notice.

...it becomes pretty clear what they intend to do with it...

On top of that, while digging around I found they already block queries coming through Google Public DNS and Cloudflare DNS.

Here's a query through my self-hosted Unbound:

gslin@home [~] [05:09/W4] host 2.0.0.127.zen.spamhaus.org
2.0.0.127.zen.spamhaus.org has address 127.0.0.10
2.0.0.127.zen.spamhaus.org has address 127.0.0.4
2.0.0.127.zen.spamhaus.org has address 127.0.0.2

Through Google Public DNS (8.8.8.8):

gslin@home [~] [05:09/W4] host 2.0.0.127.zen.spamhaus.org 8.8.8.8
Using domain server:
Name: 8.8.8.8
Address: 8.8.8.8#53
Aliases: 

Host 2.0.0.127.zen.spamhaus.org not found: 3(NXDOMAIN)

Through Cloudflare DNS (1.1.1.1):

gslin@home [~] [05:09/W4] host 2.0.0.127.zen.spamhaus.org 1.1.1.1
Using domain server:
Name: 1.1.1.1
Address: 1.1.1.1#53
Aliases: 

2.0.0.127.zen.spamhaus.org has address 127.255.255.254

Spamhaus's "Frequently Asked Questions (FAQ)" documents that a 127.255.255.254 answer means "Query via public/open resolver":

127.255.255.252	Any	Typing error in DNSBL name
127.255.255.254	Any	Query via public/open resolver
127.255.255.255	Any	Excessive number of queries

So it's pretty clear their stuff is no longer something I want to touch...

A KVM Backend for VirtualBox

Saw the discussion at "VirtualBox KVM Public Release (cyberus-technology.de)"; the original post is "VirtualBox KVM public release", and the project is at cyberus-technology/virtualbox-kvm on GitHub.

This solves a long-standing annoyance with VirtualBox on Linux: while VirtualBox is running you can't simultaneously use KVM, e.g. with tools like qemu-kvm.

It looks like a substantial rework of VirtualBox itself rather than an extension or plugin, though the docs say existing guest OSes can be used as-is.

There are no pre-compiled binaries, so you have to build it yourself, and the current version needs the GCC 11 shipped with Ubuntu 22.04; the newer GCC 12 causes problems:

Newer GCC versions (>= 12) might cause build issues.
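So on a system where GCC 12 is the default, you would pin the build to GCC 11; a rough sketch, assuming VirtualBox's configure script still accepts --with-gcc/--with-g++ and the usual kBuild flow (the project README has the authoritative steps):

$ sudo apt install gcc-11 g++-11
$ ./configure --with-gcc=gcc-11 --with-g++=g++-11 --disable-hardening
$ source ./env.sh
$ kmk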

Also, testing so far has mainly been on Intel platforms; AMD "works" but hasn't been tested thoroughly:

Currently, Intel x86_64 is the only supported host platform.
AMD will most likely work too but is considered experimental at the moment.

And on newer Intel platforms, the Linux kernel needs a boot-parameter tweak:

Starting with Intel Tiger Lake (11th Gen Core processors) or newer, split lock detection must be turned off in the host system. This can be achieved using the Linux kernel command line parameter split_lock_detect=off or using the split_lock_mitigate sysctl.
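On a GRUB-based system that would look something like this (a sketch; "quiet splash" is just the usual Ubuntu default):

$ grep GRUB_CMDLINE_LINUX_DEFAULT /etc/default/grub
GRUB_CMDLINE_LINUX_DEFAULT="quiet splash split_lock_detect=off"
$ sudo update-grub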

Seeing --disable-hardening among the build flags, hmmm... I'll keep this one on the watch list for now?

Quality News: Watching How Stories Move on Hacker News

Found this via my earlier post on collecting links that got "disappeared" from Hacker News under dubious conditions: a site that periodically fetches new Hacker News submissions and the front page, then organizes the history: "Quality News".

For example, the current #1 story, "John Walker, founder of Autodesk, has died (fourmilab.ch)", has its stats page at "Hacker News Story Stats: John Walker, founder of Autodesk, has died".

So when a story's behavior seems odd, you can cross-check the data here and start getting a "feel" for how things get "managed" on Hacker News...