穿越 Firewall 的作法

先看到「SSH over HTTPS (trofi.github.io)」這篇,原文「SSH over HTTPS」講怎麼利用 HTTPS 加上 CONNECT 指令穿過去。

作者有先介紹背景,他需要在醫院待個幾天,而醫院有免費的 WiFi 可以上網,但限制很多,基本上 TCP 的部分只有 80/tcp 與 443/tcp 會通,另外他有行動網路可以用 (但應該不是吃到飽的?),可以當作在現場直接設定 bypass 機制的工具:

I planned to spend 1-2 days in the hospital and did not plan to use the laptop.But now I am stuck here for the past two weeks and would like to tinker on small stuff remotely. The hospital has free Wi-fi access.

The caveat is that hospital blocks most connection types. It allows you only to do HTTP (TCP port 80) and HTTPS (TCP port 443) connections for most hosts. DNS (UDP port 53) and DoT (TCP port 853) seem to work as well at least for well-known DNS providers.

But SSH (TCP port 22 or most other custom ports) is blocked completely.

I wondered how hard would it be to pass SSH through HTTP or HTTPS. I had a GSM fallback so I could reconfigure remote server and try various solutions.

作者的方法就是在 TLS/SSL connection 上面跑 SSH,以前幹過,但就如同 comment 裡面提到的,Cisco 的 AnyConnect (主要是用 open-source client 的 OpenConnect 以及 open-source server 的 ocserv) 比較彈性,而且 AnyConnect 的協定會自動嘗試 UDP-based 的 DTLS,傳輸效率會比 TCP-based 的協定好,另外在 iOS 上可以直接裝 app store 裡面 Cisco 官方的 client 來用。

但從作者的其他文章看起來應該也是熟門熟路了,會這樣做應該是手上有 HTTPS 的 apache server 可以設定來用。

另外作者雖然沒寫出來,但想法應該是有 SSH 就可以在 command line 透過 -D 生出 SOCKS 服務當 proxy 讓其他程式使用,常見的應用程式大多都支援。

應該就是臨時要在醫院裡面待個一兩天時的暫時性方案,如果常態會遇到的話應該是會架 ocserv 來繞...

FreeBSD 14.0 釋出

FreeBSD 14.0-RELEASE 的公告也出來了:「FreeBSD 14.0-RELEASE Announcement」,比較完整的 release notes 在「FreeBSD 14.0-RELEASE Release Notes」。

先從官方列的 highlight 來看,首先比較重要的是 GENERIC kernel 支援 1024 cores:

FreeBSD supports up to 1024 cores on the amd64 and arm64 platforms.

看了一下 commit log 是從 256 變成 1024

先就 x86-64 這邊來看,目前「家用」最多的應該是 AMD7995WX (96 cores),舊版的 256 限制應該也還能撐住,但看 commit log 有提到,主要是預期這幾年應該會有更暴力的機器出現。

另外一塊是伺服器端,Intel 這邊有 8 sockets 的版本 (參考「Intel Xeon Sapphire Rapids to Scale to 4 and 8 Sockets」),如果都是接 8490H 的話就是 480 cores 了。

ARM 的話好像也可以堆,但不熟...

另外一個提到的重點是 TCP 預設的 congestion control 改成 CUBIC

The default congestion control mechanism for TCP is now CUBIC.

翻 commit log 可以看到是從 NewReno 換成 CUBIC 的,這樣就跟 Linux kernel 預設值一樣了。

再來比較重要的是在 release notes 裡面提到的,FreeBSD 15.0 將會拔光 32-bit 環境的支援,只留 armv7,這代表 Raspberry Pi 第一代的 armv6 也被淘汰掉了:

FreeBSD 15.0 is not expected to include support for 32-bit platforms other than armv7. The armv6, i386, and powerpc platforms are deprecated and will be removed. 64-bit systems will still be able to run older 32-bit binaries.


首先是看到 non-root 的 chroot

The chroot facility supports unprivileged operation, and the chroot(8) program has a -n option to enable its use. a40cf4175c90 (Sponsored by EPSRC)

然後把 OpenSSH 內對 FIDO/U2F 的支援開起來了:

The use of FIDO/U2F hardware authenticators has been enabled in ssh, using the new public key types ecdsa-sk and ed25519-sk, along with corresponding certificate types. FIDO/U2F support is described in https://www.openssh.com/txt/release-8.2. e9a994639b2a (Sponsored by The FreeBSD Foundation)

ASLR 預設開啟:

Address Space Layout Randomization (ASLR) is enabled for 64-bit executables by default. It can be disabled as needed if applications fail unexpectedly, for example with segmentation faults. To disable for a single invocation, use the proccontrol(1) command: proccontrol -m aslr -s disable command. To disable ASLR for all invocations of a binary, use the elfctl(1) command: elfctl -e +noaslr file. Problems should be reported via the problem reporting system, https://bugs.freebsd.org, or posting to the freebsd-stable@FreeBSD.org mailing list. b014e0f15bc7 (Sponsored by Stormshield)

然後先前被罵臭頭的 WireGuard 支援也放回來了:(「FreeBSD & pfSense 上的 WireGuard 問題」)

The kernel wg(4) WireGuard driver has been reintegrated; it provides Virtual Private Network (VPN) interfaces using the WireGuard protocol. 744bfb213144 (Sponsored by Rubicon Communications, LLC ("Netgate") and The FreeBSD Foundation)

然後看到 Netflix 贊助的 kTLS 支援 TLS 1.3:

KTLS (the kernel TLS implementation) has added receive offload support for TLS 1.3. Receive offload is now supported for TLS 1.1 through 1.3; send offload is supported for TLS 1.0 through 1.3. 05a1d0f5d7ac (Sponsored by Netflix)

然後 FreeBSD 長久以來 root 預設用的 /bin/csh 改成 /bin/sh 了:

The default shell for the root user is now sh(1), which has many new features for interactive use. d410b585b6f0

預設的 MTA 變成 dma (Dragonfly Mail Agent),看名字加上翻了一下 manpage,確認是從 Dragonfly BSD 移植過來的:

The default mail transport agent (MTA) is now the Dragonfly Mail Agent (dma(8)) rather than sendmail(8). Configuration of the MTA is done via mailer.conf(5). sendmail(8) and its configuration remain available. a67b925ff3e5

然後 portsnap 被拔掉了,現在就建議直接用 git 拉了,算是功成身退了:

The portsnap(8) utility has been removed. Users are encouraged to fetch the ports tree by using pkg install git and then git clone https://git.FreeBSD.org/ports.git /usr/ports. df53ae0fdd98

而 mergemaster 也被換成 etcupdate 了:

mergemaster(8) has been deprecated. Its replacement is etcupdate(8). 398b12691b4f (Sponsored by The FreeBSD Foundation)

然後支援 tarfs,而且可以用 zstd

The tarfs(5) file system has been added, which is backed by POSIX tar archives optionally compressed with zstd(1). 69d94f4c7608 (Sponsored by Juniper Networks, Inc.) (Sponsored by Klara, Inc.)

好久沒看 FreeBSD 的 release notes...

Uptime Kuma:UptimeRobot 的 Open Source Clone

UptimeRobot 算是我自己還蠻喜歡用的監控服務,各種常見的類型都支援,像是 HTTP/HTTPS website 的監控,ICMP Ping 或是 TCP 的監控,另外也提供 SSL certificate 的監控。

前陣子找資料的時候找到 Uptime Kuma,這是香港的 Louis Lam 開發的 open source software,光看名字就懷疑是 UptimeRobot 的致敬版本,裝起來用過後就更確定...

差不多用了一個多月了,基本上功能還蠻完整的,我把他接到我自己的 Slack 以及 Pushover 上,就很方便的可以監控服務:


musl 的 DNS reolsver 支援 TCP fallback

Facebook 上看到這篇:

剛好想起前陣子在 Hacker News 上看到「Musl 1.2.4 adds TCP DNS fallback (openwall.com)」這個消息,裡面的連結是今年五月 musl 1.2.4 的出版公告:「musl 1.2.4 released」(話說 openwall 網站似乎有擋 HiNet 的 IP,我是走第四台網路看的,或是參考 Internet Archive 上面的連結)。

musl 1.2.4 一個很重要的改變是在 DNS resolver 上支援了 TCP fallback (也就是支援 DNS over TCP),這改善了長久以來在 container 裡面使用 Alpine Linux 偶而會因為 DNS 遇到沒有照標準做的 server 而中雷的問題:

This release adds TCP fallback to the DNS stub resolver, fixing the longstanding inability to query large DNS records and incompatibility with recursive nameservers that don't give partial results in truncated UDP responses. It also makes a number of other bug fixes and improvements in DNS and related functionality, including making both the modern and legacy API results differentiate between NODATA and NxDomain conditions so that the caller can handle them differently.

查了一下對應的標準,跑去問 ChatGPT 的 GPT-4:

但 ChatGPT 引用的東西都不能直接當作是實際的文字,只能當作一個起點去找。實際翻 RFC 1035 可以翻到:

Messages carried by UDP are restricted to 512 bytes (not counting the IP or UDP headers). Longer messages are truncated and the TC bit is set in the header.

所以的確在 UDP response 的規範是 512 bytes,要取得完整的資料只能往 TCP 查詢。而 musl 有了這個 TCP fallback 總算是補掉了 Alpine Linux 的一個大坑。

而 musl 1.2.4 則是在 Alpine Linux 3.18 才開始使用:「Alpine 3.18.0 released」。

musl libc 1.2.4 – now with TCP fallback in DNS resolver

所以回到開頭,提到 Alpine Linux 3.16 還是有問題的人,是應該會遇到問題沒錯,因為 3.16 的 musl 本來就沒 TCP fallback?遇到不標準的 DNS server 的確是會噴...

Anyway,Alpine Linux 的 DNS 問題應該會變成過去式...

Netflix 單機 800Gbps 伺服器所使用的最佳化技巧

Hacker News 上看到 Netflix 的人丟出來的投影片,試著了解 Netflix 的 Open Connect Appliances 裡與 FreeBSD 相關的最佳化技巧對於效能的影響:「The “other” FreeBSD optimizations used by Netflix to serve video at 800Gb/s from a single server」。

看起來這邊的分析是先基於 400Gbps 的版本,可以跑到 375Gbps (53% CPU),接著在上面拔掉各種最佳化的設定,看看會掉多少流量。這邊可以參考先前在「Netflix 在單機服務 400Gbps 的影音流量」提到的資料。

投影片上的第一章是 sendfile 與 kTLS 相關的最佳化,這邊可以看出來都是重要的項目,隨便關掉一個就會掉很多 capacity:

  • Disable kTLS (and async sendfile) + nginx aio:40Gbps (100% CPU)
  • Disable kTLS (and async sendfile) + nginx thread pools:90Gbps (90% CPU)
  • Disable sendfile (but use kTLS):75Gbps (80% CPU)
  • Disable sendfile (but use NIC kTLS):95Gbps (80% CPU)
  • Enable Sendfile & kTLS, but disable ISA-L crypto:180Gbps (80% CPU)
  • Enable Sendfile & kTLS:240Gbps (80% CPU)

第二章是 virtual memory,UMA VM Page Cache 這邊看起來最明顯,SF_NOCACHE 也是個重要的項目:

  • Disable UMA VM Page Cache:60Gbps (95% CPU)
  • Disable VM Batch Queues:280Gbps (95% CPU)
  • Disable SF_NOCACHE:120Gbps (55% CPU)

另外第二章特別提到了一個之前沒有用到的 optimization,是把 arm64 上面的 4KB Pages 變成 16KB Pages,這帶動了些許的效能提昇,並且降低了 CPU 使用率:

345Gb/s @ 80% CPU -> 368Gb/s @ 66% CPU

第三章是 network stack,看起來 TSO 帶來的效益也是很高:

  • Disable TCP Large Receive Offload:330Gbps (65% CPU)
  • Disable RSS accelerated LRO:365Gbps (70% CPU)
  • TSO Disabled:180Gbps (85% CPU)
  • Disable TSO and LRO:170Gbps (85% CPU)

最後面則是有提到從 400Gbps 到 800Gbps 還多做了那些事情,最後是達到 731Gbps。

用的機器是 Dell PowerEdge R7525,這是一台 2U 的機器啊...

網頁大小 14KB 與 15KB 的速度差異

Hacker News 上看到「Why your website should be under 14kB in size」這篇,對應的討論在「A 14kb page can load much faster than a 15kb page (endtimes.dev)」,在講網頁大小 14KB/15KB 的速度差異比 15KB/16KB 大很多的問題:

What is surprising is that a 14kB page can load much faster than a 15kB page — maybe 612ms faster — while the difference between a 15kB and a 16kB page is trivial.

原因是 TCP slow start 造成的:

This is because of the TCP slow start algorithm.

而網頁這邊 TCP slow start 目前大多數的實做都是 10 packets 後發動:

Most web servers TCP slow start algorithm starts by sending 10 TCP packets.

然後再組合 1500 bytes/packet 以及 overhead,就差不多是 14KB 了:

The maximum size of a TCP packet is 1500 bytes.

This this maximum is not set by the TCP specification, it comes from the ethernet standard

Each TCP packet uses 40 bytes in its header — 16 bytes for IP and an additional 24 bytes for TCP

That leaves 1460 bytes per TCP packet. 10 x 1460 = 14600 bytes or roughly 14kB!

然後 HTTP/3 也可以看到類似的設計 (出自「QUIC Loss Detection and Congestion Control」:

Sending multiple packets into the network without any delay between them creates a packet burst that might cause short-term congestion and losses. Implementations MUST either use pacing or limit such bursts to the initial congestion window, which is recommended to be the minimum of 10 * max_datagram_size and max(2* max_datagram_size, 14720)), where max_datagram_size is the current maximum size of a datagram for the connection, not including UDP or IP overhead.

算是一個小知識... 但對於現在肥滋滋的網頁效果來說就沒辦法了,而且考慮到大一點的網站會在一個 TCP 連線裡面可能會傳很多 request,其實早就超過 TCP slow start 的門檻了。

TCP 標準被整理到 RFC 9293

看到「RFC 9293: Transmission Control Protocol (TCP)」這篇,主要是把本來分散在各個 RFC 的文件 (從 RFC 793 開始) 全部整理成一份,另外把一些已知的勘誤表放進來:

This document specifies the Transmission Control Protocol (TCP). TCP is an important transport-layer protocol in the Internet protocol stack, and it has continuously evolved over decades of use and growth of the Internet. Over this time, a number of changes have been made to TCP as it was specified in RFC 793, though these have only been documented in a piecemeal fashion. This document collects and brings those changes together with the protocol specification from RFC 793. This document obsoletes RFC 793, as well as RFCs 879, 2873, 6093, 6429, 6528, and 6691 that updated parts of RFC 793. It updates RFCs 1011 and 1122, and it should be considered as a replacement for the portions of those documents dealing with TCP requirements. It also updates RFC 5961 by adding a small clarification in reset handling while in the SYN-RECEIVED state. The TCP header control bits from RFC 793 have also been updated based on RFC 3168.

然後淘汰掉 (obselete) 一卡車 RFC 文件 XD

翻資料發現 2014 的時候 HTTP/1.1 被幹過一次類似的事情,不過是反過來被拆開:「HTTP/1.1 的更新」,這次把 RFC 2616 幹掉分成 RFC 7230RFC 7235

然後今年因為 HTTP 的關係又被幹了一次,這次 HTTP/1.1 又被整回來變成一份文件,但是把裡面的一些概念拆開:「HTTP 標準的翻新」。

其中 RFC 9110 定義 HTTP Semantics,RFC 9111 定義 HTTP Caching,然後 RFC 9112RFC 9113 拿來定義了 HTTP/1.1 與 HTTP/2,另外先把 HTTP/3 的號碼保留下來的 RFC 9114

不斷 refactor 以及加新功能的文件...

DOS 下的 TCP/IP 程式組

Hacker News 上看到「mTCP: TCP/IP applications for DOS PCs (brutman.com)」這個有趣的東西,在 DOS 環境下的 TCP/IP 程式組,原網站在「TCP/IP applications for your PC compatible retro-computers」這邊。

不過我覺得比較神奇的是,他的測試環境是真的包括一堆老機器跟網卡耶 XDDD

裡面提到的 NE1000ISA 界面的卡,這樣聽起來保養的都還不錯...

Tor 支援 congestion control

Tor 首度在協定內支援了 congestion control:「Congestion Control Arrives in Tor 0.4.7-stable!」。


Tor has released, the first stable Tor release with support for congestion control. Congestion control will eliminate the speed limit of current Tor, as well as reduce latency by minimizing queue lengths at relays. It will result in significant performance improvements in Tor, as well as increased utilization of our network capacity.

之所以沒有辦法直接利用 packet loss 的方式讓 TCP network stack 直接判斷 congestion control,是因為這樣會產生 side channel:

Crucially, we rejected mechanisms to provide congestion control by allowing packet drops, due to the ability to introduce end-to-end side channels in the packet drop pattern.

所以 Tor 得自己實做 congestion control 演算法,選擇的演算法是結合了 Vegas 的 Tor-Vegas,可以看到在實驗中,德國與香港的 exit node 效率大幅提昇:

另外也因為 也出來一個禮拜了,也可以看到 Advertised Bandwidth (算是 Tor network 觀察到的 bandwidth) 開始成長:

另外一個重要的點是 UDP 的支援計畫,看起來在這次改善後也比較有可行性了:

The astute reader will note that we rejected datagram transports. However, this does not mean that Tor will never carry UDP traffic. On the contrary, congestion control deployment means that queue delay and latency will be much more stable and predictable. This will enable us to carry UDP without packet drops in the network, and only drop UDP at the edges, when the congestion window becomes full. We are hopeful that this new behavior will match what existing UDP protocols expect, allowing their use over Tor.

用 iptables 擋特定國家的封包

這兩天發現 ubuntu-20.04.3-live-server-amd64.iso 這個 BitTorrent 的 ISO image 有大量來自 CN 的連線在狂抓,導致整個上傳頻寬都被吃滿:

沒想到第一次用 iptables 的 xt_geoip 居然是這個用途... 主要是參考「GeoIP Blocking Ubuntu 20.04LTS」這邊的方法,不過因為我的 rtorrent 是跑在 Docker 裡面的,有另外要注意的地方。

首先是安裝軟體,這邊要裝 xtables-addons-commonlibtext-csv-xs-perl

sudo apt install -y libtext-csv-xs-perl xtables-addons-common

再來是建立目錄,並且下載一包 GeoIP 的資料 (從 DBIP 下載) 並且轉成 xt_geoip 可以用的格式:

sudo mkdir /usr/share/xt_geoip
cd /usr/share/xt_geoip
sudo /usr/lib/xtables-addons/xt_geoip_dl
sudo /usr/bin/perl /usr/lib/xtables-addons/xt_geoip_build

然後就是加到 iptables 的條件裡面了,我加到兩個地方,一個是 INPUT chain,另外一個是 DOCKER-USER chain (參考「Docker and iptables」這邊的說明),假設你是用 port 6991 的話就這樣加:

sudo iptables -I INPUT -p tcp -m geoip --source-country CN -m tcp --dport 6991 -j DROP
sudo iptables -I DOCKER-USER -p tcp -m geoip --source-country CN -m tcp --dport 6991 -j DROP


另外在找資料的時候發現「Free updated GeoIP legacy databases」這邊有人放出 MaxMind 的版本,不過免費版的應該都差不多,這邊就用 xtables-addons-common 內預設的。