Google SRE 團隊整理出過去二十年的十一條心得

Google 的 SRE 團隊整理出過去二十年的心得,當看故事的心態在看的:「Lessons Learned from Twenty Years of Site Reliability Engineering」,在 Hacker News 上也有討論:「Lessons Learned from Twenty Years of Site Reliability Engineering (sre.google)」。

裡面的項目大多都會在公司成長時不斷的導入,都是夠大就會遇到的。

比較有趣的是第六條,這是唯一一條全部都用大寫字母列出來的:

COMMUNICATION CHANNELS! AND BACKUP CHANNELS!! AND BACKUPS FOR THOSE BACKUP CHANNELS!!!

到 Google 這個規模的架構,這邊就會規劃找完全獨立於 Google 架構的方案來用;我猜應該是傳統的 colocation 機房 (像是 AT&T 之類的),上面跑 IRC server 之類的?

在 Hacker News 上面也有其他人提到 Netflix 也有類似的規劃,需要有一個備援的管道是完全獨立於 AWS 的;另外同一則 comment 裡也有提到 Reddit 的作法是在辦公室裡面放 IRC server 備援:

Yes! At Netflix, when we picked vendors for systems that we used during an outage, we always had to make sure they were not on AWS. At reddit we had a server in the office with a backup IRC server in case the main one we used was unavailable.

IRC 還是很好用的 XD

OpenSSH 加入了 noise (keystroke timing obfuscation) 功能

Hacker News 上看到在 OpenSSH 裡加入 keystroke timing obfuscation 的功能:「Keystroke timing obfuscation added to ssh(1) (undeadly.org)」。

如同 commit log 裡面提到的,這個功能會想要故意沒事就送一些沒用的資料 (增加一些噪音),降低從 side channel 被判讀的資訊量:

This attempts to hide inter-keystroke timings by sending interactive traffic at fixed intervals (default: every 20ms) when there is only a small amount of data being sent. It also sends fake "chaff" keystrokes for a random interval after the last real keystroke. These are controlled by a new ssh_config ObscureKeystrokeTiming keyword/

基於 OpenSSH 算是 SSH 這塊的 de-factor standard 了,接下來看其他家像是 Dropbear 會不會也實作?

透過 mDNS 建立內部網路的 fingerprint

Hacker News 上看到透過 mDNS 建立 fingerprint 的方式,進而定位使用者身分:「Brute-forcing a macOS user’s real name from a browser using mDNS (fingerprint.com)」,原文在「Demo: Brute-forcing a macOS user’s real name from a browser using mDNS」。

利用發 HTTP(s) request 出去時,雖然都是傳回 Failed to fetch 錯誤,但因為 hostname 存在時會是 connection timeout,而不存在時會直接因為 DNS 查不到而很快 failed 掉,這個時間差異產生了 side channel,可以透過時間差異知道某個 hostname 是否存在。

這個技巧配合字典就可以大量掃描 *.local 的 mDNS 網段,進而產生出內部網路的 fingerprint。

這個問題應該是有標準解法 (或是有被提案過的解法),就是不讓 internet domain 存取 local domain 的東西,像是避免 internet 上的網站透過 JavaScript 碰到 http://127.0.0.1:xxx/ 的機制。

應該是把 *.local 用同樣方式對待就能避開這個問題?

AWS Support 整合到 Slack 頻道內

在「New – AWS Support App in Slack to Manage Support Cases」這邊看到的新功能,主要是下面這張圖,看起來可以直接在 Slack 上面開 private channel 跟 AWS Support 成員討論事情:

依照說明,有 Business 以上的等級都可以用:

The AWS Support App in Slack is now available to all customers with Business, Enterprise On-ramp, or Enterprise Support at no additional charge.

雖然好像不是哪麼常跟 AWS 的人打交道,但平常先掛起來好像不錯...

Cloudflare 上的 Hertzbleed 解釋

除了 Hertzbleed 當初公佈時的論文與網頁外,Cloudflare 上也有一篇 Hertzbleed 的解釋:「Hertzbleed explained」。

會特別拿出來提是因為這篇是 Yingchen Wang 寫的,也就是 Hertzbleed 論文裡兩位第一作者之一 (另外一位是 Riccardo Paccagnella),而從她的網站上也可以看到 Cloudflare intern 的資訊:

Graduate Research Intern at Cloudflare, 2022 Summer

Hertzbleed 也是一種 side-channel attack,利用 CPU 會依照電量與溫度,而動態調整頻率的特性來達到遠端攻擊,而不需要在機器旁邊有功率錶之類儀器。

傳統上針對這類執行時間的程式會用 constant-time programming 來保護,但 Hertzbleed 則是利用了 CPU 會動態調整頻率的特性鑽出一個洞。現在學界對這個攻擊方式還不熟悉,等熟悉了以後應該是會把洞鑽大...

依照原理來說,定頻應該會是一個解法... 像是大家現在都很喜歡搞「降壓超頻」,算是某種定頻的方式,而一般大家會設定在全速跑也不會過熱降頻的情況。

目前 IntelAMD 都決定不 patch,依照洞一向都是愈挖愈大,來期待洞大到 RSA 或是 ECC 被打的那天...

Libmill:在 C 裡面仿造 Go 的 concurrency 架構

Hacker News 首頁上看到的專案:「Libmill is a library that introduces Go-style concurrency to C.」。

使用上的設計可以看到就是用 Golang 裡面的設計,另外在網頁下方也有提到「libdill: Structured Concurrency for C」,就不是用 Golang 的設計,但是有同樣的功能性...

兩者都是 MIT/X11 license,大多數的專案用起來應該沒什麼問題,底層應該都是用 select() 或是 poll() 來實做就是了?

freenode 又在提醒大家搬到 Libera.Chat 了...

看到「Last remaining >1000 user community channel seized by freenode staff」這篇,freenode 把 ##linux 頻道給綁走了:

Late in the evening of June 12, 2021, freenode staff seized the largest community channel left on freenode.

新家在 Libera.Chat 上面:

Similar hostile takeovers happened to the Python community, GNU and the FSF. More than 700 FLOSS projects have abandoned freenode and now the Linux.Chat community has left as well; find our new home at #linux on Libera and our multi-platform community website here.

翻了一下「IRC Networks - Top 10 in the annual comparison」這邊的資料,看起來這個月月底有機會交叉?

另外文章裡有提到 OFTC (The Open and Free Technology Community),看起來也是個對 open source community 開放的資源...

跨瀏覽器追蹤的方式

看到「Exploiting custom protocol handlers for cross-browser tracking in Tor, Safari, Chrome and Firefox」這個方式,跨瀏覽器收集 fingerprint 追蹤。

這次用的方式是透過 handler 追:

The scheme flooding vulnerability allows an attacker to determine which applications you have installed. In order to generate a 32-bit cross-browser device identifier, a website can test a list of 32 popular applications and check if each is installed or not. On average, the identification process takes a few seconds and works across desktop Windows, Mac and Linux operating systems.

最近大家比較常使用到的應該就是 Zoom 從網頁把應用程式帶起來的方式:

而要怎麼偵測的部份,用到了不同瀏覽器的 side channel。

Chromium 系列的部份對應的 ticket 在「Issue 1096610: External Protocol handler anti-flood protection is ineffective and flaky」這邊有被提出來。主要用到的方法是,在遇到有 handler 時,連打兩次時會被擋下:

被擋下後再打都會失敗,所以需要一個方式重設 flag,而內建的 Chrome PDF Viewer 剛好可以重設 flag:

The built-in Chrome PDF Viewer is an extension, so every time your browser opens a PDF file it resets the scheme flood protection flag. Opening a PDF file before opening a custom URL makes the exploit functional.

Firefox 的 side channel 則是可以透過 same-origin policy 測試當作 side channel,對應的 ticket 在「Scheme flooding technique for reliable cross-browser fingerprinting」這邊:

Every time you navigate to an unknown URL scheme, Firefox will show you an internal page with an error. This internal page has a different origin than any other website, so it is impossible to access it because of the Same-origin policy limitation. On the other hand, a known custom URL scheme will be opened as about:blank, whose origin will be accessible from the current website.

Safari 上的問題與 Firefox 一樣,不過沒登入看不到 ticket (也懶的註冊了):

You are not authorized to access bug #225769. To see this bug, you must first log in to an account with the appropriate permissions.

另外,雖然 Tor Browser 底層是 Firefox,但因為有改變預設值,所以攻擊者也得換方法:

Tor Browser is based on the Firefox source code, so the Same-origin policy trick was used here as well. But because Tor Browser does not show pop-ups, we used the same-origin policy trick with iframe elements instead.

這個方法還蠻暴力的...

Google 釋出網頁版的 Spectre 攻擊 PoC,包括 Apple M1 在內

在大約三年前 (2018 年年初) 的時候,在讀完 Spectre 之後寫下了一些記錄:「讀書時間:Spectre 的攻擊方式」,結果在 Bruce Schneier 這邊看到消息,Google 前幾天把把 PoC 放出來了:「Exploiting Spectre Over the Internet」,在 Hacker News 上也有討論:「A Spectre proof-of-concept for a Spectre-proof web (googleblog.com)」。

首先是這個攻擊方法在目前的瀏覽器都還有用,而且包括 Apple M1 上都可以跑:

The demonstration website can leak data at a speed of 1kB/s when running on Chrome 88 on an Intel Skylake CPU. Note that the code will likely require minor modifications to apply to other CPUs or browser versions; however, in our tests the attack was successful on several other processors, including the Apple M1 ARM CPU, without any major changes.

即使目前的瀏覽器都已經把 performance.now() 改為 1ms 的精度,也還是可以達到 60 bytes/sec 的速度:

While experimenting, we also developed other PoCs with different properties. Some examples include:

  • A PoC which can leak 8kB/s of data at a cost of reduced stability using performance.now() as a timer with 5μs precision.
  • A PoC which leaks data at 60B/s using timers with a precision of 1ms or worse.

比較苦的消息是 Google 已經確認在軟體層沒辦法解乾淨,目前在瀏覽器上只能靠各種 isolation 降低風險,像是將不同站台跑在不同的 process 裡面:

In 2019, the team responsible for V8, Chrome’s JavaScript engine, published a blog post and whitepaper concluding that such attacks can’t be reliably mitigated at the software level. Instead, robust solutions to these issues require security boundaries in applications such as web browsers to be aligned with low-level primitives, for example process-based isolation.

Apple M1 也中這件事情讓人比較意外一點,看起來是當初開發的時候沒評估?目前傳言的 M1x 與 M2 不知道會怎樣...

Visa 網站上面的 Opt-Out 功能被拿來玩 Timing Attack...

Hacker News Daily 上看到「Visa Advertising Solutions (VAS) Opt Out (visa.com)」這篇講 Visa 的 Visa Advertising Solutions (VAS) Opt Out,本來以為是在討論企業賣資料的問題 (下面的討論的確是有在討論這個),但最上面的討論居然是在討論 timing attack,像是這篇:

morpheuskafka 2 days ago [–]

Checked and the Mastercard one someone posted below doesn't seem to be vulnerable to this. My real card number and a dummy mastercard number with valid prefix and check digit both returned a 200 OK in around 1.01s. A random 16digit number without valid check digit returned 400 Bad Request in about 800ms. Decided to check that one since they have a completely useless machine-readable catchpa.

For Visa it was 835ms for valid, 762ms for dummy, prefix and check digit appears to be checked client side.

我印象中這類方式已經發展很久了 (透過網路反應時間的 timing attack),討論裡面有提到「Exploiting remote timing attacks」這篇,也是十多年前的資料了... 不過官方網站玩起來總是有中特別爽的感覺 XDDD

不過 Visa 的這個網站前面用了 Cloudflare,用機器人掃感覺很容易被擋,這又是另外一回事了...