Web Audio API 當做 fingerprint 的方式

三年前的文章「How the Web Audio API is used for audio fingerprinting」講解了 AudioContext 是怎麼被拿來 fingerprint 的,最近在「How We Bypassed Safari 17's Advanced Audio Fingerprinting Protection」這篇看到的。

AudioContext 可以完全跟錄音設備無關,單純計算,然後因為不同瀏覽器實作上面有差異,就被拿來當作 fingerprint 了。

文章裡介紹的方法是透過 Oscillator 產生 440Hz 的正弦波,然後過 Compressor 降低音量 (運算):

The Web Audio API provides a DynamicsCompressorNode, which lowers the volume of the loudest parts of the signal and helps prevent distortion or clipping.

降低音量的運算再這塊各家的實作不同,就能夠區分不同的瀏覽器 (甚至是版本):

Historically, all major browser engines (Blink, WebKit, and Gecko) based their Web Audio API implementations on code originally developed by Google in 2011 and 2012 for the WebKit project.

Since then browser developers have made a lot of small changes. These changes, compounded by the large number of mathematical operations involved, lead to fingerprinting differences. Audio signal processing uses floating point arithmetic, which also contributes to discrepancies in calculations.

Additionally, browsers use different implementations for different CPU architectures and OSes to leverage features like SIMD. For example, Chrome uses a separate fast Fourier transform implementation on macOS (producing a different oscillator signal) and other vector operation implementations on different CPU architectures (used in the DynamicsCompressor implementation). These platform-specific changes also contribute to differences in the final audio fingerprint.

而這東西平常也不會用到,所以對 Tor Browser 這種特別重視 privacy 的瀏覽器就直接關掉他了:

Tor

In the case of the Tor browser, everything is simple. But unfortunately, web Audio API is disabled there, so audio fingerprinting is impossible.

用 Brave 的 brave://flags 設定

因為 Brave 會繼續支援 webRequest 的完整功能,所以早早就已經跳槽過來,不過 Brave 的垃圾功能太多 (一堆 cryptocurrency 相關的功能),有需要全部關掉,所以裝好後第一件事情就是到 brave://flags 裡面把所有 Brave 自家功能的設定關光光,只留下 Brave Sync V2 的功能。

先前都是用滑鼠一個一個關,剛剛覺得太累了,研究一下 js 直接在 dev console 裡面關:

document.querySelector('flags-app').shadowRoot.querySelectorAll('flags-experiment[id^=brave]').forEach(flag => {
  flag.shadowRoot.querySelectorAll('option').forEach(o => {
    if (o.innerText === 'Disabled') {
      o.selected = 'selected';
      o.parentElement.dispatchEvent(new Event('change'));
    }
  });
});

這樣會全部把 brave 開頭的 feature flag 都關掉;最後再手動把 Sync V2 開回來就可以重開 Brave 了。

後續就是 GUI 上面的選單再調整成自己要的。

這邊搞 javascript 的部分有遇到四個地方要處理... 第一個是現在 dev console 預設不讓你貼東西進去,貼了以後會顯示要輸入 allow pasting 後才行,這個字串以及方法之後不知道會不會改。

第二個是遇到 querySelectorAll 穿不過 shadow DOM,所以可以看到程式碼裡面需要先選出 shadow DOM 元素,用 shadowRoot 鑽進去再 querySelectorAll 一次。

第三個是 css selector 沒辦法針對 innerHTML 或是 innerText 的內容判斷,所以得自己拉出 option 後對 innerText 判斷。

最後一個是改變 select 的值後,得自己觸發 change event,這樣才會讓 Brave 偵測到並且儲存。

各家首頁 JavaScript 的大小

看到「JavaScript Bloat in 2024 (tonsky.me)」這篇在講各家首頁 JavaScript 的大小,原文在「JavaScript Bloat in 2024」這邊。

作者 Nikita Prokopov 創造了很多 open source project,我比較有印象的是 Fira Code,這篇比較像是他在抱怨現在的網站...

裡面站台一堆都是 10MB+ 的 JavaScript 在跑的,突然提到 P 站只有 1.4MB 的時候笑了出來 (在 Hacker News 的 comment 裡面也有人提到這個):

Compare it to people who really care about performance — Pornhub, 1.4 MB:

另外 Jira 也被拿出來鞭:

Here, Jira, a task management software. Almost 50 MB!

但趨勢看起來不可逆?現在開發網站一堆都搞前後端分離,用 JavaScript 產生出所有頁面,然後再想辦法補 SEO...

nginx 分家:freenginx

Hacker News 上看到 Maxim Dounin 決定分家到 freenginx 的消息:「Freenginx: Core Nginx developer announces fork (nginx.org)」,原文在 mailing list 上:「announcing freenginx.org」,這邊提到分家的原因:

Unfortunately, some new non-technical management at F5 recently decided that they know better how to run open source projects. In particular, they decided to interfere with security policy nginx uses for years, ignoring both the policy and developers’ position.

在 freenginx 的 mailing list 上有提到更多,在 2024-February/000007.html 這篇:

The most recent "security advisory" was released despite the fact that the particular bug in the experimental HTTP/3 code is expected to be fixed as a normal bug as per the existing security policy, and all the developers, including me, agree on this.

And, while the particular action isn't exactly very bad, the approach in general is quite problematic.

這邊提到的 security advisory 是「[nginx-announce] nginx security advisory (CVE-2024-24989, CVE-2024-24990)」這個,看起來是個沒有 enabled by default 的功能:

Two security issues were identified in nginx HTTP/3 implementation,
which might allow an attacker that uses a specially crafted QUIC session
to cause a worker process crash (CVE-2024-24989, CVE-2024-24990) or
might have potential other impact (CVE-2024-24990).

The issues affect nginx compiled with the ngx_http_v3_module (not
compiled by default) if the "quic" option of the "listen" directive
is used in a configuration file.

The issue affects nginx 1.25.0 - 1.25.3.
The issue is fixed in nginx 1.25.4.

id=39373804 這邊有些目前 nginx 組成的資訊可以讀,目前 nginx 的 core devs 應該就三位 (在 Insights/Contributors 這邊看起來只有兩位,這是因為 GitHub 上面的 mirror 看起來是從 Mercurial 同步過去的,而 Sergey Kandaurov 沒有 GitHub 帳號):

Worth noting that there are only two active "core" devs, Maxim Dounin (the OP) and Roman Arutyunyan. Maxim is the biggest contributor that is still active. Maxim and Roman account for basically 99% of current development.

So this is a pretty impactful fork. It's not like one of 8 core devs or something. This is 50% of the team.

Edit: Just noticed Sergey Kandaurov isn't listed on GitHub "contributors" because he doesn't have a GitHub account (my bad). So it's more like 33% of the team. Previous releases have been tagged by Maxim, but the latest (today's 1.25.4) was tagged by Sergey

現在就是單方面的說法,可以再讓子彈多飛一點時間... 看 F5 要不要回應,以及 F5 的說法 (如果要回應的話)。

SQLite 官方提供的網頁版 playground

Hacker News 上看到「Sqlite3 Utility in the Browser (sqlite.org)」這個,看了一下是官方提供的 playground:「SQLite3 Fiddle」。

https://sqlite.org/fiddle/fiddle.js 這邊可以看到 2022 年就有的東西,在 Internet Archive 上也可以看到也是差不多時間被記錄下來的:「Saved 21 times between August 12, 2022 and January 24, 2024.」。

看起來是用 WebAssembly 包起來的,不過如果是自己的機器,本機跑 sqlite3 好像會方便一些...

用 x86 組語寫的 web forum

另外一個在週末看到的奇怪東西,用 x86 組語寫的 web forum:「AsmBB – a lightweight web forum engine written in assembly language (asmbb.org)」,官網在 asmBB 這邊,說明則是在「What is AsmBB?」這頁。

是個 x86 assembly + SQLite 的組合:

AsmBB is fully written in assembly language and uses SQLite as a database back-end. That is why it can work on really weak hosting and in the same time serve huge amount of visitors without lags and delays.

程式碼是用 Fossil 管的,不過作者有 mirror 到 GitHub 上:「johnfound/asmbb」。

SQlite 的部分用了 SQLeet,可以在 sqleet.inc 這個檔案裡面看到引用了 libsqleet.so

AsmBB uses SQLite with SQLeet extension as a back-end storage database and FastCGI interface to the web server.

組語的部分看起來是用 fasm 寫的,看程式碼 (像是 encryption.asm) 可以看到一些好用的語法,像是自動處理 function 參數的部分 (應該有設定某種 convention),跟當年寫 MASM 的用法方便不少:

stdcall TextCreate, sizeof.TText
mov     edx, eax

stdcall TextCat, edx, <"Content-type: text/html; charset=utf-8", 13, 10, 13, 10>
stdcall RenderTemplate, edx, 'crypto.tpl', 0, esi

Google Chrome 將在 115 版之後預設使用 HTTPS 連線

Google Chrome (Chromium) 宣布 115 版後將預設使用 HTTPS 連線:「Towards HTTPS by default」。

查了一下 115.0.5790.98 是 2023/07/18 就出的版本,現在才冒出這篇文章有點晚,但大概就是講一下幹了什麼事情?

We're currently experimenting with this change in Chrome version 115, working to standardize the behavior across the web, and plan to roll out the feature to everyone soon.

主要的差異是在於,即使你輸入或是點擊的連結是 http://,他還是會優先嘗試 HTTPS:

Chrome will automatically upgrade all http:// navigations to https://, even when you click on a link that explicitly declares http://.

只有在 http:// 連結遇到 upgrade 到 HTTPS 失敗時才會回頭用 HTTP:

This works very similarly to HSTS upgrading, but Chrome will detect when these upgrades fail (e.g. due to a site providing an invalid certificate or returning a HTTP 404), and will automatically fallback to http://.

而本來就用 https:// 的連線就完全不會碰 HTTP 了。

講到推動 HTTPS 這點,前陣子剛好也是 Snowden 揭露美國 PRISM (菱鏡計畫) 十年的日子,當年在揭露後也因此加速了各種加密技術的基礎建設,像是 Let's Encrypt,而這也使得 HTTPS 更加普及,也讓 Google Chrome 現在可以預設切 HTTPS。

OpenAI 的 web crawler 叫做 GPTBot

Hacker News 上看到「GPTBot – OpenAI’s Web Crawler (openai.com)」,原文是 GPTBot 這個,提到了 OpenAI 的 web crawler,User-Agent 會長這樣:

Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; GPTBot/1.0; +https://openai.com/gptbot)

然後有提到他會遵守 robots.txt

另外提供了 web crawler 會使用的 IP range,放在 gptbot-ranges.txt 這邊,目前裡面看起來只有 40.83.2.64/28,是 Azure 的網段。

這個行為有點微妙了,要開始自己掃資料嗎?

看起來 Ladybird 瀏覽器會被加速開發?

Ladybird 是一個從零寫起的瀏覽器,可以參考作者在「Ladybird: A new cross-platform browser project」這邊的介紹。

GitHub 上的頁面可以看到「This repository has been archived by the owner on Dec 25, 2022. It is now read-only.」,目前的專案是放在 SerenityOS 裡面:「The Ladybird web browser has moved into the SerenityOS monorepo.」。

結果作者提到收到了 US$100,000 的 sponsor,指名 Ladybird browser:

而他發出來不久以後,又有另外一筆捐款,也是 US$100,000 但是是透過虛擬貨幣,所以他要找會計師:

是個強制加速開發的方式... 現在的瀏覽器幾乎就是 Blink + WebKit 統一天下,用自家 GeckoFirefox 整天在搞事,好像是有機會看到新的 engine 與瀏覽器挑戰了...

nginx 開始嘗試支援 HTTP/3

Hacker News 上看到「Nginx 1.25.0: experimental HTTP/3 support (nginx.org)」這則消息,從 nginx 1.25.0 開始可以用 HTTP/3

HTTP/2 最大的差異就是從以往的 TCP 改到 UDP 上了,這是基於 QUIC 的經驗弄出來的東西...

nginx 的支援算是等了一陣子了,不過沒有當初 HTTP/1.1SPDY 的進步這麼明顯,我自己就沒有跟的那麼緊了。

這樣以後 office firewall 預設應該會再開 443/udp?