透過 /etc/hosts 擋廣告與追蹤的軟體

Hacker News Daily 上看到 Maza ad blocking,這是一個擋廣告與追蹤的軟體,原理就是在 DNS 上檔掉某些網域。

運作方式跟 Pi-hole 接近,其中 Pi-hole 是提供一個 DNS server 擋,這套軟體則是透過 /etc/hosts 來擋。

目前只支援 macOSLinux,不過這樣看起來使用的族群有點怪,因為在 desktop 上有更多手段可以擋,透過 DNS 類的擋法主要還是拿來對手機上無法無天的 app...

不過先關注一下好了,之後也許會在某些場合下用到?

WireGuard 1.0.0 的釋出

在「[ANNOUNCE] WireGuard 1.0.0 for Linux 5.6 Released」這邊看到的消息,看起來 WireGuard 1.0.0 會回過投來 backport 到幾個重要的版本:

We'll also continue to maintain our wireguard-linux-compat [2] backports repo for older kernels. On the backports front, WireGuard was backported to Ubuntu 20.04 (via wireguard-linux-compat) [4] and Debian Buster (via a real backport to 5.5.y) [5]. I'm also maintaining real backports, not via the compat layer, to 5.4.y [6] and 5.5.y [7], and we'll see where those wind up; 5.4.y is an LTS release.

包括 DebianUbuntu 的新版,以及 5.4.x 的 LTS 版本,讓使用起來更方便一些...

Brave 出手檢舉 Google 沒有遵守 GDPR

Brave (從 Chromium 分支出來的瀏覽器) 檢舉 Google 沒有遵守 GDPR 的規定:「Formal GDPR complaint against Google’s internal data free-for-all」。

主要是「purpose limitation」這個部份,出自「REGULATION (EU) 2016/679 OF THE EUROPEAN PARLIAMENT AND OF THE COUNCIL of 27 April 2016」:

1. Personal data shall be:

(b)

collected for specified, explicit and legitimate purposes and not further processed in a manner that is incompatible with those purposes; further processing for archiving purposes in the public interest, scientific or historical research purposes or statistical purposes shall, in accordance with Article 89(1), not be considered to be incompatible with the initial purposes (‘purpose limitation’);

比較重要的是 specified 與 explicit 這兩個詞,GDPR 規定必須明確指明用途,而可以從整理出來的文件「Inside the black box」裡的「Purported processing purpose」看到大量的極為廣泛的說明。

Google 應該會就這塊反擊認為這樣的描述就夠用,就看歐盟決定要怎麼做了...

把 TLS 1.0 關掉...

突然想到所以到 SSL Report 上測試 blog.gslin.org,發現如果還支援 TLS 1.0,在 Overall Rating 的部份會直接被降為 B。

翻了一下 access log (我在 log 裡有多記錄連線的 TLS Protocol),看起來用 TLS 1.0 連的主要都是 bot,關掉應該是還好...

另外看了一下報告裡的 Cipher Suites 部份,發現不少 cipher (像是 TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA384 (0xc028)) 被列為 weak,看起來像是把使用 CBC 的 cipher 都認為是 weak,是個推廣 AEAD 的概念。

影響應該不大,但還是記錄一下... 另外等 Ubuntu 20.04 出了以後把整台重灌好了,目前還在用 Ubuntu 16.04,系統內建的 nginx 不支援 TLS 1.3。

Firefox 在美國將預設開啟 DNS over HTTPS

看到 Mozilla 在「Firefox continues push to bring DNS over HTTPS by default for US users」這邊的公告,另外也可以參考 Hacker News 上的討論:「Mozilla’s DNS over HTTPs (blog.mozilla.org)」。

這次的改變是將美國的 Firefox 使用者自動啟用 DNS over HTTPS (DoH),而預設是丟給 Cloudflare

By default, this change will send your encrypted DNS requests to Cloudflare.

這個作法非常粗暴而且侵犯使用者的隱私。

  • 對於進階而且有在跟重大消息的使用者,他們如果不信任 Cloudflare 的話,會主動關掉 DoH 的選項。
  • 但對於一般使用者,他們不知道這件事情,而他們本來也不會預期他們上網的 hostname 部份會被 Cloudflare 知道。

相較於 Google Chrome 是確認你現在用的 DNS 是不是在有支援 DoH 的清單內,如果是的話就會切過去使用 DoH,但不會因此改變 DNS provider,也就是不會有突然冒出來的第三者知道你瀏覽的網站。

來繼續看...

Google 用 x-client-data 追蹤使用者的問題

前陣子 Chromium 團隊在研究要移除 User-Agent 字串的事情 (參考「User-Agent 的淘汰提案」),結果 kiwibrowser 就直接炸下去,Google 很久前就會針對自家網站送出 x-client-data 這個 HTTP header,裡面足以辨識使用者瀏覽器的單一性:「Partial freezing of the User-Agent string#467」。

Google 的白皮書裡面是說用在 server 的試驗:

We want to build features that users want, so a subset of users may get a sneak peek at new functionality being tested before it’s launched to the world at large. A list of field trials that are currently active on your installation of Chrome will be included in all requests sent to Google. This Chrome-Variations header (X-Client-Data) will not contain any personally identifiable information, and will only describe the state of the installation of Chrome itself, including active variations, as well as server-side experiments that may affect the installation.

The variations active for a given installation are determined by a seed number which is randomly selected on first run. If usage statistics and crash reports are disabled, this number is chosen between 0 and 7999 (13 bits of entropy). If you would like to reset your variations seed, run Chrome with the command line flag “--reset-variation-state”. Experiments may be further limited by country (determined by your IP address), operating system, Chrome version and other parameters.

但因為這個預設值開啟的關係,就算關掉後也足以把使用者再分類到另外一個區塊,仍然具有高度辨識性,不是你 Google 說無法辨識就算數。

另外如果看 source code 裡的說明:

    // Note the criteria for attaching client experiment headers:
    // 1. We only transmit to Google owned domains which can evaluate
    // experiments.
    //    1a. These include hosts which have a standard postfix such as:
    //         *.doubleclick.net or *.googlesyndication.com or
    //         exactly www.googleadservices.com or
    //         international TLD domains *.google. or *.youtube..
    // 2. Only transmit for non-Incognito profiles.
    // 3. For the X-Client-Data header, only include non-empty variation IDs.

可以看到 *.doubleclick.net*.googlesyndication.comwww.googleadservices.com 全部都是廣告相關,另外 Google 自家搜尋引擎是直接提供廣告 (不透過前面提到的網域),YouTube 也是一樣的情況,所以完全可以猜測 x-client-data 這個資料就是用在廣告相關的系統上。

The Register 在「Is Chrome really secretly stalking you across Google sites using per-install ID numbers? We reveal the truth」這邊用粗體的 Update 提到了 GDPR 的問題,不確定是不是開始有單位在調查了:

Updated Google is potentially facing a massive privacy and GDPR row over Chrome sending per-installation ID numbers to the mothership.

在這個問題沒修正之前,只能暫時用操作 HTTP header 的 extension 移掉這個欄位。

Avast 與 Jumpshot 販賣使用者瀏覽記錄與行為

過了一陣子了,可以整理一下資料記錄起來...

報導可以看 PCMag 的「The Cost of Avast's Free Antivirus: Companies Can Spy on Your Clicks」與 Motherboard (VICE) 的「Leaked Documents Expose the Secretive Market for Your Web Browsing Data」這兩篇,大綱先把重點列出來了,Avast 在賣使用者的瀏覽記錄與行為:

Avast is harvesting users' browser histories on the pretext that the data has been 'de-identified,' thus protecting your privacy. But the data, which is being sold to third parties, can be linked back to people's real identities, exposing every click and search they've made.

Avast 利用免費的防毒軟體,蒐集使用者的瀏覽記錄與行為,然後透過 Jumpshot 這家子公司販賣出去:

The Avast division charged with selling the data is Jumpshot, a company subsidiary that's been offering access to user traffic from 100 million devices, including PCs and phones.

算是「免費的最貴」的標準型。另外比較有趣的是「資料賣給了誰」這件事情:

Who else might have access to Jumpshot's data remains unclear. The company's website says it's worked with other brands, including IBM, Microsoft, and Google. However, Microsoft said it has no current relationship with Jumpshot. IBM, on the other hand, has "no record" of being a client of either Avast or Jumpshot. Google did not respond to a request for comment.

Microsoft 說「現在沒有關係」,IBM 說「沒有 client 的記錄」,Google 則是不回應。

然後配合解釋資料長什麼樣子,以及可以怎麼用:

For instance, a single click can theoretically look like this:

Device ID: abc123x Date: 2019/12/01 Hour Minute Second: 12:03:05 Domain: Amazon.com Product: Apple iPad Pro 10.5 - 2017 Model - 256GB, Rose Gold Behavior: Add to Cart

At first glance, the click looks harmless. You can't pin it to an exact user. That is, unless you're Amazon.com, which could easily figure out which Amazon user bought an iPad Pro at 12:03:05 on Dec. 1, 2019. Suddenly, device ID: 123abcx is a known user. And whatever else Jumpshot has on 123abcx's activity—from other e-commerce purchases to Google searches—is no longer anonymous.

所以,如果 Google 手上有這個資料,就可以交叉比對自家的記錄,然後得到使用者完整的記錄。

在消息一公開後沒多久後,Avast 就宣佈關閉 Jumpshot,感覺連被抓包後的反應動作都超流暢,一臉就是排練過:「A message from Avast CEO Ondrej Vlcek」。

看了一下,Avast 旗下還有 AVG,還有個 VPN 服務...

User-Agent 的淘汰提案

看到廢除更新 User-Agent 字串的提案:「Intent to Deprecate and Freeze: The User-Agent string」。

一方面是 User-Agent 裡面太多沒用的假資料 (像是每一家都是 Mozilla),另外 User-Agent 會帶出一些隱私問題 (辨識資訊)。

目前的提案是希望改用 User-Agent Client Hints (UA-CH) 取代 User-Agent 的功能,把預定義好的東西透過對應的 HTTP header 傳遞。

Chromium 的計畫是在 M81 (stable 版預定在 2020 年三月中釋出) 先 deprecate navigator.userAgent,所以有存取時 web console 上會出現警告。而 M83 (2020 年六月初) 則是不再更動 user agent 字串 (鎖住)。到了 M85 (2020 年九月中) 則是統一 desktop 的 user agent 字串,並且盡可能統一 mobile 上的字串。

另外裡面也有整理了其他瀏覽器的意願:

Edge: Public support

Firefox: Public support for freezing the UA string - “freezing the User Agent string without any client hints—seems worth-prototyping”

Safari: Shipped to some extent. Safari has attempted to completely freeze the UA string in the past, without providing an alternative mechanism. That got a lot of pushback, which resulted in somewhat reverting that decision. Nowadays, their UA string seems frozen, other than updates to the OS version and the browser major version.

雖然不是完全都同意,但看起來應該有機會在今年陸陸續續搞定...

Webkit 的「反追蹤反追蹤」功能...

第一次看到標題的時候的確是 WTF 的感覺,愈來愈感覺到大戰的開始:「Preventing Tracking Prevention Tracking」。

在蘋果的平台上有 Intelligent Tracking Prevention (ITP) 功能,但先前這個功能比較簡單,所以還是有很多地方可以被當作 browser fingerprint 的一部份分析,所以蘋果決定改善,然後在新版的軟體裡引入:

This blog post covers enhancements to Intelligent Tracking Prevention (ITP) included in Safari on iOS and iPadOS 13.3, Safari 13.0.4 on macOS Catalina, Mojave, and High Sierra.

包括了跨站台時 Referer 的省略:

ITP now downgrades all cross-site request referrer headers to just the page’s origin. Previously, this was only done for cross-site requests to classified domains.

然後後面三個改善都跟 3rd-party cookie 有關,其中預設擋掉帶 cookie 的 3rd-party requests 應該會讓一些網站掛掉:

ITP will now block all third-party requests from seeing their cookies, regardless of the classification status of the third-party domain, unless the first-party website has already received user interaction.

早期自己做自家 SSO 的奇技淫巧中,會設計出透過 ajax 打多個不同的網域自動登入,看起來應該會需要檢查了...

Google 與 Cloudflare 測試 Post-Quantum 演算法的成果

這幾年量子電腦的進展不斷有突破,雖然到對於攻擊現有的密碼學看起來還有一段時間,但總是得先開始研究對量子電腦有抵抗性的演算法...

其中 Google Chrome 的團隊與 Cloudflare 的團隊手上都有夠大的產品,兩個團隊合作測試的結果在學界與業界都還蠻重視的:「Real-world measurements of structured-lattices and supersingular isogenies in TLS」、「The TLS Post-Quantum Experiment」。

Google Chrome 這邊是使用了 Canary 與 Dev 兩個 channel,有控制組與兩個新的演算法:

Google Chrome installs, on Dev and Canary channels, and on all platforms except iOS, were randomly assigned to one of three groups: control (30%), CECPQ2 (30%), or CECPQ2b (30%). (A random ten percent of installs did not take part in the experiment so the numbers only add up to 90.)

這兩個演算法有優點也有缺點。一個是 key 比較小,但運算起來比較慢 (SIKE,CECPQ2b);另外一個是 key 比較大,但是運算比較快 (HRSS,CECPQ2):

For our experiment, we chose two algorithms: isogeny-based SIKE and lattice-based HRSS. The former has short key sizes (~330 bytes) but has a high computational cost; the latter has larger key sizes (~1100 bytes), but is a few orders of magnitude faster.

We enabled both CECPQ2 (HRSS + X25519) and CECPQ2b (SIKE/p434 + X25519) key-agreement algorithms on all TLS-terminating edge servers.

感覺還是會繼續嘗試,因為這兩個演算法的缺點都還是有點致命...