Slack wants to train AI on user data

Spotted via "Slack AI Training with Customer Data (slack.com)"; the original announcement is "Privacy Principles: Search, Learning and Artificial Intelligence".

Your data goes into training by default, and there is no setting to opt out; you have to email feedback@slack.com (yeah, a dark pattern):

Contact us to opt out. If you want to exclude your Customer Data from Slack global models, you can opt out. To opt out, please have your Org or Workspace Owners or Primary Owner contact our Customer Experience team at feedback@slack.com with your Workspace/Org URL and the subject line “Slack Global model opt-out request.” We will process your request and respond once the opt out has been completed.

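Messing around with enterprise data... This is probably no longer just a privacy issue but a security one. The PR optics are definitely bad; let's see whether they back down.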

What Facebook was after when it logged user activity through a VPN

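In early 2019, TechCrunch reported that Facebook paid users to route their traffic through a VPN (and install a root CA) so that it could record their activity: "Facebook pays users for their activity logs". Recently disclosed documents reveal what it was for: "Facebook snooped on users' Snapchat traffic in secret project, documents reveal".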

The TC article doesn't include the emails, so I found another report, "Project Ghostbusters: Facebook Accused of Using Your Phone to Wiretap Snapchat", which links two documents containing the email exchange: "Document 735" and "Document 736".

They show that the goal was to capture usage data from Snapchat, YouTube, and Amazon:

The goal of Facebook’s SSL bump technology was the company’s acquisition, decryption, transfer, and use in competitive decisionmaking of private, encrypted in-app analytics from the Snapchat, YouTube, and Amazon apps, which were supposed to be transmitted over a secure connection between those respective apps and secure servers (sc-analytics.appspot.com for Snapchat, s.youtube.com and youtubei.googleapis.com for YouTube, and *.amazon.com for Amazon). Id.

The emails also mention that it was implemented with Squid:

Today we are using the Onavo vpn-proxy stack to deploy squid with ssl bump the stack runs in edge on our own hosts (onavopp and onavolb) with a really old version of squid (3.1).

The lawsuit cites 18 U.S. Code § 2511 - Interception and disclosure of wire, oral, or electronic communications prohibited, so it looks like this could become a federal criminal matter...

Those were the days before certificate pinning caught on...?

Let's Encrypt issues new intermediate CAs

Let's Encrypt announced a new set of intermediate CAs: "New Intermediate Certificates".

This time a whole batch was issued, most of it signed by ISRG Root X1:

On Wednesday, March 13, 2024, Let’s Encrypt generated 10 new Intermediate CA Key Pairs, and issued 15 new Intermediate CA Certificates containing the new public keys.

ISRG Root X1 signed five 2048-bit RSA intermediate CAs, named R10 through R14:

We created 5 new 2048-bit RSA intermediate certificates named in sequence from R10 through R14. These are issued by ISRG Root X1. You can think of them as direct replacements for our existing R3 and R4 intermediates.

ISRG Root X1 also signed five P-384 ECDSA intermediate CAs, named E5 through E9, and ISRG Root X2 signed E5 through E9 as well:

We also created 5 new P-384 ECDSA intermediate certificates named in sequence from E5 through E9. Each of these is represented by two certificates: one issued by ISRG Root X2 (exactly like our existing E1 and E2), and one issued (or cross-signed) by ISRG Root X1.

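So in total, 10 intermediate key pairs were generated and 15 intermediate CA certificates were issued: five RSA certificates from ISRG Root X1, plus two certificates (one from X2, one from X1) for each of the five ECDSA keys.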
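The gap between 10 and 15 comes from each ECDSA key appearing in two certificates. A minimal Python model of the batch as the quoted announcement describes it makes the count explicit (the tuple layout is only an illustration, not any Let's Encrypt data format):

```python
# Toy model of the March 2024 batch: (intermediate name, key type, issuing root).
def build_certificates():
    certs = []
    # R10-R14: 2048-bit RSA, issued only by ISRG Root X1.
    for n in range(10, 15):
        certs.append((f"R{n}", "RSA-2048", "ISRG Root X1"))
    # E5-E9: P-384 ECDSA, each issued by ISRG Root X2 and cross-signed by ISRG Root X1.
    for n in range(5, 10):
        for root in ("ISRG Root X2", "ISRG Root X1"):
            certs.append((f"E{n}", "ECDSA-P384", root))
    return certs

certs = build_certificates()
key_pairs = {name for name, _, _ in certs}  # one key pair per intermediate name
print(len(key_pairs), len(certs))  # 10 15
```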

One notable detail: ISRG Root X1 (RSA 4096) has also cross-signed ISRG Root X2 (ECDSA P-384). In theory, ISRG Root X2 should eventually make its way into the various root stores as well...

The official diagram illustrates these relationships:

None of these are in use yet; they have been issued and announced first, and the switchover will come later.

The article also says apps should avoid pinning intermediate certificates (key pinning):

We are very hopeful that these steps will prevent intermediate key pinning altogether, and help the WebPKI remain agile moving forward.

Intermediate CAs have to be rotated periodically for security reasons, so if you really must pin, pinning the root CA is the better choice.
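A minimal sketch of root-level pinning, assuming the common pin format "base64 of the SHA-256 of the DER-encoded SubjectPublicKeyInfo", and assuming the validated chain (including the root) has already been parsed into SPKI bytes ordered leaf to root. Parsing certificates needs a library such as `cryptography` and is not shown; the byte strings below are placeholders, not real keys:

```python
import base64
import hashlib

def spki_pin(spki_der: bytes) -> str:
    # Pin = base64(SHA-256(DER-encoded SubjectPublicKeyInfo)).
    return base64.b64encode(hashlib.sha256(spki_der).digest()).decode("ascii")

def chain_matches_pin(chain_spkis: list[bytes], pins: set[str]) -> bool:
    # chain_spkis is ordered leaf -> intermediate -> root. Only the root is checked,
    # so replacing the intermediate (e.g. R3 -> R10) does not break the pin.
    return bool(chain_spkis) and spki_pin(chain_spkis[-1]) in pins

pins = {spki_pin(b"placeholder-root-spki")}
print(chain_matches_pin([b"leaf", b"old-intermediate", b"placeholder-root-spki"], pins))  # True
print(chain_matches_pin([b"leaf", b"new-intermediate", b"placeholder-root-spki"], pins))  # True
print(chain_matches_pin([b"leaf", b"new-intermediate", b"other-root-spki"], pins))        # False
```

Pinning the intermediate instead would mean comparing `chain_spkis[1]`, and that comparison starts failing the day the CA rotates, which is exactly what the announcement is trying to avoid.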

Removing bloatware from Android: Universal Android Debloater GUI

Saw the HN discussion "Debloat non-rooted Android devices (github.com/universal-debloater-alliance)"; the original page is the GitHub project Universal Android Debloater GUI.

The description is quite clear: it uses ADB to try to remove bloatware without needing root:

Cross-platform GUI written in Rust using ADB to debloat non-rooted android devices. Improve your privacy, the security and battery life of your device.

The project seems to have been around for a while, and you can download binaries directly from the releases page.

The FAQ entry "What are the ADB commands used by UAD?" lists the commands it runs, so if you'd rather not use the tool, you can remove packages with those commands yourself.
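If you want to type the commands yourself, a rough Python wrapper could look like this. It assumes the widely used `pm uninstall --user 0 <package>` form, which removes a package for the current user only and leaves it on the system partition; the exact commands UAD runs are the ones in its FAQ and may differ. `com.example.bloatware` is a hypothetical package name, and with `dry_run=True` the command is only printed:

```python
import shlex
import subprocess
from typing import Optional

def list_packages_cmd() -> list[str]:
    # Prints one "package:<name>" line per installed package.
    return ["adb", "shell", "pm", "list", "packages"]

def uninstall_cmd(package: str, user: int = 0) -> list[str]:
    # Removes the package for the given user only; no root needed.
    return ["adb", "shell", "pm", "uninstall", "--user", str(user), package]

def parse_packages(output: str) -> list[str]:
    # "package:com.foo\npackage:com.bar" -> ["com.foo", "com.bar"]
    prefix = "package:"
    return [line[len(prefix):].strip() for line in output.splitlines() if line.startswith(prefix)]

def run(cmd: list[str], dry_run: bool = True) -> Optional[str]:
    if dry_run:
        print(shlex.join(cmd))
        return None
    return subprocess.run(cmd, check=True, capture_output=True, text=True).stdout

run(uninstall_cmd("com.example.bloatware"))
# prints: adb shell pm uninstall --user 0 com.example.bloatware
```

With a device attached, `parse_packages(run(list_packages_cmd(), dry_run=False))` would give the list to choose from.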

The home page lists the supported manufacturers, and there are quite a few. I should find some time to clean up my Android phone...

Revisiting Let's Encrypt's plan to stop serving the cross-signed chain in June

I received an email from Cloudflare about Let's Encrypt's cross-signed chain, which is set to expire at the end of September this year; Cloudflare is taking some measures of its own in response:

Let’s Encrypt announced that the cross-signed chain is set to expire on September 30th, 2024. As a result, Cloudflare will stop issuing certificates from the cross-signed CA chain on May 15th, 2024.

Last July, Let's Encrypt based its explanation on data from the end of May (2023/05/31), looking at the share of Android 7.1+ devices, which was 93.9% at the time.

Android 7.1 is the cutoff because ISRG Root X1 is built in by default starting with that version, so those devices no longer need the IdenTrust cross-signed chain:

I just opened Android Studio to check: the most recent Android distribution data is from early October last year (2023/10/01), and the figure has reached 95.0%:

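Coverage may reach 97%+ or even 98%+ by the end of September, but the Android installed base is so large that even at 98%+ the impact will probably still be significant. Will the cross-sign be extended by another year...?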
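A back-of-the-envelope calculation shows why a few percent still matters. The 3 billion base below is an assumption used only for scale (it is the active-device figure Google announced in 2021, not the population Let's Encrypt measured), so read the results as orders of magnitude:

```python
ASSUMED_ACTIVE_DEVICES = 3_000_000_000  # assumption: order of magnitude only

def devices_without_x1(coverage_percent: float, base: int = ASSUMED_ACTIVE_DEVICES) -> int:
    # Devices below Android 7.1 lack ISRG Root X1 and depend on the cross-signed chain.
    return round(base * (100 - coverage_percent) / 100)

for pct in (93.9, 95.0, 98.0):
    print(f"{pct}% coverage -> about {devices_without_x1(pct):,} devices without X1")
```

Even the optimistic 98% case leaves tens of millions of devices under this assumption.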

Using the Web Audio API for fingerprinting

A three-year-old article, "How the Web Audio API is used for audio fingerprinting", explains how AudioContext gets used for fingerprinting; I came across it through the recent "How We Bypassed Safari 17's Advanced Audio Fingerprinting Protection".

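AudioContext can be used for pure computation with no recording device involved at all, and because browser implementations differ, the computed output can serve as a fingerprint.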

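The method described in the article generates a 440Hz sine wave with an Oscillator and runs it through a Compressor to lower the volume (this is where the computation happens):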

The Web Audio API provides a DynamicsCompressorNode, which lowers the volume of the loudest parts of the signal and helps prevent distortion or clipping.

Each vendor implements this volume-reduction computation differently, so the output can tell browsers (and even browser versions) apart:

Historically, all major browser engines (Blink, WebKit, and Gecko) based their Web Audio API implementations on code originally developed by Google in 2011 and 2012 for the WebKit project.

Since then browser developers have made a lot of small changes. These changes, compounded by the large number of mathematical operations involved, lead to fingerprinting differences. Audio signal processing uses floating point arithmetic, which also contributes to discrepancies in calculations.

Additionally, browsers use different implementations for different CPU architectures and OSes to leverage features like SIMD. For example, Chrome uses a separate fast Fourier transform implementation on macOS (producing a different oscillator signal) and other vector operation implementations on different CPU architectures (used in the DynamicsCompressor implementation). These platform-specific changes also contribute to differences in the final audio fingerprint.
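The effect described in these quotes (same algorithm, small implementation differences, different numbers) can be reproduced in a toy Python model. Everything here is an illustrative assumption: the hard-knee `toy_compress` is not the DynamicsCompressorNode algorithm, and summing the absolute values of samples 4500 to 4999 is just one way to reduce a buffer to a single number. The only difference between the two runs is whether each intermediate result is rounded to a 32-bit float:

```python
import math
import struct

SAMPLE_RATE = 44100  # samples per second (assumed)
FREQ = 440.0         # the 440Hz sine wave from the article

def to_f32(x: float) -> float:
    # Round a 64-bit Python float to the nearest 32-bit float.
    return struct.unpack("f", struct.pack("f", x))[0]

def no_round(x: float) -> float:
    return x

def toy_compress(x: float, r, threshold: float = 0.5, ratio: float = 4.0) -> float:
    # Hypothetical hard-knee compressor: the part of |x| above the threshold
    # is divided by the ratio, lowering the loudest parts of the signal.
    mag = abs(x)
    if mag <= threshold:
        return x
    return math.copysign(r(threshold + r((mag - threshold) / ratio)), x)

def fingerprint(r, start: int = 4500, end: int = 5000) -> float:
    total = 0.0
    for i in range(start, end):
        phase = r(2 * math.pi * FREQ * i / SAMPLE_RATE)
        sample = r(math.sin(phase))
        total = r(total + abs(toy_compress(sample, r)))
    return total

a = fingerprint(no_round)  # "engine" computing in 64-bit floats
b = fingerprint(to_f32)    # "engine" rounding every step to 32-bit floats
print(a, b, a - b)
```

The two values differ only in the low digits, but a fingerprinting script compares exact values, so that is enough to separate the two "engines".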

Since this API is rarely needed in everyday browsing, privacy-focused browsers such as Tor Browser simply disable it:

Tor

In the case of the Tor browser, everything is simple. But unfortunately, web Audio API is disabled there, so audio fingerprinting is impossible.

Dropping the Spamhaus DNSBL

I just read "If you query Spamhaus Projects' legacy DNSBLs via DigitalOcean move to the free Data Query Service", and things are getting stranger and stranger. After looking into the current situation, I decided to stop using Spamhaus.

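Fewer and fewer people run their own mail servers these days, but I still keep a few domains on a self-hosted Postfix. The main reason is that reading mail with Mutt on the command line is still very convenient; the other is to make sure I have at least one mailbox that isn't controlled by a big company.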

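Unless you deploy a packaged solution, running your own mail server means configuring quite a few things. On the MTA side, DNSBLs are commonly used to reject connections from IP addresses known to send spam.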

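The idea behind a DNSBL is simple: you build a hostname from the IPv4 address and look it up via DNS to find out whether the address is on the list. In other words, the DNS protocol serves as the API for querying a database.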

For example, suppose I want to check 188.235.18.134 (picked from "Worst /24 blocks based on total spam count") against SpamCop's list. I first reverse the IPv4 address to 134.18.235.188, then append the zone SpamCop specifies, bl.spamcop.net, to get 134.18.235.188.bl.spamcop.net. Looking that up returns:

134.18.235.188.bl.spamcop.net has address 127.0.0.2

For 168.95.1.1, the same method gives 1.1.95.168.bl.spamcop.net, which shows that it is not on SpamCop's list:

Host 1.1.95.168.bl.spamcop.net not found: 3(NXDOMAIN)
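The lookup above is easy to reproduce with the Python standard library. A minimal sketch, assuming the system resolver is used and treating any resolution failure as "not listed" (a real MTA should distinguish NXDOMAIN from timeouts, which `socket.gethostbyname` does not expose cleanly):

```python
import ipaddress
import socket
from typing import Optional

def dnsbl_query_name(ip: str, zone: str) -> str:
    # Validate the address, reverse its four octets, and append the list's zone.
    addr = ipaddress.IPv4Address(ip)  # raises ValueError on malformed input
    return ".".join(reversed(str(addr).split("."))) + "." + zone

def dnsbl_lookup(ip: str, zone: str) -> Optional[str]:
    # Returns the A record (e.g. "127.0.0.2") when listed, None otherwise.
    try:
        return socket.gethostbyname(dnsbl_query_name(ip, zone))
    except socket.gaierror:
        # NXDOMAIN means "not listed", but resolver failures typically end up here too.
        return None

print(dnsbl_query_name("188.235.18.134", "bl.spamcop.net"))  # 134.18.235.188.bl.spamcop.net
print(dnsbl_query_name("168.95.1.1", "bl.spamcop.net"))      # 1.1.95.168.bl.spamcop.net
```

`dnsbl_lookup("188.235.18.134", "bl.spamcop.net")` goes to the live list, so its result depends on SpamCop's data at the time of the query.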

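Using DNS has the advantage that DNS resolvers and libraries cache answers naturally, so an MTA like Postfix doesn't need its own caching layer, and a flood of incoming mail (legitimate or spam) doesn't translate into heavy load on the list provider.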

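Back to Spamhaus: their stated reason for blocking DigitalOcean is odd. Because DigitalOcean runs its own mirror, Spamhaus can't see how much it is used, so they want users to register with Spamhaus and query their own your_DQS_key.zen.dq.spamhaus.net zone instead.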

With a unique key in every query, Spamhaus gets a very clear trail of who is querying what. Combine that with this part of their Privacy Policy:

We may have to share your personal data with the parties set out below for the purposes set out in the table in paragraph 4.
[...]
– Third parties to whom we may choose to sell, transfer, or merge parts of our business or our assets. Alternatively, we may seek to acquire other businesses or merge with them. If a change happens to our business, then the new owners may use your personal data in the same way as set out in this privacy notice.

...and it's pretty obvious what they're after.

On top of that, while researching this I found that they already block Google Public DNS and Cloudflare DNS.

This is a query through my own Unbound resolver for the test entry 127.0.0.2, which a DNSBL should always list:

gslin@home [~] [05:09/W4] host 2.0.0.127.zen.spamhaus.org
2.0.0.127.zen.spamhaus.org has address 127.0.0.10
2.0.0.127.zen.spamhaus.org has address 127.0.0.4
2.0.0.127.zen.spamhaus.org has address 127.0.0.2

This is Google Public DNS (8.8.8.8):

gslin@home [~] [05:09/W4] host 2.0.0.127.zen.spamhaus.org 8.8.8.8
Using domain server:
Name: 8.8.8.8
Address: 8.8.8.8#53
Aliases: 

Host 2.0.0.127.zen.spamhaus.org not found: 3(NXDOMAIN)

This is Cloudflare DNS (1.1.1.1):

gslin@home [~] [05:09/W4] host 2.0.0.127.zen.spamhaus.org 1.1.1.1
Using domain server:
Name: 1.1.1.1
Address: 1.1.1.1#53
Aliases: 

2.0.0.127.zen.spamhaus.org has address 127.255.255.254

The Spamhaus "Frequently Asked Questions (FAQ)" page says a 127.255.255.254 response means "Query via public/open resolver":

127.255.255.252	Any	Typing error in DNSBL name
127.255.255.254	Any	Query via public/open resolver
127.255.255.255	Any	Excessive number of queries
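Because a blocked resolver returns an error code (or, as with Google Public DNS, NXDOMAIN) instead of the real listing, answers need checking before they are trusted. A small sketch that classifies A-record answers using the three error codes quoted above, plus the RFC 5782 rule that the test entry 127.0.0.2 must always be listed; the function names are hypothetical:

```python
# Error return codes from the Spamhaus FAQ quoted above.
SPAMHAUS_ERRORS = {
    "127.255.255.252": "Typing error in DNSBL name",
    "127.255.255.254": "Query via public/open resolver",
    "127.255.255.255": "Excessive number of queries",
}

def classify(answers: list[str]) -> str:
    if not answers:
        return "not listed (NXDOMAIN)"
    errors = [SPAMHAUS_ERRORS[a] for a in answers if a in SPAMHAUS_ERRORS]
    if errors:
        return "error: " + "; ".join(errors)
    return "listed: " + ", ".join(answers)

def resolver_usable(test_entry_answers: list[str]) -> bool:
    # RFC 5782: 127.0.0.2 must always be listed, so its lookup must return a real listing.
    return classify(test_entry_answers).startswith("listed")

# The three answers for 2.0.0.127.zen.spamhaus.org shown above:
print(resolver_usable(["127.0.0.10", "127.0.0.4", "127.0.0.2"]))  # True (own Unbound)
print(resolver_usable([]))                                         # False (8.8.8.8, NXDOMAIN)
print(classify(["127.255.255.254"]))  # error: Query via public/open resolver (1.1.1.1)
```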

So it's pretty clear this company's service is now off-limits...

Getting a phone number from an e-mail address

Saw a 2019 article on Hacker News, "From email to phone number, a new OSINT approach (2019) (martinvigo.com)"; the original is "From email to phone number, a new OSINT approach".

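The idea is that each service reveals a different part of the phone number during account recovery. The screenshot shows, for example, that PayPal reveals the first digit plus the last four digits: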

He then compiled the following list:

Leaks first three and last two digits:
eBay

Leaks first and last four digits:
Paypal

Leaks first and last two digits:
Yahoo

Leaks last four digits:
Lastpass

Leaks last two digits:
Google
Facebook
Twitter
Hotmail
Steam
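Each service on its own leaks only a few digits, but the leaks cover different positions, so combining them shrinks the search space quickly. A sketch using the list above, assuming a 10-digit number and ignoring numbering-plan constraints (which the article uses to narrow things down further); 5551234567 is a made-up number:

```python
# (leading digits, trailing digits) revealed by each service, from the list above.
LEAKS = {
    "eBay": (3, 2),
    "PayPal": (1, 4),
    "Yahoo": (1, 2),
    "LastPass": (0, 4),
    "Google": (0, 2),
}

def known_positions(services, length=10):
    known = set()
    for name in services:
        first, last = LEAKS[name]
        known.update(range(first))                  # leading digits
        known.update(range(length - last, length))  # trailing digits
    return known

def mask(number, services):
    known = known_positions(services, len(number))
    return "".join(d if i in known else "*" for i, d in enumerate(number))

def candidates(services, length=10):
    # Each unknown position can be any of 10 digits (before any other filtering).
    return 10 ** (length - len(known_positions(services, length)))

print(mask("5551234567", ["eBay"]))            # 555*****67
print(mask("5551234567", ["eBay", "PayPal"]))  # 555***4567
print(candidates(["eBay", "PayPal"]))          # 1000
```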

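The article then covers other techniques to narrow down the possibilities, and then goes the other way, using the phone number to look up the e-mail address through services such as Amazon.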

The last part demonstrates that the whole process can be automated.

Once you can confirm someone's phone number, a lot more becomes possible; what the article covers is only a small part of it...

Signal spends $6m/year on SMS

Signal's PR piece "Privacy is Priceless, but Signal is Expensive" breaks down its expenses; the Hacker News discussion is at "Privacy is priceless, but Signal is expensive (signal.org)".

Here are the current figures (estimated as of November 2023):

Storage: $1.3 million dollars per year.
Servers: $2.9 million dollars per year.
Registration Fees: $6 million dollars per year.
Total Bandwidth: $2.8 million dollars per year.
Additional Services: $700,000 dollars per year.

Current Infrastructure Costs (as of November 2023): Approximately $14 million dollars per year.
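The line items roughly add up to the stated total. A quick check, with the figures in millions of USD copied from the quote:

```python
# Annual costs in millions of USD, from the quoted list (as of November 2023).
costs_musd = {
    "Storage": 1.3,
    "Servers": 2.9,
    "Registration Fees": 6.0,
    "Total Bandwidth": 2.8,
    "Additional Services": 0.7,
}
total = sum(costs_musd.values())
registration_share = costs_musd["Registration Fees"] / total
print(f"total: ${total:.1f}m/year")               # total: $13.7m/year
print(f"registration: {registration_share:.0%}")  # registration: 44%
```

So registration (mostly SMS, as discussed below) is the single largest infrastructure item, a little under half of the total.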

A few parts caught my interest. One is the SMS cost from the title: the "Registration Fees" section lists two items, the cost of people downloading Signal and the cost of SMS verification:

Signal incurs expenses when people download Signal and sign up for an account, or when they re-register on a new device. We use third-party services to send a registration code via SMS or voice call in order to verify that the person in possession of a given phone number actually intended to sign up for a Signal account.

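This needs a closer look. Bandwidth for downloading the software should be counted under Total Bandwidth... and by my reckoning it shouldn't amount to much anyway: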

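  • The mobile apps are distributed through Apple's App Store and Google Play, so Signal doesn't have to provide download bandwidth for them.
  • The desktop apps, whether for Windows, Mac, or Linux, appear to be downloaded from updates.signal.org, which currently points to Cloudflare. Judging from traceroute, it isn't a premium account (HiNet users are routed to the SFO data center on the US West Coast); maybe it's a Cloudflare sponsored account?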

So I'll assume for now that the whole $6m/year goes to SMS, and the following passage seems to say as much:

The cost of these registration services for verifying phone numbers when people first install Signal, or when they re-register on a new device, currently averages around $6 million dollars per year.

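The cost is also this high because SMS pumping is rampant these days: attackers collude with carriers (or are the same people) and generate large volumes of fake verification requests, leaving the company behind the app with huge SMS bills: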

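The other part I'm interested in is bandwidth. One of the more bandwidth-heavy items is handling voice and video calls with people who aren't in your contacts: to avoid leaking IP addresses, Signal doesn't connect the two sides directly but routes the call through a relay: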

To take one example, Signal always routes end-to-end encrypted calls from people who aren’t in your contacts through a relay server that obscures IP address information.

This alone comes to about 20PB/year and costs about $1.7m/year (out of the roughly $2.8m/year total bandwidth cost listed above):

At current traffic levels, the amount of outbound bandwidth that is required to support Signal voice and video calls is around 20 petabytes per year (that’s 20 million gigabytes) which costs around $1.7 million dollars per year in bandwidth fees just for calling[.]
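Dividing the quoted figures gives an average price per GB for the call relay traffic, assuming decimal units (20 PB = 20 million GB, as the quote itself says) and ignoring any tiered pricing:

```python
calling_gb_per_year = 20 * 10**6  # 20 PB = 20 million GB, as stated in the quote
calling_cost_usd = 1_700_000
total_bandwidth_usd = 2_800_000   # "Total Bandwidth" line item

cost_per_gb = calling_cost_usd / calling_gb_per_year
calling_share = calling_cost_usd / total_bandwidth_usd
print(f"${cost_per_gb:.3f}/GB")               # $0.085/GB
print(f"calling share: {calling_share:.0%}")  # calling share: 61%
```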

The biggest expense is still personnel (i.e. Signal isn't yet at a scale where that flips around), at $19m/year:

In total, around 50 full-time employees currently work on Signal[.]

To sustain our ongoing development efforts, about half of Signal’s overall operating budget goes towards recruiting, compensating, and retaining the people who build and care for Signal. When benefits, HR services, taxes, recruiting, and salaries are included, this translates to around $19 million dollars per year.

Someone on Hacker News posted "Signal Technology Foundation - Nonprofit Explorer - ProPublica", which has the organization's tax filings to dig through, in more detail than the PR piece.

German court rules that the DNT header is a legally valid objection to tracking

Saw "German court declares Do Not Track to be legally binding (vzbv.de)" on HN. The original is in German, "Gericht untersagt Datenschutzverstöße von LinkedIn", which Google Translate renders as "Court bans LinkedIn data protection violations".

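LinkedIn told users it would not honor DNT, but the German court held that sending the DNT header already informs the site that the user does not want to be tracked: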

„Wenn Verbraucher:innen die ,Do-Not-Track‘-Funktion ihres Browsers aktivieren, ist das eine klare Botschaft: Sie wollen nicht, dass ihr Surfverhalten für Werbe- und andere Zwecke ausgespäht wird“, sagt Rosemarie Rodden, Rechtsreferenin beim vzbv. „Webseitenbetreiber müssen dieses Signal respektieren.“

“When consumers activate the 'Do Not Track' function of their browser, it sends a clear message: They do not want their surfing behavior to be spied on for advertising and other purposes,” says Rosemarie Rodden, legal officer at vzbv. “Website operators must respect this signal.”
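On the wire, the signal the court refers to is just an HTTP request header, `DNT: 1`. A minimal sketch of how a site could detect it; the function name is hypothetical, and header names are compared case-insensitively because HTTP treats them that way:

```python
from typing import Mapping

def tracking_allowed(headers: Mapping[str, str]) -> bool:
    # HTTP header names are case-insensitive, so normalise them before the lookup.
    normalised = {name.lower(): value.strip() for name, value in headers.items()}
    # "DNT: 1" means the user objects to tracking.
    return normalised.get("dnt") != "1"

print(tracking_allowed({"DNT": "1", "User-Agent": "example"}))  # False
print(tracking_allowed({"User-Agent": "example"}))              # True
```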

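This seems to be the first court ruling on DNT I've come across? Let's see whether there are further developments (an appeal and so on) and what the final ruling turns out to be.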