微軟的 Outlook 系統會自動點擊信件內的連結

前幾天在 Hacker News Daily 上翻到的,微軟的 Outlook 系統 (雲端上的系統) 會自動點擊信件內的連結,導致一堆問題:「“Magic links” can end up in Bing search results — rendering them useless.」,在 Hacker News 上的討論也有很多受害者出來抱怨:「“Magic links” can end up in Bing search results, rendering them useless (medium.com/ryanbadger)」。

原文的標題寫的更批評,指控 Outlook 會把這些 link 丟到 Bing 裡面 index,這點還沒有看到確切的證據。

先回到連結被點擊的問題,照文章內引用的資料來看,看起來是 2017 年開始就有的情況:「Do any common email clients pre-fetch links rather than images?」。

As of Feb 2017 Outlook (https://outlook.live.com/) scans emails arriving in your inbox and it sends all found URLs to Bing, to be indexed by Bing crawler.

在 Hacker News 上的討論也提到了像是 one-time login email 的機制也會因此受到影響,被迫要用比較費工夫的方法讓使用者登入 (像是給使用者 one-time code 輸入,而不是點 link 就可以登入)。

先記起來,以後在設計時應該會遇到,要重新思考 threat model...

搜尋引擎的替代方案清單

看到「A look at search engines with their own indexes」這篇在介紹各個搜尋引擎,作者設計了一套方法測試,另外在文章裡面也給了很多主觀的意見,算是很有參考價值的,可以試看看裡面提出來的建議。

另外在 Hacker News 上也有討論可以參考:「A look at search engines with their own indexes (2021) (seirdy.one)」。

在文章開頭的「General indexing search-engines」這個章節,先列出三大搜尋引擎 GBY (GoogleBingYandex),以及使用這三家當作後端資料庫的搜尋引擎,可以看到到處都是 Bing 的影子。

接著作者推薦 Mojeek 這個作為 GBY 的替代方案:

Mojeek: Seems privacy-oriented with a large index containing billions of pages. Quality isn’t at GBY’s level, but it’s not bad either. If I had to use Mojeek as my default general search engine, I’d live. Partially powers eTools.ch. At this moment, I think that Mojeek is the best alternative to GBY for general search.

在「Smaller indexes or less relevant results」這邊也有一些方案,像是這個章節第一個提到的 Right Dao,作者就給他了不錯的評價:

Right Dao: very fast, good results. Passes the tests fairly well. It plans on including query-based ads if/when its user base grows.

接下來的「Smaller indexes, hit-and-miss」與「Unusable engines, irrelevant results」也可以翻一下,看看作者怎麼批評 XD

然後是後面的「Semi-independent indexes」就出現了最近幾個比較有名的,像是 Brave Search 與目前我在用的 Kagi

整理的相當不錯...

ARC (Authenticated Received Chain)

標題的 ARC 是指 Authenticated Received Chain,是前陣子在 Hacker News 上看到「Gmail accepts forged YouTube emails (john-millikin.com)」這篇才發現的東西,原文在「Gmail accepts forged YouTube emails」這邊。

作者發現 Gmail 收了從不是直接從 YouTube 發出來的信件:

主要的原因是,Gmail 除了使用標準的 SPFDKIM 判斷外,還吃上面提到的 ARC。

查了一下 ARC,標準是 RFC 8617,目前還是被標成 experimental,主打是解決 forwarding 的問題,看了一下作者群是 LinkedIn (Microsoft)、GoogleValimail

ARC 這東西與之前 Google 在強推的 AMP (然後被罰) 以及現在在推的 Signed HTTP Exchanges 都有相同的味道,無視 security & privacy concern 的東西...

HTTPS Everywhere 將在明年一月 (2023/01) 停止運作

在「Set Up HTTPS by Default in Your Browser」這頁看到的東西:

Note: HTTPS Everywhere will sunset in January 2023.

我把他 submit 到 Hacker News 上:「HTTPS Everywhere will sunset in January 2023 (eff.org)」,裡面有一些有趣的討論,像是這跟 Google 硬幹 Manifest v2 也有關:

It doesn't seem to be mentioned by the EFF, but coincidentally, January 2023 is when Manifest v2 extensions stop working in Google Chrome: https://developer.chrome.com/blog/mv2-transition/

但查了一下,目前好像沒有好的技術標準可以確保第一次的 HTTPS request。馬上想的到的是透過 DNS 的方式指定,這樣就可以透過 DNSSEC 保護不被竄改,但看起來沒有這個標準...

Amazon Echo 會「聽」並且將資料送到第三方廣告平台

前陣子看到的研究報告,證實 Amazon Echo 會聽取資訊並且將這些資料送到第三方的廣告平台上 (會送到 Amazon 自家應該不算新聞):「Your Echos are Heard: Tracking, Profiling, and Ad Targeting in the Amazon Smart Speaker Ecosystem」。

先從 abstract 開始看,主要是目前這些 smart speaker 基本上沒有透明度,所以十位作者群們建立了一套評估用的 framework 用來測試各家 smart speaker 資訊蒐集已經影響的情況:

Smart speakers collect voice input that can be used to infer sensitive information about users. Given a number of egregious privacy breaches, there is a clear unmet need for greater transparency and control over data collection, sharing, and use by smart speaker platforms as well as third party skills supported on them. To bridge the gap, we build an auditing framework that leverages online advertising to measure data collection, its usage, and its sharing by the smart speaker platforms.

這次論文裡面提到的目標就是 Amazon Echo 會將聽到的東西分享給第三方的廣告平台,並且讓廣告平台可以調整競價 (賺更多的錢),而且這些資訊並沒有被揭露在 privacy policy 裡面:

We evaluate our framework on the Amazon smart speaker ecosystem. Our results show that Amazon and third parties (including advertising and tracking services) collect smart speaker interaction data. We find that Amazon processes voice data to infer user interests and uses it to serve targeted ads on-platform (Echo devices) as well as off-platform (web). Smart speaker interaction leads to as much as 30X higher ad bids from advertisers. Finally, we find that Amazon's and skills' operational practices are often not clearly disclosed in their privacy policies.

幾個比較重要的資訊,其中一個是「Network traffic distribution by persona, domain name, purpose, and organization」:

另外一個重點是哪些 3rd-party:

就心裡有個底,然後隔壁棚也有類似產品 (主業是做廣告的那家),大概要跑不掉...

Tor 0.4.7.7 支援 congestion control

Tor 首度在協定內支援了 congestion control:「Congestion Control Arrives in Tor 0.4.7-stable!」。

這個新功能會帶來效能的提昇:

Tor has released 0.4.7.7, the first stable Tor release with support for congestion control. Congestion control will eliminate the speed limit of current Tor, as well as reduce latency by minimizing queue lengths at relays. It will result in significant performance improvements in Tor, as well as increased utilization of our network capacity.

之所以沒有辦法直接利用 packet loss 的方式讓 TCP network stack 直接判斷 congestion control,是因為這樣會產生 side channel:

Crucially, we rejected mechanisms to provide congestion control by allowing packet drops, due to the ability to introduce end-to-end side channels in the packet drop pattern.

所以 Tor 得自己實做 congestion control 演算法,選擇的演算法是結合了 Vegas 的 Tor-Vegas,可以看到在實驗中,德國與香港的 exit node 效率大幅提昇:

另外也因為 0.4.7.7 也出來一個禮拜了,也可以看到 Advertised Bandwidth (算是 Tor network 觀察到的 bandwidth) 開始成長:

另外一個重要的點是 UDP 的支援計畫,看起來在這次改善後也比較有可行性了:

The astute reader will note that we rejected datagram transports. However, this does not mean that Tor will never carry UDP traffic. On the contrary, congestion control deployment means that queue delay and latency will be much more stable and predictable. This will enable us to carry UDP without packet drops in the network, and only drop UDP at the edges, when the congestion window becomes full. We are hopeful that this new behavior will match what existing UDP protocols expect, allowing their use over Tor.

Google 在歐盟的服務將提供 Reject All Cookies 的按鈕

看到「Google gives Europe a ‘reject all’ button for tracking cookies after fines from watchdogs」這篇,在講 Google 在歐盟的服務開始提供 Reject All Cookie 的按鈕,其中 Google 官方的公告可以在「New cookie choices in Europe」這邊看到。

Reject All Cookies 的按鈕是像這樣的設計:

照報導說的,今年初的時候法國罰了 Google 一億五千萬歐元,因為 Accept All Cookies 只要一個按鍵,但 Reject All Cookies 需要按很多選單才能達成,而法國認為這樣非對稱式的設計是違法的:

Earlier this year, France’s data protection agency CNIL fined Google €150 million ($170 million) for deploying confusing language in cookie banners. Previously, Google allowed users to accept all tracking cookies with a single click, but forced people to click through various menus to reject them all. This asymmetry was unlawful, said CNIL, steering users into accepting cookies to the ultimate benefit of Google’s advertising business.

Google 的說明裡面也有提到法國的事情,但當然沒有提到罰款:

Based on these conversations and specific direction from France’s Commission Nationale de l’Informatique et des Libertés (CNIL), we have now completed a full redesign of our approach, including changes to the infrastructure we use to handle cookies.

另外就是這個功能目前只在法國啟用,後續會放到整個歐盟區:

We’ve kicked off the launch in France and will be extending this experience across the rest of the European Economic Area, the U.K. and Switzerland.

處理 EasyPrivacy 讓 bookwalker.com.tw 無法購買書籍的問題

有時候在 www.bookwalker.com.tw 上面買書會出現地區問題:

先講解法再講找問題的過程以及解釋原因好了,把這行白名單設定加進 uBlock Origin 的 My filters 列表裡面就可以了:

@@||www.cloudflare.com/cdn-cgi/trace$xhr,domain=www.bookwalker.com.tw

應該有些人看到上面敘述的問題,以及白名單的設法,就大概知道發生什麼事情了,不過這邊還是會從頭說明除錯的過程。

我一開始先確認在無痕模式下是可以看到的 (也就是在沒有延伸套件的情況下),如果是這類 case,最常見的就是延伸套件的鍋,所以接下來研究是哪個套件造成的;然後透過經驗,從容易中獎的套件開始關,一路抓下來就可以抓到是 uBlock Origin。

然後把 uBlock Origin 裡面的 Filter lists 裡面開始關,一路關就可以測出來是 EasyPrivacy 這組造成的。

然後就是拉 uBlock Origin 提供的 Logger 去看 EasyPrivacy 擋了什麼,可以看到有這條:

||cloudflare.com/cdn-cgi/trace$3p,domain=~isbgpsafeyet.com|~wyndhamdestinations.com

而熟知 Cloudflare 的人應該就知道,在 https://www.cloudflare.com/cdn-cgi/trace 裡面有 geolocation 資訊,這樣看起來應該是被 bookwalker.com.tw 拿來跑地區資訊 XD

有興趣的也可以自己跑 curl -v https://www.cloudflare.com/cdn-cgi/trace 看傳回結果。

不過這個部份居然做在前端 javascript,看起來... 好像... 可以...?

Ptt 正式關閉 Telnet 協定

在「[公告] Ptt 即日起關閉無加密的 telnet 連線方式」這邊看到正妹站長公佈關閉 Telnet 協定了,所以目前應該只剩下 WebSocketSSH 了,兩個都有走加密協定...

偶而還是會拿 Telnet 來用,像是測試 SMTP server 的時候,而 SMTP 也是少數現在大家頭痛的協定 XDDD

銀河的歷史又翻過了一頁...

觀察誰在存取剪貼簿的工具 (X11 下)

兩個月前在 Hacker News 上看到的討論,有人想要知道誰在 X11 下存取剪貼簿:「Who keeps an eye on clipboard access? (ovalerio.net)」,原文在「Who keeps an eye on clipboard access?」這邊,作者用 Python 寫的程式則是在「clipboard-watcher」這邊。

馬上有想到 iOS 在 2020 年推出的機制:「iOS 14 clipboard notifications are annoying, but developer adoption of a new API will improve the experience」。

不過在 X11 上跑起來會發現冒出來的資訊量有點大,像是在瀏覽器操作 WordPress 寫文章時剪剪貼貼的時候就會狂噴,如果可以提供程式的白名單的話就更好了,畢竟是我直接把 clipboard API 裡讀取的功能直接拔掉 (但網站還是可以寫進去就是了),對我來說不會在意 browser 寫進去的情況:

另外程式有時候會卡住 (尤其是遇到圖片的剪輯時),算是 bug 吧...

然後 Hacker News 的討論串裡面有人提到一個有趣的設計,他希望限制那些不在焦點上面的程式去碰 clipboard:

By far the worst offense I've seen in clipboard privacy on the Linux desktop is RedHat's virt-manager. It sends your clipboard AND selection content to all virtual machines, even when they are not focused, with no indication that it's happening, and with no GUI option to turn it off. This is at odds with the common practice of running untrusted code in virtual machines.

這個想法好像不賴,理論上 clipboard 應該是在有互動的時候才會碰到的東西...