Apple 新的「Find My」帶來的隱私問題

這次 WWDC 推出的新功能,已經有人在討論機制與隱私問題了:「How does Apple (privately) find your offline devices?」。

前一代的「Find my iPhone」需要透過網路與 GPS 資料才能在系統上看到,這一代則是加上 BLE beacon,然後任何一台 iOS device 收到後就回傳回給蘋果:

Every active iPhone will continuously monitor for BLE beacon messages that might be coming from a lost device. When it picks up one of these signals, the participating phone tags the data with its own current GPS location; then it sends the whole package up to Apple’s servers.

幾個隱私問題在於,代傳的 iOS device 也會暴露位置資訊給蘋果,另外收到 BLE beacon 的 iOS device 本身是否可以解讀遺失機器的資訊?而商家看起來也可以利用這個方式主動發送攻擊而得知不少資料 (像是文章裡提到先前蘋果透過 randomize mac address 加強隱私的問題,這邊又多開了一個洞),現在蘋果給的資訊還不夠清楚,需要真的逆向工程確認才知道...

GrabFood 用定位資料修正餐廳的資訊

Grab 的「How we harnessed the wisdom of crowds to improve restaurant location accuracy」這篇是他們的 data team 整理出來,如何使用既有的資料快速的修正餐廳資訊。裡面提到的方法不需要用到 machine learning,光是一些簡單的統計算法就可以快速修正現有的架構。

這些資訊其實是透過司機用的 driver app 蒐集來的,在 driver app 上有大量的資訊傳回伺服器 (像是定時回報的 GPS 位置,以及取餐狀態),而這些司機因為地緣關係,腦袋裡的資訊比地圖會準不少:

One of the biggest advantages we have is the huge driver-partner fleet we have on the ground in cities across Southeast Asia. They know the roads and cities like the back of their hand, and they are resourceful. As a result, they are often able to find the restaurants and complete orders even if the location was registered incorrectly.

所以透過這些資訊他們就可以反過來改善地圖資料,像是透過司機按下「取餐」的按鈕的地點與待的時間,就可以估算餐聽可能的位置,然後拿這個資訊比對地圖上的資料,就很容易發現搬家但是地圖上沒更新的情況:

Fraction of the orders where the pick-up location was not “at” the restaurant: This fraction indicates the number of orders with a pick-up location not near the registered restaurant location (with near being defined both spatially and temporally as above). A higher value indicates a higher likelihood of the restaurant not being in the registered location subject to order volume

Median distance between registered and estimated locations: This factor is used to rank restaurants by a notion of “importance”. A restaurant which is just outside the fixed radius from above can be addressed after another restaurant which is a kilometer away.

另外也有不少其他的改善 (像是必須在離餐聽某個距離內才能點「取餐」,這個「距離」會因為餐聽可能在室內商場而需要的調整),整個成果就會反應在訂單的取消率大幅下降:

整體看起來是系統產生清單後讓人工後續處理 (像是打電話去店家問?),但這個方式所提供的清單準確度應該很高 (因為司機不會沒事跟自己時間過不去,跑到奇怪地方按下取餐),用這些資料跑簡單的演算法就能夠快速修正不少問題...

依照交通時間,評估各種「地點」的服務...

Oalley 這個服務利用「交通時間」來評估地點,最簡單的應用像是給一個地點與時間,畫出範圍:

或是給多個點以及時間來評估地點:

不過後面是用 OpenStreetMap 的資料,我丟了幾個中文測試好像都找不到,用英文倒是可以...

CloudFront 在東京開到第八個點了...

看到 Amazon CloudFront 宣佈在東京開到第八個點了:「Amazon CloudFront launches eighth Edge location in Tokyo, Japan」。

Amazon CloudFront announces the addition of an eighth Edge location in Tokyo, Japan. The addition of another Edge location continues to expand CloudFront's capacity in the region, allowing us to serve increased volumes of web traffic.

這個成長速度有點驚人,一月才加了兩個,現在又要再加一個... 不過大阪還是維持一個。

南韓出手調查 Google 未經同意蒐集位置資訊的問題了

在「就算關掉 Google 的定位服務也還是會蒐集位置資訊...」這邊提到的蒐集問題,南韓出手調查了:「Regulators question Google over location data」。

Regulators in South Korea summoned Google (GOOGL, Tech30) representatives this week to question them about a report that claimed the company was collecting data from Android devices even when location services were disabled.

英國也在看情況:

U.K. data protection officials are also looking into the matter.

美國與歐盟其他國家反而還沒看到消息... (不過美國有可能是以訴訟的方式進行)

就算關掉 Google 的定位服務也還是會蒐集位置資訊...

就如標題所寫的,Quartz 獨家刊出來的新聞,即使你關掉 Google 的定位服務,Google 還是會蒐集你的位置 (而且跟 Google 發言人確認後也證實):「Google collects Android users’ locations even when location services are disabled」。

而且是全背景作業,在你沒有開定位服務,沒有插 SIM 卡,也沒有跑任何 app,他就會將定位資訊傳出去:

Many people realize that smartphones track their locations. But what if you actively turn off location services, haven’t used any apps, and haven’t even inserted a carrier SIM card?

從今年年初開始這樣搞的,Google 發言人只宣稱這個資料並沒有被用來整合到「network sync system」,並且會立即丟掉 (所以你還是不知道被用到什麼地方):

“In January of this year, we began looking into using Cell ID codes as an additional signal to further improve the speed and performance of message delivery,” the Google spokesperson said in an email. “However, we never incorporated Cell ID into our network sync system, so that data was immediately discarded, and we updated it to no longer request Cell ID.”

這句話的意思其實代表著是丟掉 raw data,改以統計的方式轉移存到其他系統。

另外 John Gruber 在「Google Collects Android Users' Locations Even When Location Services Are Disabled」其實寫的更直接:

If they were “never used or stored”, why did they start collecting them in the first place? This is like a kid caught with their hand in the cookie jar saying they weren’t going to eat any cookies. Sure.

白話一點就是「你當我傻逼啊」。

應該會促進 microG 的發展... (參考「microG 的進展...」)

iOS 11 將 Location 的主權交還給使用者

Hacker News Daily 上看到這則 tweet,說 iOS 11 將會把 Location 的主權交還給使用者控制:

查了對應的一些網站,可以看到好幾個站台都有介紹這一點:「iOS 11 Users to Gain More Control Over Apps' Use of Location Services」、「iOS 11 gives users tighter control over when apps can use their location」。

TechCrunch 標題寫的更直接,其實影響最直接的就是這些 app:「iOS 11 stops apps like Uber and Waze from accessing user location data at all times」。

算是不錯的消息啦... (Android 上則可以看「Background Location Limits」,這邊是 Android O 的新功能...)

Amazon CloudFront 又增加東京機房了...

CloudFront 又增加東京的點了,這是第四個點:「Amazon CloudFront adds new Edge Locations in Tokyo, Japan and Dallas/Fort Worth, Texas」。

This is our fourth edge location in the Tokyo area and our third in Dallas, bringing the total number of CloudFront locations to 87 (including 76 points of presence and 11 regional edge cache locations).

不愧是亞洲的交換中心...

CloudFront 在慕尼黑加到第五個機房...

德國是歐洲區重要的交換中心,不過量大到可以讓 CloudFront 加到第五個點就頗意外的:「Announcing New Munich Edge Location for Amazon CloudFront, our 7th Edge Location in Germany」。

慕尼黑的第五個點,全德國第七個點:

The Munich location is our third location in Germany (joining Frankfurt and Berlin), and our 7th edge location in Germany bringing the total number of worldwide edge locations to 69.

Nginx + FastCGI + Trac

先前試著逼自己用 Phabricator,用了一個多月後發現設計的邏輯還是跟 Trac 差了不少,算是為了 Facebook 特化的產品吧。在這一個月查資料的過程也發現當初 Wikimedia 要採用的時候也花了不少力氣送 patch 回官方,然後針對不少地方客製化調整。

另外比較痛的地方是 plugin 的支援能力還沒有很好,變成很多東西都要改主體... 而且效能也不太好 (不支援 PHP 7.0 還蠻痛的),在比較低階的 VPS 上跑特別明顯。

這幾天花了點時間把 Trac 給架起來,之前都是用 FreeBSD ports 架,但已經愈來愈沒有再接觸 FreeBSD 了,所以這次在 Ubuntu 上用 pyenv 裝起來再用 pip 裝起來。

另外一個跟之前不同的,是先前都用 Apachemod_wsgi,在低階的 VPS 上則是要找省資源的方案,這次則是用 nginx + FastCGI 去接,比起之前複雜不少...

最主要是參考了官方的文件以及「Gentoo下使用nginx+fastcgi部署trac」這邊的說明達到效果,重點是這段 location 的設定:

    location / {
        auth_basic "trac realm";
        auth_basic_user_file /srv/domain.example.com/.htpasswd;

        include fastcgi.conf;
        fastcgi_param AUTH_USER $remote_user;
        fastcgi_param HTTPS on;
        fastcgi_param PATH_INFO $fastcgi_script_name;
        fastcgi_param REMOTE_USER $remote_user;
        fastcgi_param SCRIPT_NAME "";
        fastcgi_pass unix:/var/run/trac/trac.sock;
    }

    location /.well-known/acme-challenge/ {
        alias /var/www/dehydrated/;
    }

網路上有找到用 location ~ (/.*) 去 match,然後拉出 $1PATH_INFO 用的,這這會使得這段 location 的優先權太高 (參考官方對於 location 的順序說明),而蓋掉下面 Let's Encrypt 的 acme challenge 過程,所以還是得這樣搞。

另外是自己一個人用,就用 .htpasswd 的方式認證了,沒必要弄 LDAP 之類的認證...

接下來就是裝一堆 plugin 並且調整 css/js 與 SQL query 了...