避開人臉辨識系統的演算法

Hacker News Daily 上看到的專案,針對現在很多演算法可以抓出照片上的人臉進行防禦:「Image "Cloaking" for Personal Privacy」。

這算是 Evasion 的應用,這個專案想要提供演算法,可以在照片上「隱形」,使得演算法偵測不到「人臉」,程式碼可以在 Shawn-Shan/fawkes 這邊翻到,可以看到是在 Python 上用 TensorFlowKeras 實做出來的。

不過會覺得比較有趣的反而不是裡面的方法,而是這篇論文的六個作者:

Shawn Shan†, PhD Student
Emily Wenger†, PhD Student
Jiayun Zhang, Visiting Student
Huiying Li, PhD Student
Haitao Zheng, Professor
Ben Y. Zhao, Professor

† Project co-leaders and co-first authors

從名字上來看五個是華人,而且一路搜下來會發現掛在最後一位的 Ben Y. Zhao 教授在 Quora 上常常回答問題,而且這些問題 (與回答) 還蠻有趣的,可以自己搜看看...

強制每個 Git Repository 都要設定使用者資訊

看到「Setting Up Git Identities」這篇,裡面提到的方法可以解決 Git 裡有多個身份時常見的用錯身份的問題...

個人的 Git repository 會希望用自己的 email address,而公司的 Git repository 則是希望用公司的 email address,但 Git 預設會使用 username 與 hostname 組一個出來,所以常常是推到公司的機器上後才發現 Git repository 沒設定公司的 email address...

上面提到的文章就是關掉 Git 預設會組合的行為,於是就會記得要設定了:

git config --global user.useConfigOnly true

然後記得要把全域設定裡的 nameemail 拔掉,另外有些人可能會掛上 signingkey 也一起拔掉:

git config --global --unset user.name
git config --global --unset user.email
git config --global --unset user.signingkey

這樣當沒設定時想要 git commit,就會被擋下來要求你提供,就能避免把自己的 email address 混在公司的 Git repository 裡面了...

Brave 出手檢舉 Google 沒有遵守 GDPR

Brave (從 Chromium 分支出來的瀏覽器) 檢舉 Google 沒有遵守 GDPR 的規定:「Formal GDPR complaint against Google’s internal data free-for-all」。

主要是「purpose limitation」這個部份,出自「REGULATION (EU) 2016/679 OF THE EUROPEAN PARLIAMENT AND OF THE COUNCIL of 27 April 2016」:

1. Personal data shall be:

(b)

collected for specified, explicit and legitimate purposes and not further processed in a manner that is incompatible with those purposes; further processing for archiving purposes in the public interest, scientific or historical research purposes or statistical purposes shall, in accordance with Article 89(1), not be considered to be incompatible with the initial purposes (‘purpose limitation’);

比較重要的是 specified 與 explicit 這兩個詞,GDPR 規定必須明確指明用途,而可以從整理出來的文件「Inside the black box」裡的「Purported processing purpose」看到大量的極為廣泛的說明。

Google 應該會就這塊反擊認為這樣的描述就夠用,就看歐盟決定要怎麼做了...

PostgreSQL 上去識別化的套件

在「PostgreSQL Anonymizer 0.5: Generalization and k-anonymity」這邊看到的套件,看起來可以做到一些常見而且簡單的去識別化功能:

The extension supports 3 different anonymization strategies: Dynamic Masking, In-Place Anonymization and Anonymous Dumps. It also offers a large choice of Masking Functions: Substitution, Randomization, Faking, Partial Scrambling, Shuffling, Noise Addition and Generalization.

看起來可以把欄位轉成 range 這件事情半自動化處理掉 (還是需要 SQL 本身呼叫這些函數),之後遇到 PII 的時候也許會用到...

開源的語音助理 Leon

目前的語音助理都會把資訊傳回到中央伺服器,而 leon-ai/leon 則是希望在本地端解決:

Leon is an open-source personal assistant who can live on your server.

He does stuff when you ask him for.

You can talk to him and he can talk to you. You can also text him and he can also text you. If you want to, Leon can communicate with you by being offline to protect your privacy.

作者錄了一段目前版本的功能,可以看到語音輸入與分析運作的還不錯:

看到 1.0.0 beta,最近應該會釋出正式版,接下來就是看可以掛什麼 plugin 進去讓他更好用?

各家 Session Replay 服務對個資的處理

Session Replay 指的是重播將使用者的行為錄下來重播,市面上有很多這樣的服務,像是 User Replay 或是 SessionCam

這篇文章就是在討論這些服務在處理個資時的方式,像是信用卡卡號的內容,或是密碼的內容,這些不應該被記錄下來的資料是怎麼被處理的:「No boundaries: Exfiltration of personal data by session-replay scripts」,主要的重點在這張圖:

後面有提到目前防禦的情況,看起來目前用 adblock 類的軟體可以擋掉一些服務,但不是全部的都在列表裡。而 DNT 則是裝飾品沒人鳥過:

Two commonly used ad-blocking lists EasyList and EasyPrivacy do not block FullStory, Smartlook, or UserReplay scripts. EasyPrivacy has filter rules that block Yandex, Hotjar, ClickTale and SessionCam.

At least one of the five companies we studied (UserReplay) allows publishers to disable data collection from users who have Do Not Track (DNT) set in their browsers. We scanned the configuration settings of the Alexa top 1 million publishers using UserReplay on their homepages, and found that none of them chose to honor the DNT signal.

Improving user experience is a critical task for publishers. However it shouldn’t come at the expense of user privacy.

Facebook 用哪些資訊來決定投放給你的廣告

華盛頓郵報整理出來了 Facebook 的廣告所使用的 98 個個人資訊:「98 personal data points that Facebook uses to target ads to you」。

基本的個人資訊 (甚至是朋友的),以及使用什麼瀏覽器都可以預期;而 Like 或是參加的 Group 都會被計算也是意料中的事情,不過連信用卡的種類也都在內就頗特別的...

來檢視一下自己的防禦機制有哪些... 瀏覽器預設擋下第三方 cookie:

Ghostery 預設把所有外部元件擋下來,再用白名單開想要看的部份。用 uBlock Origin 擋下所有廣告。

另外用「Force Facebook Most Recent」強制 Facebook 轉到 Most Recent 的 Timeline 上。

不知道這樣夠不夠用...

最後來列出這 98 個條件:

  1. Location
  2. Age
  3. Generation
  4. Gender
  5. Language
  6. Education level
  7. Field of study
  8. School
  9. Ethnic affinity
  10. Income and net worth
  11. Home ownership and type
  12. Home value
  13. Property size
  14. Square footage of home
  15. Year home was built
  16. Household composition
  17. Users who have an anniversary within 30 days
  18. Users who are away from family or hometown
  19. Users who are friends with someone who has an anniversary, is newly married or engaged, recently moved, or has an upcoming birthday
  20. Users in long-distance relationships
  21. Users in new relationships
  22. Users who have new jobs
  23. Users who are newly engaged
  24. Users who are newly married
  25. Users who have recently moved
  26. Users who have birthdays soon
  27. Parents
  28. Expectant parents
  29. Mothers, divided by “type” (soccer, trendy, etc.)
  30. Users who are likely to engage in politics
  31. Conservatives and liberals
  32. Relationship status
  33. Employer
  34. Industry
  35. Job title
  36. Office type
  37. Interests
  38. Users who own motorcycles
  39. Users who plan to buy a car (and what kind/brand of car, and how soon)
  40. Users who bought auto parts or accessories recently
  41. Users who are likely to need auto parts or services
  42. Style and brand of car you drive
  43. Year car was bought
  44. Age of car
  45. How much money user is likely to spend on next car
  46. Where user is likely to buy next car
  47. How many employees your company has
  48. Users who own small businesses
  49. Users who work in management or are executives
  50. Users who have donated to charity (divided by type)
  51. Operating system
  52. Users who play canvas games
  53. Users who own a gaming console
  54. Users who have created a Facebook event
  55. Users who have used Facebook Payments
  56. Users who have spent more than average on Facebook Payments
  57. Users who administer a Facebook page
  58. Users who have recently uploaded photos to Facebook
  59. Internet browser
  60. Email service
  61. Early/late adopters of technology
  62. Expats (divided by what country they are from originally)
  63. Users who belong to a credit union, national bank or regional bank
  64. Users who investor (divided by investment type)
  65. Number of credit lines
  66. Users who are active credit card users
  67. Credit card type
  68. Users who have a debit card
  69. Users who carry a balance on their credit card
  70. Users who listen to the radio
  71. Preference in TV shows
  72. Users who use a mobile device (divided by what brand they use)
  73. Internet connection type
  74. Users who recently acquired a smartphone or tablet
  75. Users who access the Internet through a smartphone or tablet
  76. Users who use coupons
  77. Types of clothing user’s household buys
  78. Time of year user’s household shops most
  79. Users who are “heavy” buyers of beer, wine or spirits
  80. Users who buy groceries (and what kinds)
  81. Users who buy beauty products
  82. Users who buy allergy medications, cough/cold medications, pain relief products, and over-the-counter meds
  83. Users who spend money on household products
  84. Users who spend money on products for kids or pets, and what kinds of pets
  85. Users whose household makes more purchases than is average
  86. Users who tend to shop online (or off)
  87. Types of restaurants user eats at
  88. Kinds of stores user shops at
  89. Users who are “receptive” to offers from companies offering online auto insurance, higher education or mortgages, and prepaid debit cards/satellite TV
  90. Length of time user has lived in house
  91. Users who are likely to move soon
  92. Users who are interested in the Olympics, fall football, cricket or Ramadan
  93. Users who travel frequently, for work or pleasure
  94. Users who commute to work
  95. Types of vacations user tends to go on
  96. Users who recently returned from a trip
  97. Users who recently used a travel app
  98. Users who participate in a timeshare

百度被抓到蒐集個資後還是要蒐集...

在「Thousands of apps running Baidu code collect, leak personal data - research」這篇裡,加拿大的研究團隊 Citizen Lab 發現百度的 Android SDK 使用非加密傳輸這些個資:

The unencrypted information that has been collected includes a user's location, search terms and website visits, JeffreyKnockel, chief researcher at Citizen Lab, told Reuters ahead of publication of the research on Wednesday.

百度說他們會修正加密問題,但還是要蒐集:

[,] and Baidu told Reuters it would be fixing the encryption holes in its kits, but would still collect data for commercial use, some of which it said it shares with third parties.

霸氣!不愧是百度!即使被抓到後還是要蒐集 XDDD

AVG 更新隱私條款,以便能夠蒐集使用者的搜尋紀錄並且賣給其他人

在「AVG can sell your browsing and search history to advertisers」這邊整理的比較清楚:

The updated policy explained that AVG was allowed to collect "non-personal data", which could then be sold to third parties.

或是抓原文:

Do you share my data?

Yes, though when and how we share it depends on whether it is personal data or non-personal data. AVG may share non-personal data with third parties and may publicly display aggregate or anonymous information.

新的條款會在今年 (2015) 的十月十五日生效。