AWS offers a Deep Learning AMI on Windows

Some Windows-based tools can now simply be launched and run as-is: "Announcing New AWS Deep Learning AMI for Microsoft Windows".

It currently supports Windows Server 2012 R2 and 2016:

Amazon Web Services now offers an AWS Deep Learning AMI for Microsoft Windows Server 2012 R2 and 2016.

The drivers and the commonly used packages are bundled in as well:

The AMIs also include popular deep learning frameworks such as Apache MXNet, Caffe and Tensorflow, as well as packages that enable easy integration with AWS, including launch configuration tools and many popular AWS libraries and tools. The AMIs come prepackaged with Nvidia CUDA 9, cuDNN 7, and Nvidia 385.54 drivers, and contain the Anaconda platform (supports Python versions 2.7 and 3.5).

The problem with machine learning and the adult industry

Bruce Schneier brought up several recent, related stories about the privacy problems that arise when machine learning is used in the adult industry: "Technology to Out Sex Workers".

The first is PornHub using machine learning to identify performers and various "other information". The report cited here is TechCrunch's "PornHub uses computer vision to ID actors, acts in its videos":

PornHub is using machine learning algorithms to identify actors in different videos, so as to better index them.

The computer vision system can identify specific actors in scenes and even identifies various positions and… attributes.

The second is the problem of stage names being linked to real identities:

People are worried that it can really identify them, by linking their stage names to their real names.

The last is that Facebook is already capable of doing this, and it has already happened:

Facebook somehow managed to link a sex worker's clients under her fake name to her real profile.

Her sex-work identity is not on the social network at all; for it, she uses a different email address, a different phone number, and a different name. Yet earlier this year, looking at Facebook’s “People You May Know” recommendations, Leila (a name I’m using in place of either of the names she uses) was shocked to see some of her regular sex-work clients.

This issue is somewhat similar to mass surveillance...

The official StarCraft II AI Workshop

Blizzard announced that it will hold a StarCraft II AI Workshop in early November: "Announcing the StarCraft II AI Workshop".

On November 3 and 4, Blizzard and DeepMind will co-host the StarCraft II AI Workshop at the Hilton Anaheim hotel, next to the Anaheim Convention Center.

The official teams (including the DeepMind team) will also be there to discuss SC2LE (StarCraft II Learning Environment) and SC2API (StarCraft II API):

Engineers and researchers from Blizzard and DeepMind will also be on-hand to meet with attendees and answers questions about the SC2LE and SC2API.

The dates overlap with BlizzCon 2017 (for now it looks like it lands on the last two days), and the tickets are not interchangeable:

While this event takes place during BlizzCon 2017, it is considered a separate event and is not part of the official BlizzCon program – therefore BlizzCon badges will not grant access to the AI workshop. However, we will be providing a limited pool of shareable BlizzCon badges that attendees of the AI workshop can use to check out BlizzCon and catch the StarCraft II Global Finals for inspiration on how to build superior AIs!

There should be quite a bit of news coming out next... Has the DeepMind team progressed far enough to compete against top players?

Attack techniques against Open Data

A while ago I came across "Membership Inference Attacks against Machine Learning Models", which tries to pull off the following attack:

[G]iven a data record and black-box access to a model, determine if the record was in the model's training dataset.

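In other words, you get query access to a model trained on some dataset, and then develop a method to decide whether a particular record was in that training data. And the way to validate the attack is, of course, to attack directly and see how well it works: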

We empirically evaluate our inference techniques on classification models trained by commercial "machine learning as a service" providers such as Google and Amazon. Using realistic datasets and classification tasks, including a hospital discharge dataset whose membership is sensitive from the privacy perspective, we show that these models can be vulnerable to membership inference attacks. We then investigate the factors that influence this leakage and evaluate mitigation strategies.
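To make the idea of "black-box access" concrete, here is a deliberately simplified sketch. It is not the paper's method: it swaps in a plain confidence-threshold heuristic, guessing that a record was in the training set when the model is very confident about it. The function name, the threshold value, and the sample prediction vectors are all hypothetical.

```python
def infer_membership(confidences, threshold=0.9):
    """Guess 'member' when the model's top class probability reaches the threshold.

    Simplified stand-in for a membership inference attack: models tend to be
    more confident on records they were trained on, so high confidence is
    treated as a hint of membership. The threshold is an arbitrary assumption.
    """
    return max(confidences) >= threshold


# Hypothetical prediction vectors returned by a black-box model.
seen_record = [0.97, 0.02, 0.01]    # very confident: guessed to be a member
unseen_record = [0.55, 0.30, 0.15]  # less confident: guessed to be a non-member

print(infer_membership(seen_record))
print(infer_membership(unseen_record))
```

A real attack has to learn what "confident enough" looks like for each class, which this fixed threshold does not attempt.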

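This is attacking one NN with another NN. The current mitigations are not easy to apply, but applying any of them does at least lower the attack's precision. The paper lists four ways to make the attack harder: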

  • Restrict the prediction vector to top k classes.
  • Coarsen precision of the prediction vector.
  • Increase entropy of the prediction vector.
  • Use regularization.
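The first three items all post-process the prediction vector before it is returned to the caller, so they can be sketched in a few lines. This is a minimal illustration in plain Python, not code from the paper: the function names are made up here, and using a softmax temperature as the way to raise entropy is an assumption of this sketch. Regularization happens at training time and is not shown.

```python
import math


def top_k(probs, k):
    """Keep only the k largest probabilities; zero out the rest (no renormalization)."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep = set(order[:k])
    return [p if i in keep else 0.0 for i, p in enumerate(probs)]


def coarsen(probs, digits):
    """Round every probability to a fixed number of decimal digits."""
    return [round(p, digits) for p in probs]


def softmax_with_temperature(logits, temperature):
    """Softmax over logits / temperature; a temperature above 1 flattens the output."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy in nats, skipping zero entries."""
    return -sum(p * math.log(p) for p in probs if p > 0)


logits = [2.0, 1.0, 0.1]  # hypothetical model outputs
sharp = softmax_with_temperature(logits, 1.0)
flat = softmax_with_temperature(logits, 5.0)

print(top_k(sharp, 2))
print(coarsen(sharp, 1))
print(entropy(sharp) < entropy(flat))
```

Each function gives the attacker less detail to work with, at the cost of giving legitimate users less detail too.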

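Another thing worth reading about is the "AOL search data leak" of 2006. After the data was released, real users were identified from it, which caused quite a stir at the time...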

Tuning databases with Machine Learning

The AWS AI Blog posted this at the start of the month: "Tuning Your DBMS Automatically with Machine Learning".

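This is research from the Carnegie Mellon Database Group. Besides the default settings, it compares four other configurations: OtterTune (the subject of this research), a tuning script (a commonly used open source tool for people unfamiliar with databases), manual tuning by a DBA, and RDS.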

[Figures: benchmark charts for MySQL and PostgreSQL]

The more obvious conclusions are:

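  • The default settings are the worst in every case (on both MySQL and PostgreSQL, for both 99th percentile latency and QPS, so all four results in the two-by-two grid). The gap between the defaults and everything else is also very clear.
  • OtterTune beats the tuning script in every case. That is a reasonable result, since the whole point was to replace what other automated tools produce.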

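As for the talk about DBAs losing their jobs, I'm all for it... tedious work like this that can be automated should just be handed over to automation XD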

Google Cloud Platform machines and services now support GPUs too...

Google announced that machines and services on GCP now support GPU computing: "GPUs are now available for Google Compute Engine and Cloud Machine Learning".

It is in beta for now, and uses the NVIDIA Tesla K80:

Google Cloud Platform gets a performance boost today with the much anticipated public beta of NVIDIA Tesla K80 GPUs.

Taiwan's asia-east1 is included. In the first wave you have to go through the CLI; support in the web console comes later:

You can now spin up NVIDIA GPU-based VMs in three GCP regions: us-east1, asia-east1 and europe-west1, using the gcloud command-line tool. Support for creating GPU VMs using the Cloud Console appears next week.

So GCP has started supporting GPUs as well...

Talent put to strange use: a boss detector

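The author uses OpenCV to detect faces and a trained model to recognize the boss's face, then switches the screen to a hard-at-work screenshot when the boss walks over XDDD: "Deep Learning Enables You to Hide Screen when Your Boss is Approaching".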

“My boss left his seat and he was approaching to my seat.”

“OpenCV has detected the face and input the image into the learned model.”

“The screen has switched by recognizing him! ヽ(‘ ∇‘ )ノ ワーイ”

The author is Japanese (should I say I'm not surprised XDDD), and the source code for this software is at "Hironsan/BossSensor" XDDD

What a tremendous waste of talent XDDD

Amazon Rekognition: an image recognition API

After Google's Vision API and Microsoft's Computer Vision API (see "Microsoft also launches an image recognition API"), AWS has now launched a similar service: "Amazon Rekognition – Image Detection and Recognition Powered by Deep Learning".

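It works much like the other two: you send an image in, the system analyzes it with a model that has already been trained, and you get the results back... and you are billed per request.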

It feels like they are filling a gap in the product line...

Yahoo! also releases a solution for detecting pornographic images

It feels like ever since AlphaGo's big win over Lee Sedol, applications of all kinds of neural network techniques have kept popping up...

What Yahoo! released this time for detecting pornographic images comes from the same family of techniques: "Open Sourcing a Deep Learning Solution for Detecting NSFW Images".

Things that could not be done back then are now mature enough to be open sourced...

Google compiles and publishes nine million images with their tags

Google released nine million CC-licensed images and tagged them to form the Open Images dataset: "Introducing the Open Images Dataset". For example:

Annotated images form the Open Images dataset. Left: Ghost Arches by Kevin Krejci. Right: Some Silverware by J B. Both images used under CC BY 2.0 license

However, the tags were not assigned by humans; they are the output of machine learning:

The image-level annotations have been populated automatically with a vision model similar to Google Cloud Vision API.

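Since the data has not been verified by humans, anyone who wants to use it for more precise research still has to correct it first with a service like Amazon Mechanical Turk to make sure it is accurate.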