Home » Posts tagged "machine"

AWS 提供 Windows 上的 Deep Learning AMI

有一些 Windows 上的東西就可以直接開起來跑了:「Announcing New AWS Deep Learning AMI for Microsoft Windows」。

目前支援 2012 R2 與 2016:

Amazon Web Services now offers an AWS Deep Learning AMI for Microsoft Windows Server 2012 R2 and 2016.

然後 driver 與常用的東西都包進去了:

The AMIs also include popular deep learning frameworks such as Apache MXNet, Caffe and Tensorflow, as well as packages that enable easy integration with AWS, including launch configuration tools and many popular AWS libraries and tools. The AMIs come prepackaged with Nvidia CUDA 9, cuDNN 7, and Nvidia 385.54 drivers, and contain the Anaconda platform (supports Python versions 2.7 and 3.5).

機器學習與情色產業的問題

Bruce Schneier 提到了最近幾個剛好相關的議題,關於機器學習在情色產業使用時遇到的隱私議題:「Technology to Out Sex Workers」。

第一個提到的是 PornHub 用機器學習辨識演員以及各種「其他資訊」,這邊引用的報導是 TechCrunch 的「PornHub uses computer vision to ID actors, acts in its videos」:

PornHub is using machine learning algorithms to identify actors in different videos, so as to better index them.

The computer vision system can identify specific actors in scenes and even identifies various positions and… attributes.

第二個提到的是花名與真實身份連在一起的問題:

People are worried that it can really identify them, by linking their stage names to their real names.

最後是提到 Facebook 已經有能力這樣做,而且已經發生了:

Facebook somehow managed to link a sex worker's clients under her fake name to her real profile.

Her sex-work identity is not on the social network at all; for it, she uses a different email address, a different phone number, and a different name. Yet earlier this year, looking at Facebook’s “People You May Know” recommendations, Leila (a name I’m using using in place of either of the names she uses) was shocked to see some of her regular sex-work clients.

這個議題與 Mass surveillance 有點像...。

星海爭霸 II 官方的 AI Workshop

Blizzard 公佈了在十一月的月初將會舉辦星海二的 AI Workshop:「Announcing the StarCraft II AI Workshop」。

On November 3 and 4, Blizzard and DeepMind will co-host the StarCraft II AI Workshop at the Hilton Anaheim hotel, next to the Anaheim Convention Center.

官方 (包括 DeepMind 團隊) 也會針對 SC2LE (Starcraft II Learning Environment) 與 SC2API (StarCraft II API) 提供交流:

Engineers and researchers from Blizzard and DeepMind will also be on-hand to meet with attendees and answers questions about the SC2LE and SC2API.

然後時間會跟 BlizzCon 2017 重疊 (目前看起來是卡到最後兩天),票是不能通用的:

While this event takes place during BlizzCon 2017, it is considered a separate event and is not part of the official BlizzCon program – therefore BlizzCon badges will not grant access to the AI workshop. However, we will be providing a limited pool of shareable BlizzCon badges that attendees of the AI workshop can use to check out BlizzCon and catch the StarCraft II Global Finals for inspiration on how to build superior AIs!

接下來應該會有不少消息出來... DeepMind 團隊的開發進度有可以跟頂尖選手競賽嗎?

對 Open Data 的攻擊手段

前陣子看到的「Membership Inference Attacks against Machine Learning Models」,裡面試著做到的攻擊手法:

[G]iven a data record and black-box access to a model, determine if the record was in the model's training dataset.

也就是拿到一組 Open Data 的存取權限,然後發展一套方法判斷某筆資料是否在裡面。而驗證攻擊的手法當然就是直接攻擊看效果:

We empirically evaluate our inference techniques on classification models trained by commercial "machine learning as a service" providers such as Google and Amazon. Using realistic datasets and classification tasks, including a hospital discharge dataset whose membership is sensitive from the privacy perspective, we show that these models can be vulnerable to membership inference attacks. We then investigate the factors that influence this leakage and evaluate mitigation strategies.

透過 NN 攻擊 NN,而目前的解法也不太好處理,但有做總是會讓精確度降低。論文裡提到了四種讓難度增加的方法:

  • Restrict the prediction vector to top k classes.
  • Coarsen precision of the prediction vector.
  • Increase entropy of the prediction vector.
  • Use regularization.

另外一個值得看的資料是 2006 年發生的「AOL search data leak」,當年資料被放出來後有真實的使用者被找出來,也是很轟動啊...

用 Machine Learning 調校資料庫

AWS AI Blog 在月初上放出來的消息:「Tuning Your DBMS Automatically with Machine Learning」。

Carnegie Mellon Database Group 做的研究,除了預設值以外,另外跟四種不同的參數做比較,分別是 OtterTune (也就是這次的研究)、Tuning script (對於不熟資料庫的人,常用的 open source 工具)、DBA 手動調整,以及 RDS

MySQL

PostgreSQL

比較明顯的結論是:

  • Default 值在所有的 case 下都是最差的 (無論是 MySQL 與 PostgreSQL 平台,以及包括 99% 的 Latency 與 QPS,這樣二乘二的四個結果)。而且 Default 跑出來的數字與其他的差距都很明顯。
  • OtterTune 在所有 case 下跑出來都比 Tuning script 的好。這也是合理的結果,本來就是想要取代其他機器跑出來的結果。

至於有些討論 DBA 會失業的事情,我是樂見其成啦... 這些繁瑣的事情可以自動化就想交給自動化吧 XD

Google Cloud Platform 的機器與服務也支援 GPU 了...

Google 宣佈在 GCP 上的機器與服務支援 GPU 運算了:「GPUs are now available for Google Compute Engine and Cloud Machine Learning」。

算是 beta 階段,用的是 NVIDIA Tesla K80

Google Cloud Platform gets a performance boost today with the much anticipated public beta of NVIDIA Tesla K80 GPUs.

然後台灣的 asia-east1 也在內。第一波必須透過 cli 操作,之後才會在 web console 上加上去:

You can now spin up NVIDIA GPU-based VMs in three GCP regions: us-east1, asia-east1 and europe-west1, using the gcloud command-line tool. Support for creating GPU VMs using the Cloud Console appears next week.

也開始支援 GPU 了...

Amazon Polly 與 Amazon Lex:人機介面中的語音處理

AWS 這次推出的這兩個服務剛好成對:「Amazon Polly – Text to Speech in 47 Voices and 24 Languages」、「Amazon Lex – Build Conversational Voice & Text Interfaces」。

Amazon Polly 負責把文字唸出來變成語音,而 Amazon Lex 則是將語音辨識回文字,不過目前都還不支援中文... 但畢竟讓 user interface 這塊變得更親民了,算是基礎建設中服務,讓 startup 專心在產品本身上。

Google 整理並公開出九百萬張圖片以及對應的 tag

Google 放出了九百萬張以 CC 授權釋出的圖片,標上 tag 後變成 Open Images dataset:「Introducing the Open Images Dataset」,像是這樣:


Annotated images form the Open Images dataset. Left: Ghost Arches by Kevin Krejci. Right: Some Silverware by J B. Both images used under CC BY 2.0 license

不過這不是人類分類出來的結果,而是機械學習的成果:

The image-level annotations have been populated automatically with a vision model similar to Google Cloud Vision API.

不過因為這不是人工確認過的資料,如果要拿來做比較精確的研究,還是得用 Amazon Mechanical Turk 這類服務先校正過以確保正確性。

Archives