Google organizes and releases nine million images with corresponding tags

Google has released nine million CC-licensed images, annotated with tags, as the Open Images dataset: "Introducing the Open Images Dataset". For example:


Annotated images form the Open Images dataset. Left: Ghost Arches by Kevin Krejci. Right: Some Silverware by J B. Both images used under CC BY 2.0 license

These tags were not assigned by people, though; they are the output of machine learning:

The image-level annotations have been populated automatically with a vision model similar to Google Cloud Vision API.

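Because the data has not been verified by hand, anyone who wants to use it for research that needs accurate labels would still have to check it first with a service like Amazon Mechanical Turk.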

A tool that retouches photos algorithmically

I came across the tool "A simple interface for editing natural photos with generative neural networks."; its two-minute demo video is the clearest way to see what it is trying to do.

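After you adjust one point, the tool uses an algorithm to change the other points so that the photo still looks natural... The author's paper can be downloaded from "Neural Photo Editing with Introspective Adversarial Networks".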

Using Deep Learning to identify pixelated faces

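Suppose a news report reveals some background details about a victim, so you now have these two pieces of data:

  • Photos of the forty people who match those background details.
  • The news photo in which the face has been pixelated.

The question is how to work out which of them is the person in the news photo: "Defeating Image Obfuscation with Deep Learning".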

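In an experiment along these lines, picking the right person out of 40 candidates succeeds 50% of the time (random guessing would be right only 1 time in 40, or 2.5%).

50% may not be good enough for practical use yet, but it shows that Big Brother's technology is already being developed...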

Losing weight with machine learning

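I saw this approach on Hacker News Daily. The author used machine learning to try to find out which factors were making him gain weight, and then planned his weight loss around the results: "Discovering ketosis: how to effectively lose weight". The article is rather long, so here are the key points.

First, the author recorded his weight and his behavior every day, like this: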

#
# -- Comment lines (ignored)
#
Date,MorningWeight,YesterdayFactors
2012-06-10,185.0,
2012-06-11,182.6,salad sleep bacon cheese tea halfnhalf icecream
2012-06-12,181.0,sleep egg
2012-06-13,183.6,mottsfruitsnack:2 pizza:0.5 bread:0.5 date:3 dietsnapple splenda milk nosleep
2012-06-14,183.6,coffeecandy:2 egg mayo cheese:2 rice meat bread:0.5 peanut:0.4
2012-06-15,183.4,meat sugarlesscandy salad cherry:4 bread:0 dietsnapple:0.5 egg mayo oliveoil
2012-06-16,183.6,caprise bread grape:0.2 pasadena sugaryogurt dietsnapple:0.5 peanut:0.4 hotdog
2012-06-17,182.6,grape meat pistachio:5 peanut:5 cheese sorbet:5 orangejuice:2
# and so on ...

At that point he was only recording data, not deliberately trying to lose weight:

I was not dieting at that time. Just collecting data.
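To make the log format above concrete, here is a minimal parsing sketch in Python. It assumes that a bare factor counts as weight 1.0 and that `name:amount` overrides that weight, which is my reading of the sample rather than something the article states. The function names are made up for illustration, and this is not the author's actual analysis pipeline.

```python
import csv

# First rows of the sample log shown above.
SAMPLE_LOG = """\
#
# -- Comment lines (ignored)
#
Date,MorningWeight,YesterdayFactors
2012-06-10,185.0,
2012-06-11,182.6,salad sleep bacon cheese tea halfnhalf icecream
2012-06-12,181.0,sleep egg
2012-06-13,183.6,mottsfruitsnack:2 pizza:0.5 bread:0.5 date:3 dietsnapple splenda milk nosleep
"""


def parse_factors(field):
    """Turn 'egg cheese:2' into {'egg': 1.0, 'cheese': 2.0}.

    Assumption: a factor without ':amount' has weight 1.0.
    """
    factors = {}
    for token in field.split():
        name, _, amount = token.partition(":")
        factors[name] = float(amount) if amount else 1.0
    return factors


def parse_log(text):
    """Return a list of (date, morning_weight, factors), skipping '#' comments."""
    lines = [ln for ln in text.splitlines()
             if ln.strip() and not ln.startswith("#")]
    rows = []
    for record in csv.DictReader(lines):
        rows.append((
            record["Date"],
            float(record["MorningWeight"]),
            parse_factors(record["YesterdayFactors"] or ""),
        ))
    return rows


def weight_deltas(rows):
    """Pair each day's 'yesterday factors' with the change since the previous morning."""
    return [
        (date, round(weight - prev_weight, 1), factors)
        for (_, prev_weight, _), (date, weight, factors) in zip(rows, rows[1:])
    ]


if __name__ == "__main__":
    for date, delta, factors in weight_deltas(parse_log(SAMPLE_LOG)):
        print(date, delta, sorted(factors))
```

Each resulting (weight change, factors) pair is the kind of example a learner could be trained on to rank factors; which learner and which settings the author used are described in the linked article, not here.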

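The rest was a matter of running the analysis to pull out which behaviors helped the most, which produced the chart shown in the original article.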

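How Humble Bundle fights credit card fraud

Humble Bundle explains how they fight credit card fraud. The main idea is to keep lowering risk while reducing how often a person has to step in (because staff costs are high): "How Humble Bundle stops online fraud".

The first point is the one I particularly want to mention: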

Our first line of defense is a machine-learning-based anti-abuse startup called Sift Science, which we’ve been training for years across 55,000,000 transactions. Given how many orders we process, Sift Science has a really good idea when someone is up to no good. The model adapts daily as we get more data.

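I wrote about Sift Science back in 2014, in "A service for detecting whether a credit card transaction is fraudulent". What it does is simple: you send Sift Science a large amount of data, including various user identity details and credit card information, and Sift Science uses Machine Learning to tell you how risky a given transaction is, so that you can decide what to do next.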
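As a rough illustration of "get a risk score, then decide what to do next", here is a small Python sketch. Everything in it is hypothetical: the 0.0 to 1.0 score range, the two thresholds, and the function name are placeholders of mine, not Sift Science's API or Humble Bundle's actual rules.

```python
def triage(risk_score, accept_below=0.3, reject_above=0.9):
    """Route an order based on a fraud risk score.

    Assumptions (hypothetical, for illustration only):
    - risk_score is a float in [0.0, 1.0], higher meaning riskier
    - accept_below / reject_above are made-up thresholds
    """
    if not 0.0 <= risk_score <= 1.0:
        raise ValueError("risk_score must be between 0.0 and 1.0")
    if risk_score < accept_below:
        return "accept"          # low risk: no human needed
    if risk_score > reject_above:
        return "reject"          # very high risk: block automatically
    return "manual_review"       # uncertain band: the costly human step


if __name__ == "__main__":
    for score in (0.05, 0.5, 0.97):
        print(score, triage(score))
```

The point of the two thresholds is that only the middle band reaches a person, which is where the staffing cost mentioned above comes from; narrowing that band is what reduces manual work.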

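Quite a few companies offer similar services, such as MaxMind's minFraud (another product from the company well known for its GeoIP database). It is an interesting application when transaction volume is high, since it lowers the cost of dealing with fraud after the fact.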

Facebook collects GPS location data at scale and uses machine learning to "guess" who you might know

On Bruce Schneier's blog I saw the post "Facebook Using Physical Location to Suggest Friends", which cites the report "Facebook is using your phone’s location to suggest new friends—which could be a privacy disaster". The report opens with an update:

Update (June 28): After twice confirming it used location to suggest new friends, Facebook now says it doesn’t currently use “location data, such as device location and location information you add to your profile, to suggest people you may know.” The company says it ran a brief test using location last year. New story here.

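What happened after the second confirmation looks like the standard "Oh no! Damn! This is a PR disaster" way of handling things. When the reporter first checked with Facebook, the spokesperson's official reply stated that the phone's location is one of the factors in the calculation: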

“People You May Know are people on Facebook that you might know,” a Facebook spokesperson said. “We show you people based on mutual friends, work and education information, networks you’re part of, contacts you’ve imported and many other factors.”

One of those factors is smartphone location. A Facebook spokesperson said though that shared location alone would not result in a friend suggestion, saying that the two parents must have had something else in common, such as overlapping networks.

“Location information by itself doesn’t indicate that two people might be friends,” said the Facebook spokesperson. “That’s why location is only one of the factors we use to suggest people you may know.”

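Damn...

Google Compute Engine launches Custom Machine Types

Google Compute Engine has launched machine types that let you set the CPU and RAM yourself: "Custom Machine Types - Compute Engine — Google Cloud Platform".

You can choose from 1 vCPU up to 32 vCPUs, and memory goes up to 6.5GB * the number of vCPUs, so in theory the maximum is 208GB?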

Create a machine type with as little as 1 vCPU and up to 32 vCPUs, or any even number of vCPUs in between. Memory can be configured up to 6.5 GB of RAM per vCPU.
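The 208GB figure follows directly from the quoted limits, and a short Python sketch makes the arithmetic explicit. It models only the two rules in the quote above (1 vCPU or an even count up to 32, and at most 6.5 GB per vCPU); any other constraints the product may have are not covered, and the function names are my own.

```python
MAX_VCPUS = 32
MAX_GB_PER_VCPU = 6.5


def is_valid_vcpu_count(vcpus):
    """1 vCPU, or any even number of vCPUs up to 32 (per the quoted rule)."""
    return vcpus == 1 or (2 <= vcpus <= MAX_VCPUS and vcpus % 2 == 0)


def max_memory_gb(vcpus):
    """Upper bound on RAM for a given vCPU count, at 6.5 GB per vCPU."""
    if not is_valid_vcpu_count(vcpus):
        raise ValueError(f"unsupported vCPU count: {vcpus}")
    return vcpus * MAX_GB_PER_VCPU


if __name__ == "__main__":
    for n in (1, 2, 16, 32):
        print(n, "vCPU ->", max_memory_gb(n), "GB max")
```

With 32 vCPUs this gives 32 * 6.5 = 208 GB, matching the estimate above.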

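Pricing is simply one charge for vCPUs and one for memory. I recall that some smaller Cloud Service providers used to offer a similar pricing model, but they all shut down later...

The International Space Station is getting a coffee machine...

I saw this in "The International Space Station (Finally) Gets an Espresso Machine"; the original report is "The International Space Station (finally!) gets an espresso machine".

A few highlights XDDD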

The ISSpresso requires 120V DC power which is obtained at the Utility Outlet Panel (UOP) on the ISS.

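An unusual power supply: 120V DC rather than 120V AC...

HHVM 2.3.0 supports FastCGI...

I saw the 2.3.0 news on the official HHVM blog: "HHVM 2.3.0 and Travis CI".

The "FastCGI" page on GitHub explains how to work with Apache through the FastCGI interface, and anyone familiar with nginx should be able to map it over easily.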

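Another important item is that Travis CI now supports HHVM, and a large number of projects have added HHVM tests: Yii, Slim, phpBB, Joomla, Doctrine, CodeIgniter, Idiorm, PHPUnit, Paris.

Now that FastCGI is supported, I'll look for a chance to test it...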