
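An Open Source Implementation of the AlphaGo Zero Algorithm

Gian-Carlo Pascutto, the author of Leela, has finished an implementation that follows the paper's description and published it on GitHub as gcp/leela-zero.

He also pointed out on Twitter, though, that the open source implementation is not the hard part. The real difficulty is the trained network data, which takes an enormous amount of computation to produce.

He also estimates the computation behind AlphaGo Zero at about 1,700 years (measured on a 1080 Ti): "[Computer-go] Zero performance".

Separately, Leela 0.11.0 has been released. It still keeps the policy + value approach for now, but brings in a number of new algorithms to strengthen it. Another notable change is that the Windows build switched to clang and became considerably faster: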

Windows version is now compiled with Clang/LLVM 5.0 instead of MSVC2017. This makes the Monte Carlo evaluations about 15% faster.

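DeepMind says it is stepping away, but it has still left plenty of directions for everyone else to pursue...

The DLP API on Google Cloud Platform

"New ways to manage sensitive data with the Data Loss Prevention API" mentions that the DLP API launched back in March (announced in "Discover and redact sensitive data with the Data Loss Prevention API"), though I do not really remember it: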

The Data Loss Prevention (DLP) API, which went beta in March, can help you quickly find and protect over 50 types of sensitive data such as credit card numbers, names and national ID numbers.

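This time I looked at the examples, and it can analyze images directly.

Noting this down for now; it looks like I may have a use for it later (for example, analyzing images uploaded by users).

Gmail Now Supports Third-Party Add-ons...

Until now, third-party vendors had to hook into Gmail through browser extensions (and fix them again whenever Gmail changed its interface). Gmail now provides an official interface for them to plug into: "Do more from your inbox with Gmail Add-ons".

Both Gmail and G Suite users can use it (G Suite probably needs an administrator to install the add-ons?):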

Knock out action items the minute they hit your inbox. G Suite and Gmail users can check out the G Suite Marketplace to find and install Gmail Add-ons.

You can also develop your own add-ons:

If you're a developer, you can also easily create add-ons for your app or your organization—write your add-on code once and it will run natively in Gmail on web and Android right away. Learn more.

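Amazon Aurora Now Supports PostgreSQL Too

AWS announced that Amazon Aurora now supports PostgreSQL as well, compatible with version 9.6.3 (presumably it is derived from that version): "Now Available – Amazon Aurora with PostgreSQL Compatibility".

Performance is improved here as well, though the numbers should be treated as a rough reference only: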

On the performance side, you can expect up to 3x the throughput that you’d get if you ran PostgreSQL on your own (you can read Amazon Aurora: Design Considerations for High Throughput Cloud-Native Relational Databases to learn more about how we did this).

The architecture likewise uses 6-way replication:

It is compatible with PostgreSQL 9.6.3 and scales automatically to support up to 64 TB of storage, with 6-way replication behind the scenes to improve performance and availability.

Region availability is more limited, though; no Asian region is open yet:

You can use Amazon Aurora with PostgreSQL Compatibility today in the US East (Northern Virginia), EU (Ireland), US West (Oregon), and US East (Ohio) Regions, with others to follow as soon as possible.

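Node.js Security Update: zlib...

A few days ago, "Node.js plans to release security updates on October 24" mentioned that Node.js would publish a security update, and new versions are now out for the 4/6/8 lines: "Node v4.8.5 (Maintenance)", "Node v6.11.5 (LTS)", and "Node v8.8.0 (Current)".

The CVE for this security update is CVE-2017-14919: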

CVE-2017-14919 - In zlib v1.2.9, a change was made that causes an error to be raised when a raw deflate stream is initialized with windowBits set to 8. On some versions this crashes Node and you cannot recover from it, while on some versions it throws an exception. Node.js will now gracefully set windowBits to 9 replicating the legacy behavior to avoid a DOS vector. nodejs-private/node-private#95

The zlib manual explains windowBits as follows:

For the current implementation of deflate(), a windowBits value of 8 (a window size of 256 bytes) is not supported. As a result, a request for 8 will result in 9 (a 512-byte window). In that case, providing 8 to inflateInit2() will result in an error when the zlib header with 9 is checked against the initialization of inflate(). The remedy is to not use 8 with deflateInit2() with this initialization, or at least in that case use 9 with inflateInit2().

Hmm? XD
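To make the workaround concrete, here is a minimal sketch in Python of the idea described in the CVE note: treat a request for windowBits 8 as 9 before creating a raw deflate stream. It is not Node.js code. The helper names effective_window_bits, raw_deflate, and raw_inflate are hypothetical and exist only for this illustration; Python's standard zlib module is used because it exposes the same windowBits parameter (as wbits, negative for raw streams).

```python
import zlib


def effective_window_bits(requested: int) -> int:
    """Hypothetical helper: map the unsupported value 8 to 9, keep 9..15 as-is."""
    if not 8 <= requested <= 15:
        raise ValueError("windowBits must be between 8 and 15")
    return 9 if requested == 8 else requested


def raw_deflate(data: bytes, window_bits: int) -> bytes:
    """Compress data as a raw deflate stream (no zlib header or checksum)."""
    bits = effective_window_bits(window_bits)
    # A negative wbits value asks the zlib module for a raw deflate stream.
    compressor = zlib.compressobj(zlib.Z_DEFAULT_COMPRESSION, zlib.DEFLATED, -bits)
    return compressor.compress(data) + compressor.flush()


def raw_inflate(data: bytes, window_bits: int) -> bytes:
    """Decompress a raw deflate stream using the same adjusted window size."""
    bits = effective_window_bits(window_bits)
    decompressor = zlib.decompressobj(-bits)
    return decompressor.decompress(data) + decompressor.flush()


if __name__ == "__main__":
    payload = b"hello zlib " * 100
    packed = raw_deflate(payload, 8)  # a request for 8 is silently treated as 9
    print(raw_inflate(packed, 8) == payload)
```

Applying the same adjustment on both sides keeps the compressor and decompressor in agreement, which is the mismatch the zlib manual warns about.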

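The post also shows that the Node.js team uses a separate organization on GitHub to manage this kind of issue that is not disclosed in advance...

The Computation Behind AlphaGo Zero

The AlphaGo Zero paper mentions that, on the same hardware (4 TPUs), it crushes AlphaGo Master (the version that played Ke Jie earlier this year) 89:11. This is mainly thanks to a higher-quality neural network and stronger move selection (the latter is probably a benefit of merging the two neural networks into one):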

This neural network improves the strength of the tree search, resulting in higher quality move selection and stronger self-play in the next iteration.

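That raises the obvious question: how much time did DeepMind actually spend training this new neural network? Professor 吳毅成 (I-Chen Wu) has already worked out an estimate on Facebook.

The TPU-to-GPU conversion here is probably based on Google's explanation of the TPU at the time, "An in-depth look at Google's first Tensor Processing Unit (TPU)":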

In short, we found that the TPU delivered 15–30X higher performance and 30–80X higher performance-per-watt than contemporary CPUs and GPUs.

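With GPUs the count is about 12K; converting back, the TPU count is probably also on the order of a thousand. For Google, which has already commercialized the TPU, that number should be easy to come by. All I can say is that it is nice to have money XD:

1. Looked at from another angle, DeepMind trained the 40-block version in only 40 days. Converted, that means DeepMind effectively used about 12,000 1080 Ti cards.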
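The arithmetic behind these figures can be checked with a short sketch. It only uses the numbers quoted above (about 12,000 1080 Ti cards, 40 days, and Google's "15–30X" claim); the function names are hypothetical, and the 15–30X figure is a broad comparison against contemporary CPUs and GPUs rather than a 1080 Ti benchmark, so the TPU range is a rough guess at best.

```python
GPU_COUNT = 12_000             # estimated number of 1080 Ti cards (from the quote)
TRAINING_DAYS = 40             # training time of the 40-block version (from the quote)
TPU_SPEEDUP_RANGE = (15, 30)   # Google's "15-30X" performance claim


def gpu_years(gpu_count, days):
    """Total single-GPU time, expressed in 365-day years."""
    return gpu_count * days / 365


def tpu_equivalents(gpu_count, speedup_range):
    """Return (fewest, most) TPUs that would match gpu_count GPUs."""
    slowest, fastest = speedup_range
    return gpu_count / fastest, gpu_count / slowest


if __name__ == "__main__":
    print(f"single-GPU years: {gpu_years(GPU_COUNT, TRAINING_DAYS):.0f}")
    fewest, most = tpu_equivalents(GPU_COUNT, TPU_SPEEDUP_RANGE)
    print(f"TPU estimate: {fewest:.0f} to {most:.0f}")
```

Under these assumptions the total comes out at a bit over 1,300 single-GPU years, the same order of magnitude as the roughly 1,700 years estimated for a 1080 Ti, and the TPU count lands somewhere between a few hundred and about a thousand.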

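Limiting I/O Speed with Linux Kernel 4.5+

In "Using cgroups to limit I/O", the author tries to limit I/O speed with cgroups.

The author spends a good part of the article explaining why cgroups v1 cannot limit I/O speed correctly, and then covers how to do it with cgroups v2: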

So, in order to limit I/O when this I/O may hit the writeback kernel cache, we need to use both memory and io controllers in the cgroups v2!

This requires a 4.5+ kernel, so you may need to upgrade manually or simply use a newer distribution:

Since kernel 4.5, the cgroups v2 implementation was marked non-experimental.

After that you can simply copy the commands (they all require root, though, so it is a bit odd that the author uses $ as the shell prompt):

# mount -t cgroup2 nodev /cgroup2
# mkdir /cgroup2/cg2
# echo "+io +memory" > /cgroup2/cgroup.subtree_control
# echo "8:0 wbps=1048576" > /cgroup2/cg2/io.max
# echo $$ > /cgroup2/cg2/cgroup.procs

Then you can run dd to test the speed, and watch it with iostat at the same time.
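The value written to io.max above packs a device number and a limit into one line: "8:0" is the major:minor number of the block device and wbps is the write limit in bytes per second, so 1048576 means 1 MiB/s. Below is a small sketch of building such a line; the helper names io_max_line and mib_per_second are hypothetical, and the code only formats the string rather than touching any cgroup file.

```python
# Limit keys accepted in a cgroup v2 io.max line.
IO_MAX_KEYS = ("rbps", "wbps", "riops", "wiops")


def mib_per_second(mib: int) -> int:
    """Convert MiB/s to the bytes-per-second value io.max expects."""
    return mib * 1024 * 1024


def io_max_line(major: int, minor: int, **limits: int) -> str:
    """Build one io.max line such as '8:0 wbps=1048576'."""
    unknown = set(limits) - set(IO_MAX_KEYS)
    if unknown:
        raise ValueError(f"unknown io.max keys: {sorted(unknown)}")
    if not limits:
        raise ValueError("at least one limit is required")
    # Emit the keys in a fixed order so the output is predictable.
    parts = [f"{key}={limits[key]}" for key in IO_MAX_KEYS if key in limits]
    return f"{major}:{minor} " + " ".join(parts)


if __name__ == "__main__":
    # Same limit as in the commands above: 1 MiB/s of writes on device 8:0.
    print(io_max_line(8, 0, wbps=mib_per_second(1)))
```

The resulting string is what would be written, as root, to the io.max file of the cgroup being limited.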

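US Carriers Offer an API That Lets Third Parties Learn Your Real Identity from Your IP

A recent exposé: US carriers provide an API to third parties that lets them look up your real identity from your IP address: "Want to see something crazy? Open this link on your phone with WiFi turned off.", like this: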

These services are using your mobile phone’s IP address to look up your phone number, your billing information and possibly your phone’s current location as provided by cell phone towers (no GPS or phone location services required).

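All of the sites have since been taken down, but screenshots from the time show how much information was exposed. AT&T's press release is at "AT&T Helps Businesses Improve Mobile Transaction Security with New Mobile Identity API Toolkit". My guess is that the press release has not been pulled because AT&T, as a listed company, is bound by legal requirements?

This is really a warning sign. It shows that US carriers have started selling data that everyone has always considered extremely private to third parties: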

But what these services show us is even more alarming: US telcos appear to be selling direct, non-anonymized, real-time access to consumer telephone data to third party services — not just federal law enforcement officials — who are then selling access to that data.

The author also found code on GitHub that calls an API provided by Korean carriers, so Korea has a similar service:

I found what looks like a third-party API implementation for a Korean Danal API on GitHub. The author wrote the code for South Korean telcos, so there may be differences with US carriers. The query parameters in the HTTP requests are similar to what I remember seeing in the Danal demo. It’s unclear from my reading of the code whether or not this API requires operation inside of e.g. a Danal Inc. hosted-iframe for identity confirmation. The diagram on page 4 of this documentation describing the Korean “Danal Pay” service appears to show the client interacting with the customer’s servers only.

And what about Taiwan? Heh heh...
