
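AlphaGo's Opening Book Analysis

I saw a post from Aja Huang on Facebook introducing new material released by DeepMind: AlphaGo's win-rate analysis of human openings (this is not the even stronger AlphaZero published earlier, though I'm not sure whether it is AlphaGo Zero or AlphaGo Master...).

The site is AlphaGo Teach: Discover new and creative ways of playing Go. Every number shown on the board is Black's win rate.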

This tool provides analysis of 6,000 of the most popular opening sequences from the recent history of Go, using data from 231,000 human games and 75 games AlphaGo played against human players.

Explore the board and learn how AlphaGo's moves compare to those of professional and amateur players.

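Since this is a huge amount of opening data computed by brute force and published in full online, I expect openings in upcoming professional games to be revised with reference to it...

An Open Source Implementation of the AlphaGo Zero Algorithm

Gian-Carlo Pascutto, the author of Leela, has finished an implementation based on the paper's description and put it on GitHub as gcp/leela-zero.

He also pointed out on Twitter, though, that the open source implementation is not the truly hard part. The hard part is the trained data, which takes an enormous amount of compute to produce:

He also estimates that AlphaGo Zero's computation amounts to 1,700 years (measured on a 1080 Ti): "[Computer-go] Zero performance".

Leela 0.11.0 has also been released. It still uses the policy + value approach but brings in quite a few new algorithms to strengthen it. Another notable point is that the Windows version switched to clang and got considerably faster: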

Windows version is now compiled with Clang/LLVM 5.0 instead of MSVC2017. This makes the Monte Carlo evaluations about 15% faster.

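Although DeepMind says it is stepping back, it has still left plenty of directions for everyone to pursue...

The Amount of Computation Behind AlphaGo Zero

The AlphaGo Zero paper mentions that on the same hardware (4 TPUs) it crushes AlphaGo Master (the version that played Ke Jie at the beginning of this year) 89:11, mainly thanks to a higher-quality neural network and stronger move selection (the latter is probably a benefit of merging the two neural networks into one):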

This neural network improves the strength of the tree search, resulting in higher quality move selection and stronger self-play in the next iteration.

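That raises the obvious question: how long did DeepMind actually take to train this new neural network? Professor I-Chen Wu (吳毅成) worked out an estimate on Facebook first:

The TPU-to-GPU conversion here is presumably based on what Google said when it described the TPU in "An in-depth look at Google’s first Tensor Processing Unit (TPU)":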

In short, we found that the TPU delivered 15–30X higher performance and 30–80X higher performance-per-watt than contemporary CPUs and GPUs.

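That is roughly 12K GPUs, which works back to something on the order of a thousand TPUs. For Google, which has already commercialized the TPU, that quantity should be easy to come by. All I can say is that it's nice to be rich XD:

1. Looking at it from another angle, DeepMind trained the 40-block version in just 40 days. Converting that, DeepMind effectively used about 12,000 1080 Tis.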
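
The conversion between GPUs and TPUs can be checked with a few lines of arithmetic. This is only a rough sketch: the 15x to 30x range is taken from Google's TPU post, which compares against the CPUs and GPUs of its time rather than the 1080 Ti specifically, so applying it here is an assumption. The function names are made up for illustration.

```python
def gpu_years(gpu_count: int, days: float) -> float:
    """Total GPU time, in years, for gpu_count GPUs running for `days` days."""
    return gpu_count * days / 365.0


def tpu_range(gpu_count: int, low_speedup: float = 15.0, high_speedup: float = 30.0):
    """(min, max) TPU count matching gpu_count GPUs.

    Assumes one TPU is worth between low_speedup and high_speedup GPUs,
    which is a loose reading of the 15-30X figure, not a measured value.
    """
    return gpu_count / high_speedup, gpu_count / low_speedup


if __name__ == "__main__":
    gpus, days = 12_000, 40  # figures from the estimate quoted in the post
    print(f"{gpu_years(gpus, days):.0f} GPU-years")
    fewest, most = tpu_range(gpus)
    print(f"{fewest:.0f} to {most:.0f} TPUs")
```

With these inputs the range is 400 to 800 TPUs, which is consistent with "on the order of a thousand", and the 40-day run corresponds to about 1,315 GPU-years.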

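AlphaGo's Three Games Against Ke Jie, With Commentary From Both Sides

On Aja Huang's (黃士傑) Facebook I saw commentary on the three games AlphaGo and Ke Jie played in Wuzhen. Besides bringing in Ke Jie and Fan Hui, the more important part is that an AlphaGo with the same configuration as at the time was used directly (whenever Ke Jie wanted to see how AlphaGo would respond to a variation, Fan Hui entered it and ran the simulation):

To be clear, the AlphaGo taking part in the review in the videos is exactly the same as the one that played at the Wuzhen summit, in both version and hardware configuration (the single-machine version with 4 TPUs). I hope everyone enjoys these three wonderful games and also gains something from the review in these three videos. Enjoy watching.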

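On the Question of Komi in Go...

After its sweeping victory a while back, AlphaGo released fifty self-play game records (two AlphaGo instances playing each other). One thing about them caught everyone by surprise and has been discussed at length in the Go community: across these fifty games, Black won 12 and White won 38 (Chinese rules, with Black giving 7.5 points of komi), a clear and large advantage for White.

The 7.5 points work like this: because Black moves first (the first-move advantage), Black tends to surround more territory. To compensate White for the disadvantage of moving second, the rules generally include "komi", points that Black gives to White.

The CGI team at NCTU's Department of Computer Science published a paper at the end of last month (see the records at "CGOS Whole Period Ratings for 19x19 Board"; among the teams participating in CGOS they trail only the new version of Zen) discussing a new idea for value networks: "Multi-Labelled Value Networks for Computer Go".

They analyzed the amount of komi: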

For the training data, we label on output 𝑣𝑘 as follows. For each self-play game, first calculate territory difference 𝑛 at the end of the game. Then, based on the Chinese rule, label 1 (win) on 𝑣𝑘 for all 𝑘 < 𝑛, and -1 (lose) for all 𝑘 > 𝑛. (Note that the draw case 𝑘 = 𝑛 is ignored in this paper since the komi is not an integer normally.) For example, if black occupies 7 more points of territory than white, the 𝑘-komi game is considered a win for all 𝑘 < 7, and a loss for all 𝑘 > 7. Thus, in this case, a 7.5-komi game is a loss, and a 6.5-komi or 0.5-komi game is a win.
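
The labelling rule in the quoted passage can be sketched in a few lines. This is an illustrative reading of the quote only, not the CGI team's code; the function name `komi_labels` and the dictionary output are assumptions made for this example.

```python
from typing import Dict, Iterable


def komi_labels(territory_diff: int, komis: Iterable[float]) -> Dict[float, int]:
    """Label each komi k for one finished game.

    territory_diff is n, Black's territory minus White's at the end of the game.
    A k-komi game is labelled +1 (win) when k < n and -1 (loss) when k > n.
    The draw case k == n is skipped, as in the quoted passage.
    """
    labels = {}
    for k in komis:
        if k < territory_diff:
            labels[k] = 1
        elif k > territory_diff:
            labels[k] = -1
    return labels


if __name__ == "__main__":
    # The example from the quote: Black finishes 7 points ahead.
    print(komi_labels(7, [0.5, 6.5, 7.5]))
```

For the quoted example this gives a win at 0.5 and 6.5 komi and a loss at 7.5 komi, so one self-play game yields a label for every komi value at once.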

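This research completely overturns the common understanding among professional players today, which is that a komi of 5.5 favors Black and a komi of 7.5 favors White (the so-called big-komi era).

More research should follow. Will the Go world rethink the komi rule...

AlphaGo Is Making a Comeback

DeepMind's AlphaGo is making a comeback: "Exploring the mysteries of Go with AlphaGo and China's top players". This time it will play against China's top professionals, and the post mentions three formats: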

The summit will feature a variety of game formats involving AlphaGo and top Chinese players, specifically designed to explore the mysteries of the game together. The games will include:

  • “Pair Go” — A game where one Chinese pro will play against another...except they will both have their own AlphaGo teammate, alternating moves, to take the concept of ‘learning together’ quite literally.
  • “Team Go” — A game between AlphaGo and a five-player team consisting of China’s top pro players, working together to test AlphaGo’s creativity and adaptability to their combined style.
  • “Ke Jie vs AlphaGo” — Of course, the centerpiece of the event will be a classic 1:1 match of three games between AlphaGo and the world’s number one player, Ke Jie, to push AlphaGo to (...perhaps beyond!) its limits.

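Rumor has it the schedule is as follows (everyone is just copying from everyone else at this point...), but I have not seen an official announcement yet:

(1) May 23, 25, 27: Ke Jie vs AlphaGo, three-game match: 3 hours per side, 5 one-minute byoyomi periods;
(2) May 26: team match with the five-player team of Shi Yue, Mi Yuting, Tang Weixing, Chen Yaoye and Zhou Ruiyang: 2 hours 30 minutes per side, with 3 one-minute byoyomi periods;
(3) May 26: Gu Li and Lian Xiao, human-machine Pair Go: 1 hour per side, 1 one-minute byoyomi period.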

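There is also an article, "Innovations of AlphaGo", that covers much of what human players have learned from AlphaGo's sixty-game winning streak at the start of the year.

Also, DeepMind's CEO, Demis Hassabis, happened to give a talk a few days ago that covered quite a lot (though all of it had been mentioned before):

So I'll arrange to free up time on those days...

AlphaGo Does Not Use GPU Acceleration...

Material Google released today explains that AlphaGo is not accelerated with the usual GPUs: "Google supercharges machine learning tasks with TPU custom chip".

It is an ASIC built specifically for TensorFlow: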

The result is called a Tensor Processing Unit (TPU), a custom ASIC we built specifically for machine learning — and tailored for TensorFlow.

And the version of AlphaGo that played was the TPU version:

AlphaGo was powered by TPUs in the matches against Go world champion, Lee Sedol, enabling it to "think" much faster and look farther ahead between moves.

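The rack housing AlphaGo looks like this:

A purpose-built ASIC is usually much faster than an FPGA or a GPU. This means that Go programs without a big company behind them, if they want to compete with AlphaGo, will need either a major algorithmic breakthrough or far more hardware to make up the difference...

The Google vs. Lee Five-Game Match Starts Today

The "Google vs. Lee five-game match" between AlphaGo, developed by Google DeepMind, and Lee Sedol starts today.

Chinese rules, two hours each, one-minute byoyomi: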

The matches will be played under Chinese rules with a komi of 7.5 (the compensation points the player who goes second receives at the end of the match). Each player will receive two hours per match with three lots of 60-second byoyomi (countdown periods after they have finished their allotted time).

Play starts at 1 p.m. Korea time, which is noon for us:

The matches will be held at the Four Seasons Hotel, Seoul, South Korea, starting at 1pm local time (4am GMT; day before 11pm ET, 8pm PT) on March 9th, 10th, 12th, 13th and 15th.
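
The start times can be double-checked with the standard library. This is a small sketch with stated assumptions: Korea is taken as a fixed UTC+9 offset, "our time" is assumed to be UTC+8, and the year 2016 is filled in because the announcement gives only the month and day.

```python
from datetime import datetime, timedelta, timezone

KST = timezone(timedelta(hours=9), "KST")     # Korea Standard Time, UTC+9
UTC8 = timezone(timedelta(hours=8), "UTC+8")  # assumed local zone of this post


def convert(start: datetime, target: timezone) -> datetime:
    """Return the same instant expressed in the target time zone."""
    return start.astimezone(target)


# First game: March 9th, 1pm local time in Seoul (year assumed to be 2016).
game1 = datetime(2016, 3, 9, 13, 0, tzinfo=KST)

if __name__ == "__main__":
    print(convert(game1, timezone.utc).strftime("%Y-%m-%d %H:%M %Z"))
    print(convert(game1, UTC8).strftime("%Y-%m-%d %H:%M %Z"))
```

The first line comes out at 04:00 UTC, matching the "4am GMT" in the announcement, and the second at 12:00 in UTC+8.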

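There will be plenty of live media commentary. DeepMind's YouTube channel will certainly stream it, and it currently looks like the stream will start at 11:30 a.m.

As for other channels, the ones I know of in Taiwan are:

I will probably hop between streams, though. The Chinese Go site 围棋TV will also start streaming at 11:30.

Many people have guessed at the outcome, but nobody knows how much AlphaGo has grown since it beat Fan Hui last October. Everyone is really guessing with insufficient information, and in that situation neither a sweep by the human nor a sweep by the computer would be surprising...

For that reason the first game of the match should cause the biggest stir, since it will show how far AlphaGo has come... I have heard, though, that Google will only field the single-machine version for the first game? So if it loses badly, they will bring out the cloud version?

Argh, I really want to watch...

Two Recent Breakthroughs in Computer Go...

Yesterday I first saw the news Mark Zuckerberg posted. More complete information is available in "Better Computer Go Player with Neural Network and Long-term Prediction".

Facebook's achievement is using a DCNN-based model to improve efficiency: a single computer (with GPUs) can hold KGS 5d (amateur 5 dan), about the strength of Zen19 and on par with the best computer Go programs today: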

darkfmcts3 now holds a stable KGS 5d level, on par with the top Go AIs, has beaten Zen19 once and hold 1win/1lose against a Korean 6p professional player with 4 handicaps.

The distributed version, named darkfmcts3 in KGS Go Server, use darkfores2 as the underlying DCNN model, runs 75,000 rollouts on 2048 threads and produces a move every 13 seconds with one Intel Xeon E5-2680 v2 at 2.80GHz and 44 NVidia K40m GPUs.

In this paper, we have substantially improved the performance of DCNN-based Go AI, extensively evaluated it against both open source engines and strong amateur human players, and shown its potentials if combined with Monte-Carlo Tree Search (MCTS).

Next door, Google has put out its own research, "AlphaGo: Mastering the ancient game of Go with Machine Learning", and sent it straight to Nature: "Mastering the game of Go with deep neural networks and tree search".

Google's version does not run on a single machine but on Google Cloud Platform, and the distributed version is at roughly professional 5 dan level:

Its actual result against Fan Hui (professional 2 dan) was also a 5-0 sweep:

So we invited the reigning 3-time European Go champion Fan Hui — an elite professional player who has devoted his life to Go since the age of 12 — to our London office for a challenge match. The match was played behind closed doors between October 5-9 last year. AlphaGo won by 5 games to 0 -- the first time a computer program has ever beaten a professional Go player.

And this year it is scheduled to take on the final boss, Lee Sedol, currently Korea's top Go player:

AlphaGo’s next challenge will be to play the top Go player in the world over the last decade, Lee Sedol. The match will take place this March in Seoul, South Korea. Lee Sedol is excited to take on the challenge saying, "I am privileged to be the one to play, but I am confident that I can win." It should prove to be a fascinating contest!

Go programs in China will probably stream it online when the time comes. I'll keep an eye on it then...
