## An Open Source Implementation of the AlphaGo Zero Algorithm

Gian-Carlo Pascutto, the author of Leela, has finished an implementation that follows the paper's description. It is on GitHub at gcp/leela-zero.

Windows version is now compiled with Clang/LLVM 5.0 instead of MSVC2017. This makes the Monte Carlo evaluations about 15% faster.

## AlphaGo Zero's Computational Cost

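The AlphaGo Zero paper mentions that, on the same hardware (4 TPUs), it crushes AlphaGo Master (the version that played Ke Jie earlier this year) 89:11. This is mainly thanks to a higher-quality neural network and stronger move selection (the latter is probably a benefit of merging the two neural networks into one):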

This neural network improves the strength of the tree search, resulting in higher quality move selection and stronger self-play in the next iteration.

In short, we found that the TPU delivered 15–30X higher performance and 30–80X higher performance-per-watt than contemporary CPUs and GPUs.

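1. Looked at from another angle, DeepMind trained the 40-block version in only 40 days. Converted to consumer hardware, that is equivalent to DeepMind using roughly 12,000 1080 Ti cards.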

## AlphaGo's Three Games Against Ke Jie, With Commentary From Both Sides

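On Aja Huang's (黃士傑) Facebook I saw commentary on the three games AlphaGo played against Ke Jie in Wuzhen. Besides bringing in Ke Jie and Fan Hui, the more important point is that the commentary used an AlphaGo with the same configuration as at the time of the match (whenever Ke Jie wanted to see how AlphaGo would respond to a variation, Fan Hui entered it and ran the simulation).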

## On the Komi Question in Go...

For the training data, we label on output $v_k$ as follows. For each self-play game, first calculate territory difference $n$ at the end of the game. Then, based on the Chinese rule, label 1 (win) on $v_k$ for all $k < n$, and -1 (lose) for all $k > n$. (Note that the draw case $k = n$ is ignored in this paper since the komi is not an integer normally.) For example, if black occupies 7 more points of territory than white, the $k$-komi game is considered a win for all $k < 7$, and a loss for all $k > 7$. Thus, in this case, a 7.5-komi game is a loss, and a 6.5-komi or 0.5-komi game is a win.
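The labeling rule quoted above can be sketched in a few lines of Python. This is only an illustration of the rule as described, not the paper's actual code: the function name `label_komi_outcomes`, its parameters, and the dictionary output are all hypothetical, and the labels are assumed to be from Black's point of view.

```python
from typing import Dict, Iterable


def label_komi_outcomes(territory_diff: int, komis: Iterable[float]) -> Dict[float, int]:
    """Label each komi value as a win (1) or a loss (-1) for Black.

    territory_diff: Black's territory minus White's at the end of the game.
    komis: the komi values to label (hypothetical input, e.g. 0.5, 6.5, 7.5).
    """
    labels: Dict[float, int] = {}
    for komi in komis:
        if komi < territory_diff:
            labels[komi] = 1    # Black still wins after giving this komi
        elif komi > territory_diff:
            labels[komi] = -1   # the komi exceeds Black's margin, so Black loses
        # komi == territory_diff is a draw and is skipped, as in the quoted rule
    return labels


# The example from the quoted text: Black is ahead by 7 points of territory.
labels = label_komi_outcomes(7, [0.5, 6.5, 7.5])
print(labels)  # {0.5: 1, 6.5: 1, 7.5: -1}
```

A single finished game therefore yields a label for every komi value at once, which is what lets one self-play game train all of the komi outputs together.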