DeepMind 的 Player of Games

前幾天在 Hacker News Daily 上看到的消息,DeepMind 發了一篇新的論文,講 Player of Games 這個新的演算法:「Player of Games」,Hacker News 上的討論在這:「Player of Games (arxiv.org)」。

照留言上的討論,Player of Games 的名字由來應該是取自科幻小說《The Player of Games》。

這是一個更一般性的演算法,可以同時駕馭 perfect information 與 imperfect information:

We introduce Player of Games, a general-purpose algorithm that unifies previous approaches, combining guided search, self-play learning, and game-theoretic reasoning. Player of Games is the first algorithm to achieve strong empirical performance in large perfect and imperfect information games -- an important step towards truly general algorithms for arbitrary environments.

論文裡面也提到以前的各種演算法 (包含 DeepMind 自家的一些演算法)。在 perfect information 的例子來說,可以看到沒有 AlphaZero 強 (西洋棋與圍棋),但也已經有一定水準了,算是個起頭的感覺:

主要的成就在於一般性,但論文後面也有提到,目前這個演算法需要的資源還是過大,還有改善的空間...

Leave a Reply

Your email address will not be published. Required fields are marked *