
GoFormer - Language Model That Plays Go

Before AlphaGo[1], Go was considered too complex a game for AI to master.
In 2017, AlphaGo[1] and AlphaZero[2] defeated Go champions using a policy network, a value network, and lookahead with Monte Carlo Tree Search (MCTS)[3][4].
MCTS was a decisive factor behind their world-champion-level performance.
Given the recent advances in transformer[5]-based decoder-only language models trained with a next-token-prediction objective[6], and their application to chess[7][8], how does a language model (GoFormer here) perform at Go?
[9] fine-tunes 124M, 355M, and 774M GPT-2[10] models on 56,638 Go games in SGF format. To the best of my knowledge, this is the first time a language model has been trained from scratch on 1.36M Go games, with a specially designed tokenizer.

Can GoFormer perform reasonably well through next-move (token) prediction alone, without MCTS[3][4]? Let's find out. My research goals are:

  • If a language model can reason and plan, it should be able to play Go well.
  • If GoFormer performs reasonably well, it can serve as a baseline for future research on Go, without the use of tree search.

P.S.: this is an initial release of the model, and it is not expected to perform very well yet. But as more data becomes available, we will see whether it can stand up to MCTS-based engines like Leela Zero.

How to Play against GoFormer?

I've written a UI; please visit https://github.com/kenhktsui/goformer.

Data Preprocessing

We take the leftmost variation of the game tree in SGF format and translate it into PGN.
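This extraction step can be sketched as follows. It is a minimal regex-based illustration, not the project's actual preprocessing pipeline: the record is truncated at the first branch point so only the main (leftmost) line survives, then standard SGF B[..]/W[..] move properties are pulled out in order.

```python
import re

def leftmost_moves(sgf: str) -> list:
    """Extract (colour, coordinate) pairs from the leftmost variation of an SGF game.

    A minimal sketch: drop the root '(', truncate at the first nested '('
    (where variations begin), then collect B[..]/W[..] move properties.
    """
    body = sgf[sgf.find("(") + 1:]   # drop the root '('
    branch = body.find("(")          # a nested '(' opens a side variation
    if branch != -1:
        body = body[:branch]         # keep only the main (leftmost) line
    return re.findall(r";\s*([BW])\[([a-z]{0,2})\]", body)

print(leftmost_moves("(;GM[1]SZ[19];B[dp];W[pd](;B[pp])(;B[dd]))"))
```

The `{0,2}` quantifier also matches the empty coordinate `B[]`, which SGF uses for a pass.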

Tokenizer Design

A tokenizer is designed specifically for Go. Since the board is 19 x 19, we use uppercase letters to encode the x position and lowercase letters to encode the y position. We use letters instead of numbers to make clear that one token, not two, represents one coordinate, avoiding the unnecessary learning of mapping two tokens onto one position. We also use a special token '>' to denote a move by the winner of the game.
While [7][8] do not indicate the winner until the result is appended at the end, we argue that without marking the winner, a language model cannot know which moves are the winner's during autoregressive decoding at inference time. '>' is also the symbol used to prompt GoFormer for a move during decoding. 'X' represents a pass.
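The coordinate scheme above can be illustrated with a small sketch. The exact alphabet mapping ('A'-'S' for x, 'a'-'s' for y, 0-indexed) is an assumption for illustration; it only shows how one letter encodes one axis coordinate.

```python
def encode_move(x: int, y: int) -> str:
    """Map a 0-indexed 19x19 board coordinate to a position string.

    Uppercase letter for the x position, lowercase letter for the y position.
    The 'A'-'S' / 'a'-'s' ranges are assumed for illustration.
    """
    assert 0 <= x < 19 and 0 <= y < 19
    return chr(ord("A") + x) + chr(ord("a") + y)

def decode_move(token: str) -> tuple:
    """Invert encode_move: 'Dp' -> (3, 15)."""
    return ord(token[0]) - ord("A"), ord(token[1]) - ord("a")

print(encode_move(3, 15))   # "Dp"
print(decode_move("Dp"))    # (3, 15)
```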

Model Input and Output

The Go game is framed as language, as in [7][8]: all previous moves (and consequently the game board) are represented as a string.

Input:

1. >Dp Ra 2. >

Output:

Pp
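The input framing above can be sketched as a prompt builder. The spacing and move numbering are inferred from the example; for simplicity, this sketch assumes Black is the game's winner and it is Black's turn, so '>' prefixes every Black move and the trailing 'N. >' asks GoFormer for the winner's next move.

```python
def build_prompt(moves: list) -> str:
    """Render past moves in the card's PGN-like format, '>' marking the
    winner's (here: Black's) moves, and append a '>' prompt for the next move.

    The exact format is inferred from the example '1. >Dp Ra 2. >'.
    """
    out = []
    for i, mv in enumerate(moves):
        if i % 2 == 0:                       # Black (winner) to move
            out.append(f"{i // 2 + 1}. >{mv}")
        else:                                # White's reply
            out.append(mv)
    out.append(f"{len(moves) // 2 + 1}. >")  # prompt for the winner's next move
    return " ".join(out)

print(build_prompt(["Dp", "Ra"]))  # "1. >Dp Ra 2. >"
```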

Output Postprocessing

To exclude illegal moves, we ask GoFormer to suggest K moves ranked by probability. After illegal moves are removed, the most probable remaining move is selected.
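This filtering step can be sketched as below. Both the `candidates` list (move, probability pairs from the model's top-K suggestions) and the `is_legal` rules predicate are hypothetical names introduced for illustration.

```python
def pick_move(candidates: list, is_legal):
    """Select the most probable legal move from GoFormer's top-K suggestions.

    `candidates` is a list of (move, probability) pairs; `is_legal` is a
    game-rules predicate. Both are hypothetical names for this sketch.
    """
    legal = [(m, p) for m, p in candidates if is_legal(m)]
    if not legal:
        return None                  # caller may fall back, e.g. to a pass ('X')
    return max(legal, key=lambda mp: mp[1])[0]

print(pick_move([("Pp", 0.4), ("Dp", 0.3)], lambda m: m != "Dp"))  # "Pp"
```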

Performance

This model achieves an eval_loss of 0.419 at step 7,600 (approximately 10.9 epochs).

Future Work

  • Collect more Go data, particularly self-play data. The existing dataset is clearly tiny compared to the corpora used to train modern language models.

Reference

[1] Silver, D., Huang, A., Maddison, C. et al. Mastering the game of Go with deep neural networks and tree search. Nature 529, 484–489 (2016).
[2] D. Silver, T. Hubert, J. Schrittwieser, I. Antonoglou, M. Lai, A. Guez, M. Lanctot, L. Sifre, D. Kumaran, T. Graepel, et al., “Mastering chess and shogi by self-play with a general reinforcement learning algorithm,” arXiv preprint arXiv:1712.01815, 2017.
[3] Coulom, R. Efficient selectivity and backup operators in Monte-Carlo tree search. In 5th International Conference on Computer and Games, 72–83 (2006).
[4] Kocsis, L. & Szepesvári, C. Bandit based Monte-Carlo planning. In 15th European Conference on Machine Learning, 282–293 (2006).
[5] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin. Attention is all you need. In Advances in Neural Information Processing Systems, pages 6000–6010, 2017.
[6] Radford, Alec and Karthik Narasimhan. “Improving Language Understanding by Generative Pre-Training.” (2018).
[7] D. Noever, M. Ciolino, and J. Kalin. The Chess Transformer: Mastering Play using Generative Language Models, Sept. 2020.
[8] Zhang, Edwin et al. “Transcendence: Generative Models Can Outperform The Experts That Train Them.” (2024).
[9] Ciolino, Matthew et al. “The Go Transformer: Natural Language Modeling for Game Play.” 2020 Third International Conference on Artificial Intelligence for Industries (AI4I) (2020): 23-26.
[10] Radford, Alec et al. “Language Models are Unsupervised Multitask Learners.” (2019).

Citation

If you find this work useful, please cite:

@misc{ktsui2024goformer,
      title={GoFormer - Language Model That Plays Go}, 
      author={Ken Tsui},
      year={2024},
}
Model size: 60.9M parameters (F32, Safetensors).