Sid3024/Chess-Transformer


license: mit

GitHub link: https://github.com/Sid3024/Chess-AI

This is a chess transformer model trained on 140 million chess positions taken from lichess.org games in which the players had a minimum rating of 2000 and the time control was rapid or slower (at least 10 minutes).
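As an illustration of the selection criteria, the sketch below filters games using the `WhiteElo`, `BlackElo` and `TimeControl` tags found in lichess PGN headers. It is not the project's actual pipeline: it assumes the rating threshold applies to both players, that "at least 10 minutes" refers to the base time in the `TimeControl` tag (e.g. `600+0`), and that games without a clock (`TimeControl` of `-`) are excluded. `keep_game` is a hypothetical helper name.

```python
# Hypothetical game filter over lichess PGN header tags.
# Assumptions: both players must be rated >= 2000, and the base time
# (the part of TimeControl before "+") must be at least 10 minutes.

MIN_RATING = 2000
MIN_BASE_SECONDS = 10 * 60


def keep_game(headers: dict[str, str]) -> bool:
    """Return True if a game's PGN headers pass the rating and time-control filter."""
    try:
        white = int(headers["WhiteElo"])
        black = int(headers["BlackElo"])
    except (KeyError, ValueError):
        return False  # missing or non-numeric ratings (e.g. "?") are skipped
    time_control = headers.get("TimeControl", "-")
    if time_control == "-":
        return False  # games without a clock are excluded in this sketch
    base, _, _increment = time_control.partition("+")
    try:
        base_seconds = int(base)
    except ValueError:
        return False
    return min(white, black) >= MIN_RATING and base_seconds >= MIN_BASE_SECONDS


print(keep_game({"WhiteElo": "2150", "BlackElo": "2034", "TimeControl": "600+0"}))  # True
print(keep_game({"WhiteElo": "2300", "BlackElo": "2250", "TimeControl": "180+2"}))  # False
```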

The model combines a standard vision transformer architecture with a modified version of smolgen, the domain-specific module that Leela Chess Zero implemented in their transformer model: https://lczero.org/blog/2024/02/transformer-progress/.

The idea behind smolgen is that the attention between two squares should also depend on the overall board state. In a closed position, the signal between squares that are far apart should be constrained, while in an open position it should be strengthened. In our implementation, the CLS token is passed through a linear layer that outputs a 4096-dimensional vector, which is then reshaped to (64, 64). This is treated as a second attention map that is added to the original attention scores before the softmax.
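The mechanism described above can be sketched in NumPy as a single attention head. This is a minimal illustration under stated assumptions, not the model's actual code: it uses one head, assumes the CLS token only feeds the smolgen projection and does not itself take part in the 64×64 attention, and uses hypothetical names (`attention_with_smolgen`, `w_smol`, `b_smol`) and random weights.

```python
import numpy as np

SQUARES = 64


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def attention_with_smolgen(tokens, w_q, w_k, w_v, w_smol, b_smol):
    """Single-head attention over 64 square tokens plus a board-dependent bias.

    tokens: (65, d) array; row 0 is the CLS token, rows 1..64 are the squares.
    w_smol: (d, 4096) and b_smol: (4096,) form the hypothetical smolgen linear layer.
    Returns a (64, d_v) array of updated square representations.
    """
    cls, squares = tokens[0], tokens[1:]
    q, k, v = squares @ w_q, squares @ w_k, squares @ w_v
    # Content-based attention scores between every pair of squares: (64, 64).
    logits = q @ k.T / np.sqrt(q.shape[-1])
    # Board-state map: CLS -> 4096-dim vector -> reshaped to (64, 64).
    smolgen = (cls @ w_smol + b_smol).reshape(SQUARES, SQUARES)
    # The second map is added to the scores before the softmax.
    weights = softmax(logits + smolgen, axis=-1)
    return weights @ v


rng = np.random.default_rng(0)
d = 32
tokens = rng.standard_normal((SQUARES + 1, d))
w_q, w_k, w_v = (rng.standard_normal((d, d)) * 0.1 for _ in range(3))
w_smol = rng.standard_normal((d, SQUARES * SQUARES)) * 0.01
b_smol = np.zeros(SQUARES * SQUARES)
out = attention_with_smolgen(tokens, w_q, w_k, w_v, w_smol, b_smol)
print(out.shape)  # (64, 32)
```

Because the smolgen map is computed from the CLS token, which summarizes the whole board, the same pair of squares can receive a different attention bias in different positions, which is the behaviour the paragraph above motivates.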

We experimented with various tokenization strategies. Our best-performing method converts the board into a list of 64 encodings, one per square, depending on which piece (if any) occupies that square. There are 13 embeddings in total: one for empty squares, one for each of the 6 distinct player pieces, and one for each of the 6 distinct enemy pieces. The board is rotated when it is Black's turn, so the input is always from the perspective of the current player. This abstracts away piece colour and takes advantage of the game's symmetry, which simplifies the problem and improves performance: this scheme reached a loss of 1.47, 20% lower than when the embeddings represented white and black pieces instead of player and enemy pieces.
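The tokenization above can be sketched in plain Python from a FEN string. This is a hypothetical illustration, not the project's encoder: the token ids (0 for empty, 1–6 for the current player's pawn, knight, bishop, rook, queen and king, 7–12 for the enemy's) and the square order (as written in the FEN, a8 to h1) are assumptions. Rotating the board by 180 degrees maps square index `i` to `63 - i`, so it amounts to reversing the list.

```python
PIECE_ORDER = "pnbrqk"  # pawn, knight, bishop, rook, queen, king (assumed id order)


def tokenize(fen: str) -> list[int]:
    """Map a FEN string to 64 token ids from the side to move's perspective.

    0 = empty, 1-6 = current player's p,n,b,r,q,k, 7-12 = enemy's p,n,b,r,q,k.
    Hypothetical id assignment; squares follow FEN order (a8..h8, ..., a1..h1).
    """
    placement, side_to_move = fen.split()[:2]
    white_to_move = side_to_move == "w"
    tokens: list[int] = []
    for rank in placement.split("/"):
        for ch in rank:
            if ch.isdigit():
                tokens.extend([0] * int(ch))  # a digit is a run of empty squares
            else:
                piece_id = PIECE_ORDER.index(ch.lower()) + 1
                is_player = ch.isupper() == white_to_move  # colour -> player/enemy
                tokens.append(piece_id if is_player else piece_id + 6)
    if len(tokens) != 64:
        raise ValueError("malformed FEN piece placement")
    if not white_to_move:
        tokens.reverse()  # 180-degree rotation: square i -> square 63 - i
    return tokens


start = "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR"
white_view = tokenize(start + " w KQkq - 0 1")
black_view = tokenize(start + " b KQkq - 0 1")
print(white_view[:8])  # [10, 8, 9, 11, 12, 9, 8, 10]
print(black_view[:8])  # [10, 8, 9, 12, 11, 9, 8, 10]
```

In both views the current player's pieces occupy the bottom two ranks, so the model never needs to learn colour-specific patterns. Note that a rotation (rather than a vertical flip) also mirrors the files, which is why the king and queen swap places in the two outputs.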

The model has 6.8M parameters and was trained for 6 epochs (720K steps). The learning rate was decreased every 2 epochs, from 4e-4 to 1e-4 and then to 3e-5.
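The schedule above is piecewise constant. The sketch below expresses it as a function of the training step, assuming every epoch has the same number of steps (720K / 6 = 120K); `learning_rate` is a hypothetical helper, not code from the repository.

```python
TOTAL_STEPS = 720_000
EPOCHS = 6
STEPS_PER_EPOCH = TOTAL_STEPS // EPOCHS  # 120,000, assuming equal-length epochs
LR_PER_STAGE = (4e-4, 1e-4, 3e-5)  # one learning rate per 2-epoch stage


def learning_rate(step: int) -> float:
    """Piecewise-constant learning rate that drops every 2 epochs."""
    epoch = step // STEPS_PER_EPOCH
    stage = min(epoch // 2, len(LR_PER_STAGE) - 1)
    return LR_PER_STAGE[stage]


print(learning_rate(0), learning_rate(240_000), learning_rate(600_000))  # 0.0004 0.0001 3e-05
```

Because the two drops use different factors (×0.25, then ×0.3), a single-`gamma` scheduler such as PyTorch's `MultiStepLR` would not reproduce these exact values; an explicit lookup like this one (or a `LambdaLR` with a custom multiplier) would.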

The final model achieved 51% accuracy in predicting the best move on the validation dataset, and scored 25/50 against Stockfish limited to a 1600 rating. A 50% score against a 1600-rated opponent implies a playing strength of roughly 1600, which approximately equates to someone who has dedicated 1.5 years to chess and is stronger than about 90% of all players.
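The link between the match score and the rating estimate follows from the standard Elo expected-score formula, E = 1 / (1 + 10^((R_opp − R) / 400)). Solving it for R gives a performance rating from an observed score. The sketch below applies this to the 25/50 result; the function names are illustrative, and with only 50 games the estimate carries considerable statistical uncertainty.

```python
import math


def expected_score(rating: float, opponent: float) -> float:
    """Elo expected score of `rating` against `opponent` (logistic, 400-point scale)."""
    return 1.0 / (1.0 + 10 ** ((opponent - rating) / 400))


def performance_rating(points: float, games: int, opponent: float) -> float:
    """Rating whose expected score against `opponent` equals the observed score."""
    s = points / games
    if not 0.0 < s < 1.0:
        raise ValueError("performance rating is undefined for 0% or 100% scores")
    # Inverting E = 1 / (1 + 10^((opp - R) / 400)) gives R = opp - 400 * log10(1/E - 1).
    return opponent - 400 * math.log10(1 / s - 1)


print(performance_rating(25, 50, 1600))  # 1600.0
```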