Chessformer v0
A ~50M-parameter transformer trained to imitate human chess play across the full Elo spectrum, conditioned on player Elo, remaining clock time, and time increment.
44.4% top-1 joint move accuracy on a held-out, Elo-balanced test set (predicting the exact human move: both from-square and to-square).
What it does
Predicts the move a human would play in a given position. The goal is for the model to learn how humans play chess and, later, to encode positions the way a human sees them: with human intuition, strength, and time management.
- Strength is a dial. Pass `white_elo=1200` and it plays like a 1200; pass `white_elo=2800` and it plays like a 2800.
- No rules, no search. Legal-move understanding emerges from the training data alone.
- Clock-aware. Conditioning on remaining time lets the model represent time-pressure play.
- One forward pass per move. From-square and to-square are predicted autoregressively in a single masked forward pass, not via tree search.
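Because the from-square and to-square are predicted autoregressively, the probability of a move factorizes by the chain rule: P(from, to) = P(from) · P(to | from). The sketch below illustrates this in plain Python over made-up distributions. The function names are hypothetical, not part of the `chessformer` API, and this README does not say which decoding rule `pick_move` uses; the example simply shows that greedy from-then-to decoding and a joint argmax can disagree.

```python
from typing import Dict, Tuple

Move = Tuple[str, str]  # (from_square, to_square), e.g. ("e2", "e4")


def joint_move_probs(p_from: Dict[str, float],
                     p_to_given_from: Dict[str, Dict[str, float]]) -> Dict[Move, float]:
    """Chain rule: P(from, to) = P(from) * P(to | from)."""
    joint: Dict[Move, float] = {}
    for frm, pf in p_from.items():
        for to, pt in p_to_given_from.get(frm, {}).items():
            joint[(frm, to)] = pf * pt
    return joint


def greedy_move(p_from: Dict[str, float],
                p_to_given_from: Dict[str, Dict[str, float]]) -> Move:
    """Take the most likely from-square, then the most likely to-square given it."""
    frm = max(p_from, key=p_from.get)
    to_dist = p_to_given_from[frm]
    return frm, max(to_dist, key=to_dist.get)


def best_joint_move(p_from: Dict[str, float],
                    p_to_given_from: Dict[str, Dict[str, float]]) -> Move:
    """Take the (from, to) pair with the highest joint probability."""
    joint = joint_move_probs(p_from, p_to_given_from)
    return max(joint, key=joint.get)


# Toy distributions for the starting position (made-up numbers).
p_from = {"e2": 0.5, "g1": 0.4, "d2": 0.1}
p_to_given_from = {"e2": {"e4": 0.6, "e3": 0.4},
                   "g1": {"f3": 0.9, "h3": 0.1},
                   "d2": {"d4": 1.0}}
print(greedy_move(p_from, p_to_given_from))      # ('e2', 'e4'), joint 0.5 * 0.6 = 0.30
print(best_joint_move(p_from, p_to_given_from))  # ('g1', 'f3'), joint 0.4 * 0.9 = 0.36
```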
Usage
```python
import chess
import torch

from chessformer.model import ChessformerModel, unwrap_state_dict
from chessformer.tokenizer import build_vocab
from chessformer.inference import pick_move

# Build the token vocabulary and restore the model from the checkpoint.
vocab = build_vocab()
ckpt = torch.load("chessformer_v0.pt", map_location="cpu")
model = ChessformerModel(vocab=vocab, **ckpt["cfg"]["model"])
model.load_state_dict(unwrap_state_dict(ckpt["model"]), strict=True)
model.eval()

# Ask for the move a 2500-rated player would make from the starting position,
# with 2 minutes on each clock and a 5-second increment.
board = chess.Board()
move = pick_move(model, vocab, torch.device("cpu"), board,
                 white_elo=2500, black_elo=2500,
                 white_clock_s=120.0, black_clock_s=120.0,
                 increment_s=5.0)
print(move)  # e.g. e2e4
```
Architecture
| Param | Value |
|---|---|
| Parameters | ~50M |
| d_model | 512 |
| n_heads | 8 |
| n_layers | 12 |
| FFN | SwiGLU, 4× expansion |
| Norm | RMSNorm (pre-norm) |
| Attention | Flash Attention (SDPA) |
| Max sequence length | 41 tokens |
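The ~50M figure can be sanity-checked from the table. The estimate below is a sketch under stated assumptions, not the repository's own count: it reads "4× expansion" as a SwiGLU hidden width of 4 × d_model = 2048 with three bias-free projection matrices, assumes bias-free Q/K/V/output attention projections and RMSNorm weight vectors only, and leaves out the embeddings and output heads.

```python
def transformer_param_estimate(d_model: int, n_layers: int, ffn_mult: int = 4) -> int:
    """Rough parameter count for a stack of pre-norm transformer blocks.

    Assumptions (not confirmed by the Chessformer source):
      - attention: bias-free Q, K, V and output projections -> 4 * d^2
      - SwiGLU FFN: three bias-free d x (ffn_mult * d) matrices
      - two RMSNorm weight vectors per block, plus one final RMSNorm
      - embeddings and output heads are ignored
    """
    hidden = ffn_mult * d_model
    attn = 4 * d_model * d_model          # 1,048,576 for d = 512
    ffn = 3 * d_model * hidden            # 3,145,728 for d = 512
    norms = 2 * d_model
    per_layer = attn + ffn + norms
    return n_layers * per_layer + d_model  # + final RMSNorm


print(transformer_param_estimate(d_model=512, n_layers=12))  # 50344448
```

Under these assumptions the blocks alone come to about 50.3M parameters, which matches the ~50M in the table.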
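Board pieces use additive embeddings: `emb[color] + emb[piece_type] + emb[file] + emb[rank]`. No separate positional encoding is needed, because the file and rank embeddings already tell the model where each piece stands. Elo, clock, and increment are embedded with soft bracket interpolation.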
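A minimal plain-Python sketch of both ideas follows. The README does not define "soft bracket interpolation", so this reads it as a linear blend of the embeddings of the two bracket anchors surrounding a value, clamped at the ends. The anchor values, table sizes, and index conventions (color 0 = white, piece_type 1 = knight) are hypothetical, and random vectors stand in for the learned tables.

```python
import random
from bisect import bisect_right
from typing import Dict, List, Sequence

Vector = List[float]


def make_table(n: int, dim: int, rng: random.Random) -> List[Vector]:
    """Toy stand-in for a learned embedding table: n random vectors of size dim."""
    return [[rng.gauss(0.0, 0.02) for _ in range(dim)] for _ in range(n)]


def piece_embedding(tables: Dict[str, List[Vector]], color: int, piece_type: int,
                    file: int, rank: int) -> Vector:
    """emb[color] + emb[piece_type] + emb[file] + emb[rank], summed elementwise."""
    parts = (tables["color"][color], tables["piece_type"][piece_type],
             tables["file"][file], tables["rank"][rank])
    return [sum(xs) for xs in zip(*parts)]


def soft_bracket(value: float, anchors: Sequence[float], table: List[Vector]) -> Vector:
    """Blend the embeddings of the two anchors that bracket `value`.

    Values outside the anchor range are clamped to the first or last anchor.
    """
    if value <= anchors[0]:
        return list(table[0])
    if value >= anchors[-1]:
        return list(table[-1])
    hi = bisect_right(anchors, value)  # index of the first anchor above value
    lo = hi - 1
    w = (value - anchors[lo]) / (anchors[hi] - anchors[lo])
    return [(1.0 - w) * a + w * b for a, b in zip(table[lo], table[hi])]


rng = random.Random(0)
dim = 8
tables = {"color": make_table(2, dim, rng), "piece_type": make_table(6, dim, rng),
          "file": make_table(8, dim, rng), "rank": make_table(8, dim, rng)}
# Hypothetical index convention: color 0 = white, piece_type 1 = knight, file 6 = g, rank 0 = 1.
white_knight_g1 = piece_embedding(tables, color=0, piece_type=1, file=6, rank=0)

elo_anchors = [800, 1200, 1600, 2000, 2400, 2800]  # hypothetical bracket edges
elo_table = make_table(len(elo_anchors), dim, rng)
elo_1400 = soft_bracket(1400, elo_anchors, elo_table)  # 50/50 blend of the 1200 and 1600 rows
```

Since each piece token carries its own file and rank, shuffling the piece tokens leaves every token unchanged, which is why the token order needs no positional encoding. Interpolating between brackets also gives nearby ratings (say 1390 and 1410) nearly identical embeddings instead of a hard jump at a bucket edge.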
Training
- Data: Lichess standard rated games, January 2018 (~100k games, ~6.6M positions)
- Val/Test: December 2017 (a different month, so no game-level leakage), Elo-balanced across 1000–2900
- Steps: 20,000
- Batch size: 1024 (2× T4 GPUs, DDP)
- LR: 3e-4, cosine decay with 1000-step warmup
- Optimizer: AdamW, weight decay 0.01
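The learning-rate schedule above can be sketched as a plain function of the step. The linear warmup shape and the decay floor of 0 are assumptions; the README only says "cosine decay with 1000-step warmup". The epoch arithmetic assumes 1024 is the global batch across both GPUs, which the README does not state explicitly.

```python
import math


def lr_at(step: int, peak_lr: float = 3e-4, warmup_steps: int = 1000,
          total_steps: int = 20_000, min_lr: float = 0.0) -> float:
    """Linear warmup to peak_lr, then cosine decay to min_lr at total_steps.

    Assumed shape: linear warmup and min_lr = 0 are not confirmed by the source.
    """
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    progress = min(1.0, (step - warmup_steps) / (total_steps - warmup_steps))
    return min_lr + 0.5 * (peak_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


# Rough number of passes over the data, assuming 1024 is the global batch size.
positions_seen = 20_000 * 1024          # 20,480,000 positions
epochs = positions_seen / 6_600_000     # about 3.1 passes over ~6.6M positions
```

To drive a PyTorch optimizer with this curve, one option is `torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda=lambda s: lr_at(s) / 3e-4)` with the optimizer's base LR set to 3e-4, since `LambdaLR` multiplies the base LR by the returned factor.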
Metrics (test set, Elo-balanced, 671k positions)
All numbers are from the held-out test split (Lichess, December 2017: a different month from the training data).
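The README does not describe how the Elo-balanced split is built. One common approach, sketched here under stated assumptions, is to bucket positions by rating and downsample every bucket to the size of the smallest one. The 100-point bucket width, the half-open range [1000, 2900), and the `{"elo": ...}` record schema are all hypothetical.

```python
import random
from collections import defaultdict
from typing import Dict, List


def elo_balanced_sample(positions: List[dict], lo: int = 1000, hi: int = 2900,
                        bucket_width: int = 100, seed: int = 0) -> List[dict]:
    """Downsample so every Elo bucket in [lo, hi) contributes the same number of positions.

    Each position is a dict with an "elo" key (hypothetical schema). Positions
    outside [lo, hi) are dropped; if any bucket is empty, nothing can be balanced.
    """
    buckets: Dict[int, List[dict]] = defaultdict(list)
    for pos in positions:
        if lo <= pos["elo"] < hi:
            buckets[(pos["elo"] - lo) // bucket_width].append(pos)
    n_buckets = (hi - lo) // bucket_width
    if len(buckets) < n_buckets:
        return []  # some bucket is empty, so no balanced sample exists
    per_bucket = min(len(b) for b in buckets.values())
    rng = random.Random(seed)
    sample: List[dict] = []
    for key in sorted(buckets):
        sample.extend(rng.sample(buckets[key], per_bucket))
    return sample


# Toy data: 10 positions per 100-point bucket, plus an over-represented 1500s bucket.
positions = [{"elo": e} for e in range(1000, 2900, 10)] + [{"elo": 1550}] * 50
balanced = elo_balanced_sample(positions)
print(len(balanced))  # 190: 19 buckets x 10 positions each
```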
Top-1 accuracy: the model's top prediction exactly matches the human move (both from-square and to-square). A uniformly random legal move scores about 3–5%.
Plausible 20%: the human move is assigned ≥ 20% probability in both the from-square and to-square distributions (the model "sees" the idea even when it does not rank it first).
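The two metric definitions and the random-move baseline can be written as small functions. This is a sketch, not the repository's evaluation code: it assumes the to-square distribution used for "Plausible 20%" is the one conditioned on the human's from-square, and it models the random baseline as the average of 1 / (number of legal moves) over positions.

```python
from typing import Dict, List, Sequence, Tuple

Move = Tuple[str, str]  # (from_square, to_square)


def top1_accuracy(predicted: Sequence[Move], human: Sequence[Move]) -> float:
    """Share of positions where both from-square and to-square match the human move."""
    hits = sum(1 for p, h in zip(predicted, human) if p == h)
    return hits / len(human)


def plausible(p_from: Dict[str, float], p_to: Dict[str, float], human: Move,
              threshold: float = 0.2) -> bool:
    """True if the human move gets >= threshold probability in both distributions.

    Assumption: p_to is the to-square distribution conditioned on the human's from-square.
    """
    frm, to = human
    return p_from.get(frm, 0.0) >= threshold and p_to.get(to, 0.0) >= threshold


def random_legal_baseline(legal_move_counts: List[int]) -> float:
    """Expected top-1 accuracy of a uniformly random legal move: mean of 1 / n_legal."""
    return sum(1.0 / n for n in legal_move_counts) / len(legal_move_counts)


p_from = {"e2": 0.5, "g1": 0.3, "d2": 0.2}
p_to = {"e4": 0.25, "e3": 0.15}  # to-square distribution given from-square e2
print(plausible(p_from, p_to, ("e2", "e4")))  # True: 0.5 >= 0.2 and 0.25 >= 0.2
print(plausible(p_from, p_to, ("e2", "e3")))  # False: 0.15 < 0.2
```

With 20 to 40 legal moves per position, `random_legal_baseline` lands between 2.5% and 5%, in line with the ~3–5% quoted above.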
| Metric | Value |
|---|---|
| Top-1 move accuracy | 44.4% |
| Plausible 20% | 62.9% |
| Puzzle solved (all moves correct) | 28.0% |
| Puzzle advancement (avg fraction solved) | 34.0% |
Accuracy is flat, and highest, across 1100–2200 Elo, where training data is most abundant. Puzzle-solving is fully emergent: the model was never trained on puzzles.
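The two puzzle metrics in the table can be read as follows; this sketch assumes a convention the README does not spell out. The solution list alternates solver move, opponent reply, solver move, and so on; the model plays the solver side while the opponent's replies are taken from the solution; a puzzle scores the fraction of solver moves played correctly before the first mistake. The `predict` callable is a hypothetical stand-in for running the model on the current position.

```python
from typing import Callable, List, Sequence, Tuple

Move = str  # UCI string, e.g. "e2e4"


def score_puzzle(solution: Sequence[Move], predict: Callable[[List[Move]], Move]) -> float:
    """Fraction of the solver's moves played correctly before the first mistake.

    Assumed convention: even indices are solver moves, odd indices are opponent
    replies, and the opponent's replies are replayed from the solution.
    """
    history: List[Move] = []
    n_solver_moves = len(solution[0::2])
    correct = 0
    for i, expected in enumerate(solution):
        if i % 2 == 0:  # solver's turn: the model must find the move
            if predict(history) != expected:
                break
            correct += 1
        history.append(expected)
    return correct / n_solver_moves


def puzzle_metrics(scores: Sequence[float]) -> Tuple[float, float]:
    """Return (solved rate, mean advancement); solved means every solver move was correct."""
    solved = sum(1 for s in scores if s == 1.0) / len(scores)
    advancement = sum(scores) / len(scores)
    return solved, advancement


# Toy puzzle with three solver moves (a, c, e) and two opponent replies (b, d).
sol = ["a", "b", "c", "d", "e"]
oracle = lambda h: sol[len(h)]                            # always finds the solution
flaky = lambda h: sol[len(h)] if len(h) < 2 else "x"      # misses the second solver move
print(puzzle_metrics([score_puzzle(sol, oracle), score_puzzle(sol, flaky)]))
```

Under this reading, "solved" is the stricter metric and advancement gives partial credit, so advancement is always at least the solved rate, consistent with 34.0% vs 28.0% in the table.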
License
Apache 2.0. Training data from database.lichess.org (CC0).
Citation
author = {Albert-Roulhac, Edouard},
title = {Chessformer: Elo-conditioned human chess imitation via transformer},
year = {2024},
url = {https://github.com/edarsem/chessformer},
}