Model Card β€” daydream-chess-nanogpt-1 (v1, Regular)

A sup computer release β€” a small language model studio. Model page Β· monorepo (frozen code: projects/daydream/models/daydream-chess-nanogpt-1/, tag daydream-chess-nanogpt-1) Β· runs in your browser at www.supcpu.com/model-player.

Key takeaways

  • A 2.66M-param char-level GPT that renders illegal chess moves as dim near-misses instead of masking them away β€” the sampler's resample-until-legal loop is the artwork's mechanism, not a correctness patch on top of it.
  • Trained on 15,000 real Lichess games, deliberately mid-rating (~1400–1800 Elo) rather than elite play β€” too-clean games were rejected as a corpus source because they kill the near-miss texture the dream mechanic needs.
  • The release gate is not win rate. It's two automated checks: 100% clean game completion and a 35.3% legal-move rate on the model's raw, unresampled first try β€” a genuine legality-learning signal, not a strength claim.
  • First of three tiers in the daydream family (Micro 5Γ—5, Regular 8Γ—8, Grand 12Γ—10) β€” the first project in this monorepo with an external non-Python engine dependency, Fairy-Stockfish.

A chess-move GPT that doesn't play chess so much as hallucinate it. Legal moves snap into focus; illegal moves render as dim near-misses instead of being masked or rejection-sampled away β€” the sampler is the aesthetic decision, not a correctness filter. First release in the daydream series, and the standard 8Γ—8 board tier β€” "Regular," the implicit default among Micro (5Γ—5) and Grand (12 files Γ— 10 ranks).

Render illegal moves, don't discard them. Most chess-model work treats illegal output as failure to be masked or rejection-sampled away. Daydream inverts that: a candidate move is either legal (it snaps into focus and becomes the game's actual move) or illegal (a rejected dream, kept rather than thrown away). The sampler's resample-until-legal loop is the artwork's mechanism, not a bug-fix on top of it.

Model details

Version / git tag daydream-chess-nanogpt-1 (research run regular-r1)
Architecture modern char-level (RoPE, RMSNorm, bias-free) on the shared core engine β€” no vendored base engine
Size 6 layers Β· 6 heads Β· 192 embedding dim Β· 256 context Β· dropout 0.1 Β· ~2.66M params
Tokenizer character-level, 21-char vocabulary over UCI move text (files a–h, ranks 1–8, promotion letters q/r/b/n, space, newline) β€” meta.pkl is the contract (ADR-0012)
Checkpoint projects/daydream/models/daydream-chess-nanogpt-1/ (weights not committed β€” regenerates deterministically below)
Built on the monorepo's shared core engine
Developed with Claude (Claude Code)
License MIT

Intended use

An exhibit exploring what a chess-move model looks like when illegal output is rendered rather than hidden. Not intended to play strong chess β€” legality and interestingness of the near-misses are the point, not playing strength. Pairs with harness.py, which plays the model against Fairy-Stockfish, resampling on illegal moves until one lands (or forcing a random legal move once a resample cap is hit, so games always complete).

Out of scope. Not a chess engine and not evaluated as one β€” no win-rate claims are made or should be inferred. Not a general-purpose language model; its entire vocabulary is UCI chess-move syntax.

Training data

15,000 games from the Lichess open database (January 2018 monthly dump), filtered to games where both players are rated 1400–1800 Elo β€” deliberately mid-band: strong enough for coherent positional shape, loose enough to produce the near-miss texture the dream mechanic needs. Elite/engine games were explicitly rejected as a corpus source elsewhere in this project's design β€” too-clean play kills the texture. Streamed and filtered directly from the compressed dump (zstd decode + Elo filter in one pass via python-chess) without ever storing the full ~5.5GB file. Converted from SAN to UCI move notation. Corpus is downloaded, not committed (regenerates via fetch_filtered.py); only derived artifacts (*.bin, *.pkl, *.pt) are gitignored.

Training procedure

  • Optimizer: AdamW, LR 3e-4 with cosine decay to 3e-5, 100 warmup iters, Ξ²β‚‚ 0.99, batch size 64.
  • Run: 3,000 iterations, best val loss 0.858 (always_save_checkpoint).
  • Hardware: Apple Silicon Mac (MPS / Metal backend), torch.compile disabled.

Evaluation

Verification runs harness.py: the model plays full games against a skill-limited Fairy-Stockfish opponent, resampling on illegal moves. The two automated gate metrics β€” nothing else is an automated gate, per this project's design:

Metric Result (30 games)
Clean completion rate 30/30 (100%) β€” every game reached a natural end (checkmate/stalemate/ply cap) with no pipeline crash
Legal-move rate (first try) 258/731 (35.3%) β€” over a third of the model's move proposals are legal in the current position on the very first sample, with no resampling

What 35.3% means here

This is not a chess-strength number β€” win rate against the opponent is explicitly not part of this project's release gate. It's a legality-learning number: roughly one in three raw samples from the model, with no rejection sampling applied, land on a real legal move in the actual current position. The other two-thirds are the dream β€” syntactically valid UCI strings (e2e4-shaped) that are illegal here, rendered rather than discarded by the harness's resample loop.

Limitations

  • Not evaluated for playing strength, deliberately β€” this project's gate is legality and completion, not win rate.
  • Legality is learned, not guaranteed. Even the harness's resample loop has a cap; beyond it, a uniformly random legal move is forced so the game still completes. That fallback move is not the model's "dream" β€” it's the harness's safety valve.
  • UCI-only vocabulary. No SAN, no natural language, no commentary β€” 21 characters, chess moves and nothing else.
  • No weights in the tree (ADR-0002).

How to reproduce

cd projects/daydream/models/daydream-chess-nanogpt-1
python fetch_filtered.py      # -> games.txt (network; Lichess Jan 2018 dump)
python prepare.py             # -> regular/{train,val}.bin + meta.pkl
python train.py config.py     # -> ./ckpt.pt (3000 iters, val ~0.86)
python harness.py --games 30  # verification: legal-move rate, clean-completion rate

Experiment write-up: Can a chess model's illegal moves be the point?

Requires Fairy-Stockfish on PATH (brew install fairy-stockfish) for harness.py β€” see ADR-0021.

Citation / credits

  • The shared core engine (modern nanoGPT lineage β€” RoPE, RMSNorm, bias-free).
  • Fairy-Stockfish β€” legality arbiter and self-play engine for the sibling Micro/Grand tiers.
  • The Lichess open database β€” corpus source.
  • Set up and trained with Claude (Claude Code).
Downloads last month

-

Downloads are not tracked for this model. How to track
Inference Providers NEW
This model isn't deployed by any Inference Provider. πŸ™‹ Ask for provider support