halluci-mate-v2a

Alpha release. A chess LLM trained from scratch on the Lichess dataset using the Qwen3-0.6B architecture and a custom UCI move tokenizer. Successor to jspaulsen/halluci-mate-v1a, retrained on a larger Rapid + Classical–only sample for higher-quality move signal. Expect rough edges — move quality, strategy, and robustness are still unvalidated beyond basic smoke tests.

Source: https://github.com/jspaulsen/halluci-mate

Model details

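  • Architecture: Qwen3 (Qwen3ForCausalLM) in the Qwen3-0.6B configuration; about 0.4B parameters as stored, because the 1,974-token vocabulary makes the embedding table much smaller than the stock model's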
    • 28 layers, hidden size 1024, 16 attention heads (8 KV heads), intermediate size 3072
    • bfloat16, tied word embeddings, RoPE θ = 1,000,000
  • Vocabulary: 1,974 tokens: 6 special tokens (<PAD>, <UNK>, <EOS>, <WHITE>, <BLACK>, <DRAW>) + 1,792 geometric UCI moves + 176 promotion moves
  • Context: 32,768 tokens
  • Checkpoint: runs-v1/stylish-bug-611/checkpoint-9687
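
As a rough cross-check of the dimensions listed above, the parameter count can be worked out by hand. This is a back-of-the-envelope sketch, not code from the source repo. It assumes a head dimension of 128 (the stock Qwen3-0.6B value, which this card does not state), no bias terms, per-head q/k RMSNorm weights, and a tied embedding counted once. The function name is hypothetical.

```python
def qwen3_param_count(vocab_size, hidden=1024, layers=28, heads=16,
                      kv_heads=8, head_dim=128, intermediate=3072):
    """Approximate parameter count for a Qwen3-style decoder with tied embeddings.

    head_dim=128 is an assumption taken from the stock Qwen3-0.6B config.
    """
    attn = hidden * heads * head_dim            # q_proj
    attn += 2 * hidden * kv_heads * head_dim    # k_proj and v_proj
    attn += heads * head_dim * hidden           # o_proj
    qk_norm = 2 * head_dim                      # RMSNorm weights on q and k
    mlp = 3 * hidden * intermediate             # gate, up, and down projections
    layer_norms = 2 * hidden                    # input and post-attention RMSNorm
    per_layer = attn + qk_norm + mlp + layer_norms
    embedding = vocab_size * hidden             # shared with the output head (tied)
    final_norm = hidden
    return layers * per_layer + embedding + final_norm


chess_vocab = qwen3_param_count(vocab_size=1974)
stock_vocab = qwen3_param_count(vocab_size=151_936)  # stock Qwen3-0.6B vocabulary size
print(f"{chess_vocab / 1e9:.2f}B vs {stock_vocab / 1e9:.2f}B")  # 0.44B vs 0.60B
```

Under these assumptions the chess vocabulary gives roughly 0.44B parameters, while the same layers with the stock vocabulary give about 0.60B. Nearly all of the difference is the embedding table.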

Tokenizer

The tokenizer is custom and is not loadable via AutoTokenizer.from_pretrained. It is defined in src/halluci_mate/chess_tokenizer.py in the source repo. Install the package and use ChessTokenizer() directly.
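
The vocabulary counts listed under Model details can be reproduced with a short enumeration. The sketch below is a reconstruction, not the code in chess_tokenizer.py: it assumes "geometric" means every from-to square pair lying on a queen line or a knight jump, and that promotions cover pawn moves from the 7th to 8th rank (and 2nd to 1st) for each of q, r, b, n. The helper names and the token ordering are hypothetical, so the IDs this produces should not be expected to match the real tokenizer.

```python
FILES = "abcdefgh"
SPECIAL_TOKENS = ["<PAD>", "<UNK>", "<EOS>", "<WHITE>", "<BLACK>", "<DRAW>"]


def square_name(file_idx, rank_idx):
    """Convert 0-based file/rank indices to algebraic notation, e.g. (4, 1) -> 'e2'."""
    return f"{FILES[file_idx]}{rank_idx + 1}"


def geometric_moves():
    """All from-to pairs reachable by a queen or a knight on an empty board."""
    moves = []
    for f1 in range(8):
        for r1 in range(8):
            for f2 in range(8):
                for r2 in range(8):
                    df, dr = abs(f2 - f1), abs(r2 - r1)
                    if df == 0 and dr == 0:
                        continue  # a move must change square
                    queen_line = df == 0 or dr == 0 or df == dr
                    knight_jump = {df, dr} == {1, 2}
                    if queen_line or knight_jump:
                        moves.append(square_name(f1, r1) + square_name(f2, r2))
    return moves


def promotion_moves():
    """Pawn pushes and captures onto the last rank, one token per promotion piece."""
    moves = []
    for from_rank, to_rank in ((6, 7), (1, 0)):  # white 7th->8th, black 2nd->1st
        for f1 in range(8):
            for f2 in (f1 - 1, f1, f1 + 1):
                if 0 <= f2 < 8:
                    for piece in "qrbn":
                        moves.append(
                            square_name(f1, from_rank) + square_name(f2, to_rank) + piece
                        )
    return moves


vocab = SPECIAL_TOKENS + geometric_moves() + promotion_moves()
print(len(geometric_moves()), len(promotion_moves()), len(vocab))  # 1792 176 1974
```

Castling needs no extra tokens in this scheme, since UCI writes it as a king move such as e1g1, which already lies on a queen line.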

Inputs are conditioned on the game result: each game is prefixed with <WHITE> or <BLACK> for the winning side (or <DRAW> for a drawn game), followed by the sequence of UCI moves.
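
To make the input layout concrete, here is a minimal sketch of how one game might be turned into a token sequence. It is illustrative only: the mapping from PGN result strings to condition tokens and the function name format_game are assumptions, and the card does not say whether <EOS> is appended, so the sketch leaves it out.

```python
# Assumed mapping from PGN "Result" values to the conditioning tokens.
RESULT_TO_CONDITION = {
    "1-0": "<WHITE>",
    "0-1": "<BLACK>",
    "1/2-1/2": "<DRAW>",
}


def format_game(result, uci_moves):
    """Prefix a list of UCI move strings with the condition token for the result."""
    condition = RESULT_TO_CONDITION[result]  # KeyError for unfinished games ("*")
    return [condition, *uci_moves]


tokens = format_game("1-0", ["e2e4", "e7e5", "g1f3"])
print(tokens)  # ['<WHITE>', 'e2e4', 'e7e5', 'g1f3']
```

At inference time the same prefix acts as a steering signal: passing condition="<WHITE>" asks the model to continue a game of the kind that White went on to win.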

Usage

import chess

from halluci_mate.game.game import Game
from halluci_mate.inference import ChessInferenceEngine

engine = ChessInferenceEngine.from_checkpoint(
    "jspaulsen/halluci-mate-v2a",
    constrained=True,   # mask logits to legal moves
    temperature=0.0,    # greedy
)

game = Game(board=chess.Board(), condition="<WHITE>")
move = engine.predict(game)
print(move.uci())

Constrained decoding masks the logits to the set of legal UCI moves in the current position, which eliminates illegal-move hallucinations at the cost of potentially hiding model weaknesses. Unconstrained sampling (constrained=False) will occasionally produce moves that are illegal in the current position; this is expected for an alpha.
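
The masking idea can be sketched in a few lines of plain Python. This is not the ChessInferenceEngine implementation, which may differ in detail; the function names and the toy four-token vocabulary are hypothetical. Logits for tokens outside the legal set are replaced with negative infinity, so a greedy pick (the temperature=0.0 case) can only land on a legal move.

```python
import math


def mask_to_legal(logits, legal_ids):
    """Return a copy of logits with every non-legal token set to -inf."""
    legal = set(legal_ids)
    return [x if i in legal else -math.inf for i, x in enumerate(logits)]


def greedy_pick(logits, legal_ids):
    """Pick the highest-scoring token among the legal ones."""
    if not legal_ids:
        raise ValueError("no legal moves to choose from")
    masked = mask_to_legal(logits, legal_ids)
    return max(range(len(masked)), key=masked.__getitem__)


logits = [2.5, 0.1, 1.7, -0.3]  # toy scores; token 0 is the raw favourite
legal_ids = [1, 2]              # but only tokens 1 and 2 are legal here
print(greedy_pick(logits, legal_ids))  # 2
```

In the toy example the unmasked argmax would be token 0, an illegal move. The mask hides that mistake, which is why constrained results say little about how often the raw model prefers an illegal move.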

Training

  • Data: 5,000,000 Lichess games, restricted to Rapid + Classical time controls, filtered to Normal termination, SAN parsed to UCI with python-chess
  • Model initialized from config (no pretrained weights) via AutoModelForCausalLM.from_config
  • 1 epoch, batch size 128 per device across 2 GPUs, 9,687 optimizer steps, bf16 + flash_attention_2
  • Training script: scripts/train.py in the source repo
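
The data filter described above could look roughly like the following. This is a sketch under stated assumptions, not the logic in the source repo: it assumes the filter works on the standard Lichess PGN headers, where Event contains the speed category (for example "Rated Rapid game") and Termination is "Normal" for games that did not end on time or by abandonment. The function name keep_game is hypothetical.

```python
def keep_game(headers):
    """Keep Rapid or Classical games that ended with Normal termination.

    `headers` is a dict of PGN header tags for one game.
    """
    event = headers.get("Event", "")
    is_slow_control = "Rapid" in event or "Classical" in event
    return is_slow_control and headers.get("Termination") == "Normal"


games = [
    {"Event": "Rated Rapid game", "Termination": "Normal"},
    {"Event": "Rated Blitz game", "Termination": "Normal"},
    {"Event": "Rated Classical game", "Termination": "Time forfeit"},
]
print([keep_game(g) for g in games])  # [True, False, False]
```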

Compared to v1a, v2a uses the Rapid + Classical filter (longer time controls yield higher move quality at a given Elo) over a larger game count. Headline play strength against Stockfish at skill level 5 is roughly unchanged from v1a; legality (especially in endgames) is moderately improved. See the source repo for the full eval breakdown.

Limitations

  • Alpha quality; move strength has not been calibrated against a rated engine, and the only strength reference is the Stockfish skill-level-5 comparison noted under Training
  • Constrained decoding is recommended for any real use — the raw model may emit illegal move tokens
  • Trained on human games, so idiosyncrasies and blunders at lower ratings are reflected in behavior
  • No support for analyzing positions from arbitrary FENs beyond what Game constructs

License

MIT. See the source repo for details.

Safetensors model size: 0.4B params, tensor type BF16