halluci-mate-v2a
Alpha release. A chess LLM trained from scratch on the Lichess dataset using the Qwen3-0.6B architecture and a custom UCI move tokenizer. Successor to jspaulsen/halluci-mate-v1a, retrained on a larger Rapid + Classical–only sample for higher-quality move signal. Expect rough edges — move quality, strategy, and robustness are still unvalidated beyond basic smoke tests.
Source: https://github.com/jspaulsen/halluci-mate
Model details
- Architecture: Qwen3 (Qwen3ForCausalLM), ~0.6B parameters
  - 28 layers, hidden size 1024, 16 attention heads (8 KV heads), intermediate size 3072
  - bfloat16, tied word embeddings, RoPE θ = 1,000,000
- Vocabulary: 1,974 tokens: 6 special tokens (<PAD>, <UNK>, <EOS>, <WHITE>, <BLACK>, <DRAW>) + ~1,792 geometric UCI moves + 176 promotion moves
- Context: 32,768 tokens
- Checkpoint: runs-v1/stylish-bug-611/checkpoint-9687
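The vocabulary breakdown above can be sanity-checked by enumeration. The sketch below is not the repo's ChessTokenizer. It assumes that "geometric" means every from/to square pair reachable along a queen line or by a knight jump, and that promotion moves are the straight and diagonal last-rank pawn moves for both colors with a q, r, b, or n suffix. The helper names are hypothetical.

```python
FILES = "abcdefgh"


def square(file_idx, rank_idx):
    """Convert 0-based file/rank indices to algebraic notation, e.g. (4, 1) -> 'e2'."""
    return f"{FILES[file_idx]}{rank_idx + 1}"


def geometric_moves():
    """All from/to pairs along a queen line or a knight jump (assumed definition)."""
    moves = set()
    for f in range(8):
        for r in range(8):
            for tf in range(8):
                for tr in range(8):
                    df, dr = abs(tf - f), abs(tr - r)
                    if df == 0 and dr == 0:
                        continue  # a move must change squares
                    queen_line = df == 0 or dr == 0 or df == dr
                    knight_jump = {df, dr} == {1, 2}
                    if queen_line or knight_jump:
                        moves.add(square(f, r) + square(tf, tr))
    return moves


def promotion_moves():
    """Pawn moves onto the last rank, straight or diagonal, with a piece suffix."""
    moves = set()
    # (from_rank, to_rank) as 0-based indices: White 7th -> 8th, Black 2nd -> 1st
    for from_rank, to_rank in ((6, 7), (1, 0)):
        for f in range(8):
            for tf in (f - 1, f, f + 1):
                if 0 <= tf < 8:
                    for piece in "qrbn":
                        moves.add(square(f, from_rank) + square(tf, to_rank) + piece)
    return moves


NUM_SPECIAL_TOKENS = 6  # <PAD>, <UNK>, <EOS>, <WHITE>, <BLACK>, <DRAW>

geo = geometric_moves()
promo = promotion_moves()
print(len(geo), len(promo), NUM_SPECIAL_TOKENS + len(geo) + len(promo))
# prints: 1792 176 1974
```

Under these assumptions the counts match the figures listed above, which suggests the tokenizer assigns one token to each geometrically possible UCI move rather than to individual squares.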
Tokenizer
The tokenizer is custom and is not loadable via AutoTokenizer.from_pretrained. It is defined in src/halluci_mate/chess_tokenizer.py in the source repo. Install the package and use ChessTokenizer() directly.
Inputs are conditioned on the game outcome: each game is prefixed with <WHITE> or <BLACK> for the winning side, or <DRAW> for a drawn game, followed by the sequence of UCI moves.
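As a rough illustration of that layout, the sketch below builds the token sequence for one game. It does not use ChessTokenizer. The mapping from PGN result strings to condition tokens and the build_sequence helper are assumptions made for illustration, and whether the real pipeline appends <EOS> is not stated here, so it is left out.

```python
# Hypothetical mapping from PGN result strings to the condition tokens.
OUTCOME_TOKENS = {
    "1-0": "<WHITE>",      # White won
    "0-1": "<BLACK>",      # Black won
    "1/2-1/2": "<DRAW>",   # drawn game
}


def build_sequence(result, uci_moves):
    """Prefix a game's UCI moves with its outcome token."""
    return [OUTCOME_TOKENS[result], *uci_moves]


sequence = build_sequence("1-0", ["e2e4", "e7e5", "g1f3"])
print(sequence)
# prints: ['<WHITE>', 'e2e4', 'e7e5', 'g1f3']
```

At inference time the same prefix acts as a steering signal: passing condition="<WHITE>" asks the model to continue a game of the kind White won.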
Usage
import chess

from halluci_mate.game.game import Game
from halluci_mate.inference import ChessInferenceEngine

# Build the inference engine from the published checkpoint
engine = ChessInferenceEngine.from_checkpoint(
    "jspaulsen/halluci-mate-v2a",
    constrained=True,  # mask logits to legal moves
    temperature=0.0,  # greedy
)

# New game from the starting position with the <WHITE> condition token
game = Game(board=chess.Board(), condition="<WHITE>")
move = engine.predict(game)
print(move.uci())
Constrained decoding masks the logits to the set of legal UCI moves in the current position, which eliminates illegal-move hallucinations at the cost of potentially hiding model weaknesses. Unconstrained sampling (constrained=False) will occasionally produce illegal tokens — this is expected for an alpha.
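The masking step can be sketched in a few lines of plain Python. This is a minimal illustration of the idea, not the implementation inside ChessInferenceEngine. The function names are hypothetical, and the legal-move list is passed in by hand, where in practice it would come from something like python-chess's board.legal_moves.

```python
import math


def mask_to_legal(logits, vocab, legal_moves):
    """Return a copy of logits with every token outside legal_moves set to -inf."""
    legal = set(legal_moves)
    return [x if tok in legal else -math.inf for x, tok in zip(logits, vocab)]


def greedy_pick(logits, vocab):
    """Greedy decoding (temperature 0): return the highest-scoring token."""
    best = max(range(len(vocab)), key=lambda i: logits[i])
    return vocab[best]


# Toy three-token vocabulary; "e2e5" is illegal from the starting position.
vocab = ["e2e4", "e2e5", "g1f3"]
logits = [1.0, 5.0, 0.5]

print(greedy_pick(logits, vocab))
# prints: e2e5  (unconstrained: the illegal move has the highest logit)

masked = mask_to_legal(logits, vocab, legal_moves=["e2e4", "g1f3"])
print(greedy_pick(masked, vocab))
# prints: e2e4  (constrained: best-scoring legal move)
```

The toy example also shows the trade-off noted above: masking always yields a legal move, but it hides the fact that the raw model preferred an illegal one.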
Training
- Data: 5,000,000 Lichess games, restricted to Rapid + Classical time controls, filtered to Normal termination, SAN parsed to UCI with python-chess
- Model initialized from config (no pretrained weights) via AutoModelForCausalLM.from_config
- 1 epoch, batch size 128 per device across 2 GPUs, 9,687 optimizer steps, bf16 + flash_attention_2
- Training script: scripts/train.py in the source repo
Compared to v1a, v2a uses the Rapid + Classical filter (longer time controls yield higher move quality at a given Elo) over a larger game count. Headline play strength against Stockfish-skill-5 is roughly unchanged from v1a; legality (especially in endgames) is moderately improved. See the source repo for the full eval breakdown.
Limitations
- Alpha quality; move strength has not been benchmarked against a rated engine
- Constrained decoding is recommended for any real use — the raw model may emit illegal move tokens
- Trained on human games, so idiosyncrasies and blunders at lower ratings are reflected in behavior
- No support for analyzing positions from arbitrary FENs beyond what Game constructs
License
MIT. See the source repo for details.