halluci-mate-v2b

Alpha release. A chess LLM fine-tuned from jspaulsen/halluci-mate-v2a on a higher-quality slice of the Lichess dataset. Uses the Qwen3-0.6B architecture and a custom UCI move tokenizer. First model in the series to score wins against Stockfish skill-5 in 100-game matches.

Source: https://github.com/jspaulsen/halluci-mate

Model details

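Architecture: Qwen3 (Qwen3ForCausalLM) in the Qwen3-0.6B configuration; about 0.4B parameters, because the 1,974-token vocabulary makes the embedding table very small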
- 28 layers, hidden size 1024, 16 attention heads (8 KV heads), intermediate size 3072
- bfloat16, tied word embeddings, RoPE θ = 1,000,000
Vocabulary: 1,974 tokens — 6 special tokens (<PAD>, <UNK>, <EOS>, <WHITE>, <BLACK>, <DRAW>) + ~1,792 geometric UCI moves + 176 promotion moves
Context: 32,768 tokens
Base model: jspaulsen/halluci-mate-v2a
Checkpoint: runs-v2a-ft/languid-sloth-169/checkpoint-4056
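
As a rough cross-check on model size, the dimensions listed above can be plugged into the usual decoder-only transformer parameter formula. This is a back-of-the-envelope sketch, not a readout from the checkpoint. It assumes a per-head dimension of 128 (not stated in this card), bias-free projections, a gated MLP with three projection matrices, and RMSNorm weights including per-head query/key norms.

```python
# Rough parameter count from the dimensions listed in "Model details".
# ASSUMPTIONS (not stated in the card): HEAD_DIM = 128, no bias terms,
# gated MLP with three projection matrices, RMSNorm with q/k norms per layer.
VOCAB = 1974
HIDDEN = 1024
LAYERS = 28
HEADS = 16
KV_HEADS = 8
INTERMEDIATE = 3072
HEAD_DIM = 128  # assumption


def count_parameters() -> int:
    embed = VOCAB * HIDDEN  # tied with the output head, so counted once
    attn = HIDDEN * HEADS * HEAD_DIM          # query projection
    attn += 2 * HIDDEN * KV_HEADS * HEAD_DIM  # key and value projections
    attn += HEADS * HEAD_DIM * HIDDEN         # output projection
    mlp = 3 * HIDDEN * INTERMEDIATE           # gate, up, down
    norms = 2 * HIDDEN + 2 * HEAD_DIM         # two layer norms + q/k norms
    final_norm = HIDDEN
    return embed + LAYERS * (attn + mlp + norms) + final_norm


total = count_parameters()
print(f"{total:,} parameters (~{total / 1e9:.2f}B)")
```

Under these assumptions the total comes to roughly 0.44B. Most of the gap to the nominal 0.6B is the embedding table, which a 1,974-token vocabulary keeps at about 2M parameters.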

Tokenizer

The tokenizer is custom and is not loadable via AutoTokenizer.from_pretrained. It is defined in src/halluci_mate/chess_tokenizer.py in the source repo. Install the package and use ChessTokenizer() directly.
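
The vocabulary counts given under Model details can be reproduced by enumerating moves geometrically. The sketch below is illustrative only: it is not the code in chess_tokenizer.py, and the real ChessTokenizer may build or order its vocabulary differently. It assumes "geometric" means every from/to square pair reachable by a queen-like or knight-like step, and that promotions cover straight and diagonal pawn moves onto the last rank for both colors.

```python
from itertools import product

FILES = "abcdefgh"
RANKS = "12345678"
SPECIAL_TOKENS = ["<PAD>", "<UNK>", "<EOS>", "<WHITE>", "<BLACK>", "<DRAW>"]


def square(file_idx: int, rank_idx: int) -> str:
    return FILES[file_idx] + RANKS[rank_idx]


def geometric_moves() -> list[str]:
    """All from/to pairs along a rank, file, diagonal, or knight jump."""
    moves = []
    for f1, r1, f2, r2 in product(range(8), repeat=4):
        df, dr = abs(f2 - f1), abs(r2 - r1)
        if df == 0 and dr == 0:
            continue
        queen_like = df == 0 or dr == 0 or df == dr
        knight_like = {df, dr} == {1, 2}
        if queen_like or knight_like:
            moves.append(square(f1, r1) + square(f2, r2))
    return moves


def promotion_moves() -> list[str]:
    """Pawn moves onto the last rank (push or capture), one per promotion piece."""
    moves = []
    for from_rank, to_rank in ((6, 7), (1, 0)):  # white, then black
        for f1 in range(8):
            for f2 in (f1 - 1, f1, f1 + 1):
                if 0 <= f2 < 8:
                    for piece in "qrbn":
                        moves.append(square(f1, from_rank) + square(f2, to_rank) + piece)
    return moves


def build_vocab() -> list[str]:
    return SPECIAL_TOKENS + geometric_moves() + promotion_moves()


print(len(geometric_moves()), len(promotion_moves()), len(build_vocab()))
```

Castling needs no extra tokens under this scheme, since UCI writes it as a two-square king move along the back rank (for example e1g1).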

Inputs are conditioned on the game outcome: each game is prefixed with <WHITE> or <BLACK> for the winning side (or <DRAW> for a drawn game), followed by the sequence of UCI moves.
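
A minimal sketch of how such a conditioned sequence could be assembled from a game record. The helper name and the mapping from PGN-style result strings are hypothetical and not taken from the source repo. The sketch also stops at the last move, because the card does not say whether <EOS> is appended.

```python
# Hypothetical helper: maps a PGN-style result to the conditioning token
# and prepends it to the UCI move list. Not part of halluci_mate.
OUTCOME_TOKENS = {
    "1-0": "<WHITE>",
    "0-1": "<BLACK>",
    "1/2-1/2": "<DRAW>",
}


def build_sequence(result: str, uci_moves: list[str]) -> list[str]:
    """Return [condition token, move 1, move 2, ...] for one game."""
    if result not in OUTCOME_TOKENS:
        raise ValueError(f"unsupported result: {result!r}")
    return [OUTCOME_TOKENS[result], *uci_moves]


sequence = build_sequence("1-0", ["e2e4", "e7e5", "g1f3"])
print(sequence)
```

At inference time the prefix acts as a steering signal: starting a game with <WHITE> asks the model for continuations typical of games White went on to win.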

Usage

import chess

from halluci_mate.game.game import Game
from halluci_mate.inference import ChessInferenceEngine

engine = ChessInferenceEngine.from_checkpoint(
    "jspaulsen/halluci-mate-v2b",
    constrained=True,   # mask logits to legal moves
    temperature=0.0,    # greedy
)

game = Game(board=chess.Board(), condition="<WHITE>")
move = engine.predict(game)
print(move.uci())

Constrained decoding masks the logits to the set of legal UCI moves in the current position, which eliminates illegal-move hallucinations at the cost of potentially hiding model weaknesses. Unconstrained sampling (constrained=False) will occasionally produce illegal tokens — this is expected for an alpha.
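
The masking step can be illustrated in a few lines of plain Python. This is a sketch of the general technique, not the ChessInferenceEngine implementation. The function name, the toy vocabulary, and the logit values are made up for illustration, and the legal-move list would normally come from a chess library.

```python
import math


def constrained_greedy(logits: list[float], vocab: list[str], legal_moves: list[str]) -> str:
    """Mask every token that is not a legal move, then take the argmax."""
    legal = set(legal_moves)
    masked = [
        score if token in legal else -math.inf
        for token, score in zip(vocab, logits)
    ]
    best = max(range(len(vocab)), key=masked.__getitem__)
    if masked[best] == -math.inf:
        raise ValueError("no legal move is present in the vocabulary")
    return vocab[best]


# Toy example: the raw argmax is an illegal move, the masked argmax is legal.
vocab = ["<EOS>", "e2e4", "e2e5", "g1f3"]
logits = [0.1, 1.2, 3.0, 0.7]
legal_moves = ["e2e4", "g1f3"]

unconstrained = vocab[max(range(len(vocab)), key=logits.__getitem__)]
constrained = constrained_greedy(logits, vocab, legal_moves)
print(unconstrained, constrained)
```

The toy case also shows the trade-off described above: the masked output looks clean even though the model's top raw choice was illegal, so legal-rate should be measured with masking turned off.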

Training

Initialized from jspaulsen/halluci-mate-v2a weights
2 epochs, 4,056 optimizer steps, effective batch size 512 (per-device 128 × 2 grad-accum × 2 GPUs)
Optimizer: paged AdamW 8-bit, peak LR 3e-5, cosine-with-min-lr schedule, warmup ratio 0.005
bf16 + flash_attention_2, DDP across 2 GPUs, seed 4042
Training script: scripts/train.py in the source repo
Best eval loss 1.637 at step 4,000 (epoch 1.97)
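
The batch-size and epoch figures above follow from simple arithmetic, sketched here as a sanity check. The sequence total assumes every optimizer step used a full batch, which the card does not state, so treat it as an upper bound.

```python
# Values taken from the training summary above.
PER_DEVICE_BATCH = 128
GRAD_ACCUM = 2
NUM_GPUS = 2
TOTAL_STEPS = 4056
EPOCHS = 2

effective_batch = PER_DEVICE_BATCH * GRAD_ACCUM * NUM_GPUS
steps_per_epoch = TOTAL_STEPS / EPOCHS


def epoch_at(step: int) -> float:
    """Fractional epoch reached after a given optimizer step."""
    return step / steps_per_epoch


# Assumption: all steps use a full effective batch (upper bound).
sequences_seen = TOTAL_STEPS * effective_batch

print(effective_batch)             # 512
print(round(epoch_at(4000), 2))    # 1.97
print(f"{sequences_seen:,}")       # 2,076,672
```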

Headline evals vs v2a and v1b

All three models use the same eval configs, except that v1b played 500 games against Stockfish while v2a and v2b played 100: vs-stockfish at skill-5 with --sf-analyze, legal-rate over 5,000 sampled positions (seed 0), high-elo perplexity over 10,768 sequences.

Metric	v1b	v2a	v2b
vs-stockfish score-rate, skill-5	0.104 (500g)	0.065 (100g)	0.135 (100g)
vs-stockfish W / L / D	7 / 403 / 90	0 / 87 / 13	3 / 76 / 21
Legal-rate (5,000 sampled positions)	99.06%	99.00%	99.02%
High-elo perplexity (10,768 seqs)	4.92	5.47	5.15
Tactical-oversight, middle phase	21.0%	23.4%	20.3%
Tactical-oversight, endgame	12.4%	12.2%	11.9%
Blunder-rate (in-game)	6.5%	6.5%	6.5%

v2b wins games against Stockfish skill-5 in a 100-game match (3 wins and 21 draws), where v2a won none; v1b's 7 wins came over 500 games. Its tactical-oversight rates are the lowest of the three models in both the middlegame (20.3%) and the endgame (11.9%). Perplexity on the high-elo test set (5.15) recovers about 58% of the gap that v2a (5.47) opened up against v1b (4.92).
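
The score-rate row in the table is consistent with the usual chess scoring convention of one point per win and half a point per draw, divided by games played. The sketch below assumes that convention, which the card does not spell out, and checks it against the W / L / D row.

```python
def score_rate(wins: int, losses: int, draws: int) -> float:
    """Points per game: 1 for a win, 0.5 for a draw, 0 for a loss."""
    games = wins + losses + draws
    return (wins + 0.5 * draws) / games


# W / L / D values copied from the table above.
results = {
    "v1b": (7, 403, 90),
    "v2a": (0, 87, 13),
    "v2b": (3, 76, 21),
}

for name, (w, l, d) in results.items():
    print(name, w + l + d, round(score_rate(w, l, d), 3))
```

Because draws count for half a point, v2b's improvement over v2a comes from both its 3 wins and its 8 additional draws.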

Limitations

Alpha quality; playing strength has only been measured against Stockfish skill-5 and has not been calibrated against an engine with a known rating
Constrained decoding is recommended for any real use — the raw model may emit illegal move tokens
Trained on human games, so idiosyncrasies and blunders at lower ratings are reflected in behavior
No support for analyzing positions from arbitrary FENs beyond what Game constructs

License

MIT. See the source repo for details.
