OMLCheT-v1
A fine-tuned distilgpt2 that generates chess moves in Standard Algebraic Notation (SAN), trained on ~86k real games from the Open Machine Learning Chess Tournament dataset. Legality is not guaranteed; see Known Limitations / Failure Modes.
Model Overview
| Field | Detail |
|---|---|
| Base model | distilbert/distilgpt2 |
| Architecture | Decoder-only Transformer (GPT-2 family) |
| Task | Causal language modelling over SAN move sequences |
| Intended playstyle | Generalist: reproduces the patterns seen in the training corpus (AI-vs-AI games of varying quality); no explicit tactical or positional bias was enforced |
| Input/Output | Plain SAN string (e.g. e4 e5 Nf3) → continuation (e.g. Nc6 Bc4 …) |
The model treats a chess game as a text sequence: moves are space-separated tokens and the model is trained to predict the next token at each step. During inference, sampling from the model is equivalent to picking the next move.
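To make the "game as text" framing concrete, here is a minimal sketch of how a prompt could be built from the moves played so far and how the next move could be read off a generated string. The helper names `build_prompt` and `next_move` are hypothetical and not part of this repository; the sketch also assumes the generated text starts with the prompt, as the Transformers text-generation pipeline returns it by default.

```python
PREFIX = "<|chess|>"  # domain token described in this card


def build_prompt(moves_so_far):
    """Join the moves played so far into the space-separated text the model expects."""
    return " ".join([PREFIX, *moves_so_far])


def next_move(prompt, generated_text):
    """Return the first whitespace-separated move after the prompt, or None if nothing was added."""
    if generated_text.startswith(prompt):
        continuation = generated_text[len(prompt):]
    else:
        # Fallback: treat the whole string as the continuation.
        continuation = generated_text
    tokens = continuation.split()
    return tokens[0] if tokens else None


prompt = build_prompt(["e4", "e5", "Nf3"])
print(prompt)
print(next_move(prompt, prompt + " Nc6 Bc4"))
```

Only the first move of the continuation is taken here, so the caller can validate it, append it to the game, and query the model again.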
Architecture Details
All figures are for the base distilgpt2 skeleton; the fine-tuning adds only one new embedding vector (<|chess|>).
| Attribute | Value |
|---|---|
| Total parameters | ~82.7 M |
| Transformer blocks | 6 |
| Embedding dimension | 768 |
| Attention heads | 12 |
| Feed-forward dimension | 3 072 |
| Context window | 1 024 tokens |
| Vocabulary size | 50 258 (50 257 GPT-2 BPE + 1 domain token <|chess|>) |
| Positional encoding | Learned absolute |
| Activation | GELU |
Training Data
| Field | Detail |
|---|---|
| Dataset | OMLCheT/chess-san-base |
| Subset used | clean |
| Volume | ~86 600 games (train: 81 860 / test: 4 740) |
| Source | Open Machine Learning Chess Tournament (OMLCheT) — AI vs AI games played under tournament conditions |
| Format | Raw SAN strings, one game per row, e.g. e4 e5 Nf3 Nc6 Bc4 … |
| Pre-processing | Each game is wrapped as <|chess|> {moves} <|endoftext|> and short games are packed together into 256-token chunks |
Training Progress
| Training Loss | Validation Loss | Entropy | Num Tokens | Mean Token Accuracy |
|---|---|---|---|---|
| 1.1141 | 1.0671 | 1.0349 | 51,434,331 | 0.6441 |
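As a reading aid for the losses above: if they are mean per-token cross-entropy values in nats (an assumption; this is how such losses are commonly reported), they can be converted to perplexity with a plain exponential. The sketch below does only that arithmetic, and the validation loss comes out at a perplexity of roughly 2.9.

```python
import math


def perplexity(mean_cross_entropy_nats):
    """Convert a mean per-token cross-entropy (in nats) into perplexity."""
    return math.exp(mean_cross_entropy_nats)


train_loss = 1.1141  # from the table above
val_loss = 1.0671    # from the table above

print(f"train perplexity: {perplexity(train_loss):.2f}")
print(f"validation perplexity: {perplexity(val_loss):.2f}")
```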
What the corpus is and isn't:
The games come from ML-agent matches, not human grandmasters or large Lichess databases. This means the model has learned patterns produced by other (possibly imperfect) chess agents, not a broad human-style distribution. Move quality varies widely across the corpus.
Training Methodology
Supervised next-token prediction (standard causal language modelling). No reinforcement learning or RLHF was used.
Hyperparameters
| Hyperparameter | Value |
|---|---|
| Framework | HuggingFace transformers + trl (SFTTrainer) |
| Epochs | 3 |
| Per-device batch size | 16 |
| Gradient accumulation steps | 2 (effective batch = 32) |
| Learning rate | 5 × 10⁻⁴ |
| LR schedule | Cosine decay with 5% warmup |
| Weight decay | 0.01 |
| Optimiser | AdamW (default transformers implementation) |
| Max sequence length | 256 tokens |
| Packing | Enabled (packing=True) — short games concatenated into full-length chunks |
| Precision | bf16 on Ampere+ GPUs, fp16 on older CUDA, fp32 on CPU |
| Seed | 42 |
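The "cosine decay with 5% warmup" schedule can be sketched in a few lines. This is an illustrative re-implementation of the common definition (linear warmup from zero, then half-cosine decay to zero), not the exact code path used in training; `total_steps=1000` is a made-up number for the demonstration.

```python
import math


def lr_at(step, total_steps, peak_lr=5e-4, warmup_ratio=0.05):
    """Learning rate at a given optimiser step: linear warmup, then cosine decay to zero."""
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Linear ramp from 0 up to the peak learning rate.
        return peak_lr * step / max(1, warmup_steps)
    # Fraction of the post-warmup phase that has elapsed, in [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


total_steps = 1000  # hypothetical; the real step count depends on the packed dataset size
for step in (0, 25, 50, 525, 1000):
    print(step, lr_at(step, total_steps))
```

With these settings the rate peaks at 5 × 10⁻⁴ after 50 steps and decays to zero by the final step.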
Training process
- Load distilgpt2 weights from the Hugging Face Hub.
- Add the <|chess|> domain prefix token and resize token embeddings.
- Format each game as <|chess|> {san_moves} <EOS>, where <EOS> is the <|endoftext|> token.
- Pack multiple short games per 256-token chunk to maximise GPU utilisation.
- Train with cross-entropy loss over all tokens (moves and the prefix).
- Select the checkpoint with the lowest eval_loss.
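The formatting and packing steps can be illustrated with a dependency-free sketch. Two things here are assumptions: a whitespace split stands in for the real GPT-2 BPE tokenizer, and the trailing partial chunk is simply dropped, which may differ from what the trl packing implementation does. The names `format_game` and `pack` are hypothetical.

```python
PREFIX = "<|chess|>"
EOS = "<|endoftext|>"


def format_game(san_moves):
    """Wrap one game's SAN string with the domain prefix and the end-of-text marker."""
    return f"{PREFIX} {san_moves.strip()} {EOS}"


def pack(token_streams, chunk_len=256):
    """Concatenate per-game token lists and cut them into full chunks of chunk_len tokens."""
    flat = [tok for stream in token_streams for tok in stream]
    n_full = len(flat) // chunk_len  # the incomplete tail is dropped in this sketch
    return [flat[i * chunk_len:(i + 1) * chunk_len] for i in range(n_full)]


games = ["e4 e5 Nf3 Nc6", "d4 d5 c4"]
texts = [format_game(g) for g in games]
streams = [t.split() for t in texts]  # stand-in for real tokenisation
chunks = pack(streams, chunk_len=4)   # tiny chunk size so the effect is visible
for chunk in chunks:
    print(chunk)
```

Note how the second chunk spans a game boundary: the end-of-text marker of one game is followed directly by the prefix of the next, which is what packing means in practice.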
Known Limitations / Failure Modes
| Failure mode | Severity | Notes |
|---|---|---|
| Illegal moves | Medium | The model has no explicit legality checker; it occasionally emits moves that are syntactically valid SAN but illegal given the current board position (e.g. moving a pinned piece) |
| Endgame blunders | High | The training corpus is dominated by middlegame positions. The model has seen relatively few endgame sequences and tends to play aimlessly once queens are traded |
| Pawn promotions | Medium–High | Promotion notation (e8=Q, a1=N, etc.) appears infrequently; underpromotions are rarely generated |
| Long games | Medium | Training sequences were capped at 256 tokens (the architecture itself supports 1 024), so context beyond roughly 60–70 full moves was never seen in training; the model loses positional coherence in very long endgames |
| Repetition | Low–Medium | Without a repetition detector the model can occasionally cycle through the same few moves |
| Opening diversity | Low | The model shows reasonable opening variety for common openings (Italian, Ruy López, Sicilian), but handles rare lines poorly |
| Engine-level play | N/A | This is a language model, not a search-based engine; it does not calculate variations or evaluate positions. Expect amateur-to-club strength at best |
Tip for downstream users: always wrap inference in a legality filter (e.g. python-chess) and re-sample on illegal output.
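One way to structure such a filter is a small re-sampling loop with pluggable callbacks, sketched below. The names `resample_until_legal` and `python_chess_checker` are hypothetical. The second helper assumes the python-chess package (`pip install chess`) and relies on `Board.parse_san` raising a `ValueError` (or a subclass) for invalid, illegal, or ambiguous SAN; it only needs a board object passed in, so the block itself runs without that package installed.

```python
def resample_until_legal(propose, is_legal, max_tries=8):
    """Call propose() until is_legal(move) is true; return None after max_tries failures."""
    for _ in range(max_tries):
        move = propose()
        if is_legal(move):
            return move
    return None


def python_chess_checker(board):
    """Build an is_legal callback from a python-chess Board (assumed to be a chess.Board)."""
    def is_legal(san):
        try:
            board.parse_san(san)  # raises ValueError for illegal or malformed SAN
            return True
        except ValueError:
            return False
    return is_legal


# Stubbed demonstration: proposals and legality come from fixed data, not from the model.
proposals = iter(["Ke3", "Qh5", "d6"])
legal_moves = {"d6", "Nf6"}
print(resample_until_legal(lambda: next(proposals), lambda m: m in legal_moves))
```

In real use, `propose` would call the model with sampling enabled and return the first generated move, and `is_legal` would come from `python_chess_checker(board)` for the current position. Returning `None` lets the caller decide on a fallback, such as picking a random legal move.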
Inference Speed
Benchmarked on a single NVIDIA T4 (Colab free tier) with the full fine-tuned checkpoint loaded in fp16:
| Metric | Value |
|---|---|
| Time per move (greedy) | ~15–25 ms |
| Time per move (sampling, top-p=0.9) | ~20–35 ms |
| Moves per second | ~30–60 |
| Full 40-move game generation | ~0.8–1.5 s |
| Memory footprint (fp16) | ~330 MB VRAM |
| Memory footprint (fp32 / CPU) | ~660 MB RAM |
On a Kaggle P100 expect roughly 2× faster; on CPU expect ~200–500 ms per move.
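For readers who want to reproduce numbers like these on their own hardware, a generic timing helper might look as follows. This is not the script used for the table above; `ms_per_call` and `moves_per_second` are hypothetical names, and on a GPU the timed function should block until the result is ready (for example by returning decoded text), otherwise asynchronous execution can make the timings look too short.

```python
import time


def ms_per_call(fn, n_calls=20, warmup=3):
    """Average wall-clock milliseconds per call of fn, after a few untimed warmup calls."""
    for _ in range(warmup):
        fn()
    start = time.perf_counter()
    for _ in range(n_calls):
        fn()
    elapsed = time.perf_counter() - start
    return 1000.0 * elapsed / n_calls


def moves_per_second(ms_per_move):
    """Convert a per-move latency in milliseconds into throughput."""
    return 1000.0 / ms_per_move


# Stand-in workload; replace the lambda with a single-move generation call.
print(ms_per_call(lambda: sum(range(1000))))
print(moves_per_second(25.0))
```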
```python
from transformers import pipeline
import torch

pipe = pipeline(
    "text-generation",
    model="OMLCheT/OMLCheT-v1",
    torch_dtype=torch.float16,
    device=0,  # GPU; use -1 for CPU
)

# Provide moves played so far; model continues from here
prompt = "<|chess|> e4 e5 Nf3 Nc6 Bc4"
result = pipe(
    prompt,
    max_new_tokens=80,
    do_sample=True,
    temperature=0.8,
    top_p=0.9,
    pad_token_id=pipe.tokenizer.eos_token_id,
)
print(result[0]["generated_text"])
```
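The raw generated string can be turned into numbered movetext for display. The helper below is a hypothetical post-processing sketch: it assumes the markers `<|chess|>` and `<|endoftext|>` may or may not survive decoding, strips them if present, and keeps only the first game. It does not check legality.

```python
def to_movetext(generated_text):
    """Strip special markers and number the plies as '1. e4 e5 2. Nf3 ...'."""
    # Keep only the first game if an end-of-text marker is present.
    text = generated_text.split("<|endoftext|>")[0]
    text = text.replace("<|chess|>", " ")
    plies = text.split()
    parts = []
    for i in range(0, len(plies), 2):
        # Each full move is White's ply followed by Black's ply (if any).
        parts.append(f"{i // 2 + 1}. " + " ".join(plies[i:i + 2]))
    return " ".join(parts)


print(to_movetext("<|chess|> e4 e5 Nf3 Nc6 Bc4"))
# prints: 1. e4 e5 2. Nf3 Nc6 3. Bc4
```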
License
MIT License
The model weights are released under the MIT License.
- The base model (distilbert/distilgpt2) is released under the Apache 2.0 license.
- The training dataset (OMLCheT/chess-san-base) is released by us; check the dataset card for its specific terms.
- Chess move notation (SAN) is in the public domain.
You are free to use, modify, distribute, and build on top of this model for any purpose, commercial or non-commercial, with attribution.