gemma-4-E2B-it solitaire-advisor LoRA (close-out arm)
A LoRA adapter that distils a 31B Gemma Klondike Solitaire advisor into the ~2B-effective Gemma 4 E2B text model (4-bit, MLX), runnable locally on a 16 GB Apple Silicon Mac.
This is the close-out variant of
chayuto/gemma-4-e2b-it-solitaire-advisor-lora
(the "volume" lead student). It keeps everything about that recipe and adds one
change to the training corpus: a gentle, train-only oversample of winning-game
close-out decisions (rows from won games where the board is near-complete,
faceDown <= 2). That single change targets a specific failure of the volume
student, a false resign on a winnable endgame, and on the in-distribution eval
it both removes that failure and raises the win count.
What is and is not established (read this first)
- In-distribution: improves on the volume lead, reproducibly. On the 13 held-out winnable decks at a 300-turn cap, this adapter wins 8/13 (mean foundation cards 40.8) versus the volume student's 5/13 (meanFC ~30) in the same window. The 8 wins reproduced exactly across two independent eval runs (same 8 decks, meanFC 40.8 both times), so the +3 is not run-to-run noise. It also resigns 0 times where volume resigns 2 (one of which was a false resign on a winnable deck that this adapter instead wins).
- Generalization: NOT YET TESTED. The volume lead earned its "lead" status partly on a 12-fresh-deck generalization pass (+12.9 paired foundation delta, 9/12 better than base). This close-out adapter has not been run on that fresh-deck set yet. Its claim is therefore narrower: an in-distribution improvement over the volume student, not a demonstrated generalization gain. Until that pass exists, prefer the volume lead if you need the generalization-backed model; use this one for the close-out behaviour and the higher in-distribution win rate.
The teacher is gemma-4-31b-it (Google's Gemma 4 31B, accessed through a
separate harvester app). The teacher itself wins roughly 31% of games, so this
adapter's ceiling is teacher-level imitation, not optimal play.
Model details
| Field | Value |
|---|---|
| Base model | mlx-community/Gemma4-E2B-IT-Text-int4 (Gemma 4 E2B IT, text-only, int4) |
| Adapter type | LoRA over a 4-bit-quantised base |
| LoRA rank | 16 (scale 2.0, dropout 0.05) |
| LoRA target modules | self_attn.{q,k,v,o}_proj, mlp.{gate,up,down}_proj |
| LoRA layers | top 16 layers |
| Training framework | mlx-lm |
| Hardware | Apple Silicon, 16 GB unified memory (Metal GPU) |
| Adapter size on disk | ~51 MB (bf16 LoRA weights over the 4-bit base) |
| Iterations trained | 1,000 (shipped weights = iter 1,000, the evaluated checkpoint) |
| Decoding used for eval | greedy (temperature 0.3 is used only for JSON parse-rescue retries) |
Hyperparameters are byte-for-byte identical to the volume lead and the project's other Gemma 4 E2B arms, so the only difference from volume is the corpus reweighting described under Training data.
Intended use
In scope. Acting as a move-selection advisor inside a Klondike Solitaire client that already enforces game rules:
- Imperfect-information draw-1 Klondike; the advisor is shown the full visible state plus the count of face-down cards.
- Per-turn decisions: given the harvester prompt (with a LEGAL MOVES block), emit a single JSON object choosing one offered legal move by index, or move_index: -1 to resign.
- Local inference on Apple Silicon via mlx-lm.
Out of scope. Open-ended chat; game-rule enforcement (it selects from the
supplied legalMoves, it does not verify legality); optimal play (teacher
ceiling ~31%); other Solitaire variants; non-MLX backends.
Evaluation
Method
Each game is played turn-by-turn under the faithful production prompt
(hybrid-v1.6) on a fixed deck, greedy decoding, cap 300 turns, with a
tiered JSON parse-rescue (temp-0.3 retry) matching production. Every finished or
resigned game is then exact-adjudicated: the recorded decisions are replayed
through the engine with zero drift, and the final or resign position is handed
to a sound best-first solver (SOLVED / UNSOLVABLE / UNKNOWN at a node cap). This
distinguishes a real win from a cap-truncated one, and a correct resign on a
dead board from a quit on a winnable one.
The eval set is 13 winnable decks held out by seed from the harvester pool
the training data was drawn from. meanFC is the mean cards on foundations
(out of 52) at game end; 52 is a win.
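The adjudication step above can be read as a small decision table. The following is a minimal sketch of that table, not the project's adjudicator: the function name, the outcome labels, and the assumption that a non-resigned, non-won game ended at the turn cap are all illustrative. Only the three solver verdicts (SOLVED / UNSOLVABLE / UNKNOWN) and the 52-card win condition come from this card.

```python
from enum import Enum


class Verdict(Enum):
    """Solver verdicts named in this card."""
    SOLVED = "SOLVED"
    UNSOLVABLE = "UNSOLVABLE"
    UNKNOWN = "UNKNOWN"  # solver hit its node cap


def classify_game(foundation_cards: int, resigned: bool, verdict: Verdict) -> str:
    """Label one finished game (hypothetical labels, for illustration only)."""
    if foundation_cards == 52:
        return "win"
    if verdict is Verdict.UNKNOWN:
        return "undetermined"
    if resigned:
        # A resign is only "correct" if the final position really was dead.
        return "false_resign" if verdict is Verdict.SOLVED else "correct_resign"
    # Assumption: a game that neither won nor resigned stopped at the turn cap.
    return "cap_truncated_winnable" if verdict is Verdict.SOLVED else "dead_board"


print(classify_game(45, False, Verdict.SOLVED))      # cap_truncated_winnable
print(classify_game(30, True, Verdict.SOLVED))       # false_resign
print(classify_game(18, False, Verdict.UNSOLVABLE))  # dead_board
```

The point of the split is that a raw "did not win" count hides two very different failures: quitting or timing out on a winnable board versus stopping on a board no line could win.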
In-distribution (13 held-out winnable decks, cap 300)
| arm | corpus | meanFC | wins | resigns |
|---|---|---|---|---|
| volume (lead) | 6,859 rows, 36% won | ~30 | 5 | 2 |
| close-out (this adapter) | volume + gentle close-out oversample | 40.8 | 8 | 0 |
The 8 wins reproduced across two independent runs (the volume baseline scored 5
in the same window). The win that most cleanly shows the effect is
#4221577640: the volume student false-resigns it (the board is winnable); this
adapter wins it. Of the 5 non-wins here, adjudication shows reach is intact, not
a mid-game collapse:
- 4 are structurally dead (UNSOLVABLE): #1388178981 (fc22), #3263196305 (fc18), #4161700176 (fc12), #4250754298 (fc18). No legal line wins them.
- 1 is a winnable near-win: #3841057237 reached fc45 fd0 and the solver marks it SOLVED; it was cap-truncated on the last few foundation moves, not stalled.
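As a quick consistency check, the reported meanFC can be recomputed from the per-deck numbers listed above: 8 wins at 52 foundation cards each, plus the five non-win foundation counts. The variable names below are illustrative; the figures are the ones given in this card.

```python
WIN_FC = 52  # a won game has all 52 cards on the foundations
wins = 8

# Final foundation counts of the five non-wins, keyed by deck seed.
non_win_fc = {
    "1388178981": 22,
    "3263196305": 18,
    "4161700176": 12,
    "4250754298": 18,
    "3841057237": 45,
}

final_fc = [WIN_FC] * wins + list(non_win_fc.values())
mean_fc = sum(final_fc) / len(final_fc)

print(len(final_fc), round(mean_fc, 1))  # 13 40.8
```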
Generalization
Not yet run. The fresh-deck pass (12 solver-winnable decks with zero corpus overlap) that the volume lead reports has not been executed for this adapter. A generalization number will be added here when it exists. Treat the in-distribution result above as the only verified claim.
Training data
Source. Per-decision play logs from a Klondike client where the 31B
gemma-4-31b-it teacher chose moves, published as the
chayuto/klondike-llm-decisions
dataset (CC-BY-4.0).
Base selection (the "volume" pool). The entire non-eval success pool: 6,859 decisions across 77 games (36% won), with the 13 eval seeds held out. Split at the game level into 5,663 train / 531 validation / 665 test.
Close-out reweighting (this adapter). Train-only: each training row that comes from a won game and shows a near-complete board (faceDown <= 2) gets **one extra copy** (781 such rows, ~14% of train), giving 6,444 train rows. Validation and test are byte-identical to the volume split, so the only difference between this adapter and the volume lead is the train-side reweighting. The move mix stays draw-healthy (draw fraction 44.7% -> 41.4%, foundation 21.9% -> 27.7%); this is a deliberately gentle 1x oversample. An earlier, more aggressive 2x close-out oversample on a loop-compressed base was rejected for regressing mid-game reach; the gentle 1x on the intact volume pool does not show that regression (reach intact, see Evaluation).
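The reweighting rule can be sketched in a few lines. This is an illustration under assumptions, not the project's preprocessing script: the row schema (a boolean `won` flag and an integer `faceDown` count per decision row) and the function name are hypothetical, although the faceDown <= 2 threshold and the single extra copy come from the description above.

```python
from typing import Dict, List


def closeout_oversample(
    rows: List[Dict], max_face_down: int = 2, extra_copies: int = 1
) -> List[Dict]:
    """Duplicate close-out rows from won games; keep every other row once."""
    out: List[Dict] = []
    for row in rows:
        out.append(row)
        if row["won"] and row["faceDown"] <= max_face_down:
            out.extend([row] * extra_copies)
    return out


toy_train = [
    {"won": True, "faceDown": 0},   # close-out row from a won game: duplicated
    {"won": True, "faceDown": 7},   # won game, but mid-game: kept once
    {"won": False, "faceDown": 1},  # near-complete board, lost game: kept once
]
reweighted = closeout_oversample(toy_train)
print(len(reweighted))  # 4

# Row counts reported in this card: 5,663 train rows + 781 extra copies.
assert 5663 + 781 == 6444
```

Applying the function to the train split only, and leaving validation and test untouched, is what keeps the comparison against the volume arm limited to a single variable.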
Training procedure
model: mlx-community/Gemma4-E2B-IT-Text-int4
max_seq_length: 2048
batch_size: 1
num_layers: 16
grad_checkpoint: true
learning_rate: 2.0e-4
iters: 1000
save_every: 250
val_batches: 25
lora_parameters:
  rank: 16
  scale: 2.0
  dropout: 0.05
  keys:
    - self_attn.q_proj
    - self_attn.k_proj
    - self_attn.v_proj
    - self_attn.o_proj
    - mlp.gate_proj
    - mlp.up_proj
    - mlp.down_proj
Usage
# Apple Silicon, Python 3.12 venv recommended
python3.12 -m venv venv && source venv/bin/activate
pip install mlx mlx-lm huggingface-hub
The Gemma4-E2B-IT-Text-int4 base needs a small sanitize() patch to load on
current mlx-lm. The 6-line patch is gemma4_finetune/gemma4_text_patch.py in
the source repo
(github.com/chayuto/solitaire-analytics);
apply it (or use an mlx-lm version that merged the support) before loading.
from huggingface_hub import snapshot_download
from mlx_lm import load, generate
# import gemma4_text_patch # apply the base-loading patch first (see above)
adapter_path = snapshot_download(
    repo_id="chayuto/gemma-4-e2b-it-solitaire-advisor-volcloseout-lora",
    allow_patterns=["adapters.safetensors", "adapter_config.json"],
)
model, tokenizer = load(
    "mlx-community/Gemma4-E2B-IT-Text-int4",
    adapter_path=adapter_path,
)
solitaire_prompt = open("your_solitaire_prompt.txt").read()  # the v1.6 harvester prompt
wrapped = tokenizer.apply_chat_template(
    [{"role": "user", "content": solitaire_prompt}],
    tokenize=False, add_generation_prompt=True,
)
print(generate(model, tokenizer, prompt=wrapped, max_tokens=512))
The model emits a JSON object whose final_decision.move_index is a 0-based
index into the prompt's legalMoves array (or -1 to resign). Trust
move_index, not the prose.
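A client will usually want to validate the returned index before acting on it. The helper below is a minimal sketch under assumptions: the function name is hypothetical, and the only part of the output schema it relies on is the final_decision.move_index field described above. It extracts the outermost JSON object from the raw text and rejects anything that is not -1 or a valid index into the offered moves.

```python
import json
from typing import Optional


def extract_move_index(raw_output: str, num_legal_moves: int) -> Optional[int]:
    """Return a validated move_index, or None if the output cannot be trusted."""
    start, end = raw_output.find("{"), raw_output.rfind("}")
    if start == -1 or end <= start:
        return None
    try:
        obj = json.loads(raw_output[start:end + 1])
    except json.JSONDecodeError:
        return None
    decision = obj.get("final_decision") if isinstance(obj, dict) else None
    idx = decision.get("move_index") if isinstance(decision, dict) else None
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(idx, bool) or not isinstance(idx, int):
        return None
    if idx == -1 or 0 <= idx < num_legal_moves:
        return idx
    return None


print(extract_move_index('Sure: {"final_decision": {"move_index": 2}}', 5))  # 2
print(extract_move_index('{"final_decision": {"move_index": 9}}', 5))        # None
```

Returning None rather than raising leaves the fallback policy (retry, resign, or pick a default move) to the calling client.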
Limitations
- Generalization unverified. The headline 8/13 is in-distribution only; the fresh-deck pass that backs the volume lead has not been run for this adapter. This is the single most important caveat.
- Small n. 13 decks, high per-deck variance. Read it as +3 wins reproduced across two runs, not as a precise win rate.
- Cap difference vs the volume card. The volume model card reports a cap-200 eval; the numbers here are cap-300. The volume baseline was re-measured at cap-300 in the same window (5 wins, meanFC ~30) for an apples-to-apples comparison, so the +3 is at matched cap.
- JSON discipline is imperfect. Eval used a tiered parse-rescue (temp-0.3 retry); raw greedy output is not 100% valid JSON. For production, use constrained decoding or a JSON grammar at inference.
- Teacher ceiling. Trained to imitate a teacher that wins ~31% of games; this is an advisor for imperfect-information draw-1 only.
- Greedy eval only; behaviour under temperature sampling is untested.
- Base-loading patch required; Apple-Silicon / MLX only; loads only onto mlx-community/Gemma4-E2B-IT-Text-int4.
- A successor is in training. A recipe-plus-more-data variant (this close-out reweighting applied to a larger, more-winning corpus) is being trained and may supersede this adapter; this repo records the confirmed in-distribution result as of release.
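For readers who want to approximate the parse-rescue mentioned under "JSON discipline is imperfect", here is a rough sketch. It assumes only what this card states (a greedy attempt, then a retry at temperature 0.3); the exact tiers used in production are not documented here. `generate_at` is a hypothetical callable that you would implement as a wrapper around your own mlx-lm generation call at a given temperature, and `fake_generate` merely stands in for the model.

```python
import json
from typing import Callable, Optional, Sequence


def parse_json_object(text: str) -> Optional[dict]:
    """Parse the outermost {...} span of text, or return None."""
    start, end = text.find("{"), text.rfind("}")
    if start == -1 or end <= start:
        return None
    try:
        obj = json.loads(text[start:end + 1])
    except json.JSONDecodeError:
        return None
    return obj if isinstance(obj, dict) else None


def generate_with_rescue(
    generate_at: Callable[[float], str],
    temps: Sequence[float] = (0.0, 0.3),  # greedy first, then the rescue tier
) -> Optional[dict]:
    """Try each temperature in order; return the first parseable JSON object."""
    for temp in temps:
        parsed = parse_json_object(generate_at(temp))
        if parsed is not None:
            return parsed
    return None


def fake_generate(temp: float) -> str:
    # Stand-in for the model: greedy output is malformed, the retry parses.
    return "not json" if temp == 0.0 else '{"final_decision": {"move_index": 1}}'


result = generate_with_rescue(fake_generate)
print(result)  # {'final_decision': {'move_index': 1}}
```

A retry loop like this only recovers from malformed output after the fact; constrained decoding, as recommended above, prevents it in the first place.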
License
The adapter is released under the Gemma Terms of Use (inherited from the
base). Use, redistribution, and modification require compliance with the
Gemma Prohibited Use Policy.
The training and evaluation code in the source repository is MIT. The
training data (chayuto/klondike-llm-decisions)
is CC-BY-4.0.
Citation
@misc{orapinpatipat2026solitaireadvisorgemma4closeout,
title = {Distilling a 31B Klondike Solitaire advisor into Gemma 4 E2B via MLX LoRA: close-out arm},
author = {Orapinpatipat, Chayut},
year = {2026},
month = jun,
howpublished = {\url{https://huggingface.co/chayuto/gemma-4-e2b-it-solitaire-advisor-volcloseout-lora}},
note = {LoRA adapter; gentle close-out oversample over the volume arm, iter-1000 checkpoint},
}
Acknowledgements
- Base model mlx-community/Gemma4-E2B-IT-Text-int4 from the mlx-community team.
- Training framework mlx-lm from Apple Machine Learning Research.
- Teacher model gemma-4-31b-it from Google DeepMind.
Project status
This is a variant of the project's lead Gemma 4 E2B student. It improves the
in-distribution win rate over the volume lead by fixing a false-resign on
winnable endgames, confirmed across two eval runs. It is published as a
distinct, clearly-scoped artifact; the volume adapter remains the
generalization-proven flagship until this adapter's fresh-deck pass is run.