gemma-4-E2B-it solitaire-advisor LoRA (close-out arm)

A LoRA adapter that distils a 31B Gemma Klondike Solitaire advisor into the ~2B-effective Gemma 4 E2B text model (4-bit, MLX), runnable locally on a 16 GB Apple Silicon Mac.

This is the close-out variant of chayuto/gemma-4-e2b-it-solitaire-advisor-lora (the "volume" lead student). It keeps everything about that recipe and adds one change to the training corpus: a gentle, train-only oversample of winning-game close-out decisions (rows from won games where the board is near-complete, faceDown <= 2). That single change targets a specific failure of the volume student, a false resign on a winnable endgame, and on the in-distribution eval it both removes that failure and raises the win count.

What is and is not established (read this first)

  • In-distribution: improves on the volume lead, reproducibly. On the 13 held-out winnable decks at a 300-turn cap, this adapter wins 8/13 (mean foundation cards 40.8) versus the volume student's 5/13 (meanFC ~30) in the same window. The 8 wins reproduced exactly across two independent eval runs (same 8 decks, meanFC 40.8 both times), so the +3 is not run-to-run noise. It also resigns 0 times where volume resigns 2 (one of which was a false resign on a winnable deck that this adapter instead wins).
  • Generalization: NOT YET TESTED. The volume lead earned its "lead" status partly on a 12-fresh-deck generalization pass (+12.9 paired foundation delta, 9/12 better than base). This close-out adapter has not been run on that fresh-deck set yet. Its claim is therefore narrower: an in-distribution improvement over the volume student, not a demonstrated generalization gain. Until that pass exists, prefer the volume lead if you need the generalization-backed model; use this one for the close-out behaviour and the higher in-distribution win rate.

The teacher is gemma-4-31b-it (Google's Gemma 4 31B, accessed through a separate harvester app). The teacher itself wins roughly 31% of games, so this adapter's ceiling is teacher-level imitation, not optimal play.

Model details

Base model: mlx-community/Gemma4-E2B-IT-Text-int4 (Gemma 4 E2B IT, text-only, int4)
Adapter type: LoRA over a 4-bit-quantised base
LoRA rank: 16 (scale 2.0, dropout 0.05)
LoRA target modules: self_attn.{q,k,v,o}_proj, mlp.{gate,up,down}_proj
LoRA layers: top 16 layers
Training framework: mlx-lm
Hardware: Apple Silicon, 16 GB unified memory (Metal GPU)
Adapter size on disk: ~51 MB (bf16 LoRA weights over the 4-bit base)
Iterations trained: 1,000 (shipped weights = iter 1,000, the evaluated checkpoint)
Decoding used for eval: greedy (temp-0.3 JSON parse-rescue only)

Hyperparameters are byte-for-byte identical to the volume lead and the project's other Gemma 4 E2B arms, so the only difference from volume is the corpus reweighting described under Training data.

Intended use

In scope. Acting as a move-selection advisor inside a Klondike Solitaire client that already enforces game rules:

  • Imperfect-information draw-1 Klondike; the advisor is shown the full visible state plus the count of face-down cards.
  • Per-turn decisions: given the harvester prompt (with a LEGAL MOVES block), emit a single JSON object choosing one offered legal move by index, or move_index: -1 to resign.
  • Local inference on Apple Silicon via mlx-lm.

Out of scope. Open-ended chat; game-rule enforcement (it selects from the supplied legalMoves, it does not verify legality); optimal play (teacher ceiling ~31%); other Solitaire variants; non-MLX backends.

Evaluation

Method

Each game is played turn-by-turn under the faithful production prompt (hybrid-v1.6) on a fixed deck, greedy decoding, cap 300 turns, with a tiered JSON parse-rescue (temp-0.3 retry) matching production. Every finished or resigned game is then exact-adjudicated: the recorded decisions are replayed through the engine with zero drift, and the final or resign position is handed to a sound best-first solver (SOLVED / UNSOLVABLE / UNKNOWN at a node cap). This distinguishes a real win from a cap-truncated one, and a correct resign on a dead board from a quit on a winnable one.
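The adjudication step above amounts to a small decision table over the end state and the solver verdict. The following is a minimal sketch of that table, not the project's adjudicator: the function name, argument names, and label strings are hypothetical, and only the three verdict values come from the text.

```python
from enum import Enum


class Verdict(Enum):
    SOLVED = "SOLVED"          # solver found a winning line from this position
    UNSOLVABLE = "UNSOLVABLE"  # solver proved no legal line wins
    UNKNOWN = "UNKNOWN"        # solver hit its node cap


def classify(foundation_cards: int, resigned: bool, verdict: Verdict) -> str:
    """Label one finished game from its end state and the solver verdict on it."""
    if foundation_cards == 52:
        return "win"
    if verdict is Verdict.UNKNOWN:
        return "undetermined"
    winnable = verdict is Verdict.SOLVED
    if resigned:
        # A resign is only correct when the board really was dead.
        return "false resign" if winnable else "correct resign"
    # Not a win and not a resign: the game ran out of turns or moves.
    return "cap-truncated (still winnable)" if winnable else "dead board"
```

Under this labelling, a game stopped at 45 foundation cards with a SOLVED verdict is a cap-truncated near-win, while a resign on a SOLVED position is a false resign.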

The eval set is 13 winnable decks held out by seed from the harvester pool the training data was drawn from. meanFC is the mean cards on foundations (out of 52) at game end; 52 is a win.

In-distribution (13 held-out winnable decks, cap 300)

arm | corpus | meanFC | wins | resigns
volume (lead) | 6,859 rows, 36% won | ~30 | 5 | 2
close-out (this adapter) | volume + gentle close-out oversample | 40.8 | 8 | 0

The 8 wins reproduced across two independent runs (the volume baseline scored 5 in the same window). The win that most cleanly shows the effect is #4221577640: the volume student false-resigns it (the board is winnable); this adapter wins it. For the 5 non-wins, adjudication indicates that mid-game reach is intact and that none of them is a mid-game collapse:

  • 4 are structurally dead (UNSOLVABLE): #1388178981 (fc22), #3263196305 (fc18), #4161700176 (fc12), #4250754298 (fc18). No legal line wins them.
  • 1 is a winnable near-win: #3841057237 reached fc45 fd0 and the solver marks it SOLVED; it was cap-truncated on the last few foundation moves, not stalled.
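As a consistency check, the per-deck outcomes listed above can be recombined into the headline figures. This sketch assumes every win ends with all 52 cards on the foundations (which is how a win is defined here) and uses only the foundation counts reported in this section.

```python
# Final foundation counts for the 13 held-out decks, as reported above:
# 8 wins (52 cards each), 4 UNSOLVABLE boards, and 1 cap-truncated near-win.
wins = [52] * 8
dead = [22, 18, 12, 18]
near_win = [45]

final_fc = wins + dead + near_win
mean_fc = sum(final_fc) / len(final_fc)
win_count = sum(1 for fc in final_fc if fc == 52)

print(f"decks={len(final_fc)} wins={win_count} meanFC={mean_fc:.1f}")
# prints: decks=13 wins=8 meanFC=40.8
```

The total is 531 foundation cards over 13 decks, which rounds to the reported meanFC of 40.8.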

Generalization

Not yet run. The fresh-deck pass (12 solver-winnable decks with zero corpus overlap) that the volume lead reports has not been executed for this adapter. A generalization number will be added here when it exists. Treat the in-distribution result above as the only verified claim.

Training data

Source. Per-decision play logs from a Klondike client where the 31B gemma-4-31b-it teacher chose moves, published as the chayuto/klondike-llm-decisions dataset (CC-BY-4.0).

Base selection (the "volume" pool). The entire non-eval success pool: 6,859 decisions across 77 games (36% won), with the 13 eval seeds held out. Split at the game level into 5,663 train / 531 validation / 665 test.
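A game-level split keeps every decision from one game in the same partition, so near-identical consecutive positions cannot leak from train into validation or test. The sketch below illustrates the idea only: the `game_id` field, the fractions, and the seed are hypothetical placeholders, and it will not reproduce the published 5,663 / 531 / 665 row counts.

```python
import random
from collections import defaultdict


def split_by_game(rows, val_frac=0.1, test_frac=0.1, seed=0):
    """Assign whole games to train/val/test so no game straddles two splits."""
    by_game = defaultdict(list)
    for row in rows:
        by_game[row["game_id"]].append(row)  # hypothetical field name

    game_ids = sorted(by_game)  # stable order before the seeded shuffle
    random.Random(seed).shuffle(game_ids)

    n_test = max(1, round(len(game_ids) * test_frac))
    n_val = max(1, round(len(game_ids) * val_frac))
    groups = {
        "test": game_ids[:n_test],
        "val": game_ids[n_test:n_test + n_val],
        "train": game_ids[n_test + n_val:],
    }
    return {name: [r for g in ids for r in by_game[g]] for name, ids in groups.items()}
```

Because games differ in length, splitting by game gives row fractions that only approximate the requested game fractions.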

Close-out reweighting (this adapter). Train-only: each training row that comes from a won game and shows a near-complete board (faceDown <= 2) gets one extra copy. There are 781 such rows (~14% of the 5,663 train rows), so the train set grows to 6,444 rows. Validation and test are byte-identical to the volume split, so the only difference between this adapter and the volume lead is the train-side reweighting. The move mix stays draw-healthy (draw fraction 44.7% -> 41.4%, foundation 21.9% -> 27.7%); this is a deliberately gentle 1x oversample, meaning one extra copy per qualifying row. An earlier, more aggressive 2x close-out oversample on a loop-compressed base was rejected for regressing mid-game reach; the gentle 1x on the intact volume pool does not show that regression (reach intact, see Evaluation).
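The reweighting rule can be sketched in a few lines. This is an illustration under stated assumptions, not the project's data-prep script: the row fields `won` and `face_down` are hypothetical names for the game outcome and the faceDown count.

```python
def closeout_oversample(train_rows, max_face_down=2):
    """Return the train rows plus one extra copy of each winning close-out row."""
    out = []
    for row in train_rows:
        out.append(row)
        # Hypothetical fields: "won" marks a row from a won game,
        # "face_down" is the number of face-down cards at that decision.
        if row["won"] and row["face_down"] <= max_face_down:
            out.append(dict(row))  # exactly one extra copy
    return out
```

With the counts reported above, 5,663 original rows plus 781 duplicated rows gives the 6,444-row train set. The function is applied to the train split only; validation and test rows are left untouched.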

Training procedure

model: mlx-community/Gemma4-E2B-IT-Text-int4
max_seq_length: 2048
batch_size: 1
num_layers: 16
grad_checkpoint: true
learning_rate: 2.0e-4
iters: 1000
save_every: 250
val_batches: 25

lora_parameters:
  rank: 16
  scale: 2.0
  dropout: 0.05
  keys:
    - self_attn.q_proj
    - self_attn.k_proj
    - self_attn.v_proj
    - self_attn.o_proj
    - mlp.gate_proj
    - mlp.up_proj
    - mlp.down_proj

Usage

# Apple Silicon, Python 3.12 venv recommended
python3.12 -m venv venv && source venv/bin/activate
pip install mlx mlx-lm huggingface-hub

The Gemma4-E2B-IT-Text-int4 base needs a small sanitize() patch to load on current mlx-lm. The 6-line patch is gemma4_finetune/gemma4_text_patch.py in the source repo (github.com/chayuto/solitaire-analytics); apply it (or use an mlx-lm version that merged the support) before loading.

from huggingface_hub import snapshot_download
from mlx_lm import load, generate
# import gemma4_text_patch  # apply the base-loading patch first (see above)

adapter_path = snapshot_download(
    repo_id="chayuto/gemma-4-e2b-it-solitaire-advisor-volcloseout-lora",
    allow_patterns=["adapters.safetensors", "adapter_config.json"],
)

model, tokenizer = load(
    "mlx-community/Gemma4-E2B-IT-Text-int4",
    adapter_path=adapter_path,
)

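# Load the v1.6 harvester prompt (including its LEGAL MOVES block) from disk.
with open("your_solitaire_prompt.txt", encoding="utf-8") as f:
    solitaire_prompt = f.read()

# Wrap the prompt as a single user turn using the model's chat template.
wrapped = tokenizer.apply_chat_template(
    [{"role": "user", "content": solitaire_prompt}],
    tokenize=False, add_generation_prompt=True,
)
# With no sampler passed, generate() decodes greedily, matching the eval setting.
print(generate(model, tokenizer, prompt=wrapped, max_tokens=512))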

The model emits a JSON object whose final_decision.move_index is a 0-based index into the prompt's legalMoves array (or -1 to resign). Trust move_index, not the prose.
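Since the adapter does not verify legality, a client would typically validate the index before acting on it. The following is a minimal sketch of such a check, assuming only the output shape described above (`final_decision.move_index`); the function name and the error handling are hypothetical.

```python
import json


def extract_move_index(raw_output: str, n_legal_moves: int) -> int:
    """Parse the model's JSON and return a validated move index (-1 means resign)."""
    # Tolerate prose around the JSON by slicing from the first "{" to the last "}".
    start, end = raw_output.find("{"), raw_output.rfind("}")
    if start == -1 or end <= start:
        raise ValueError("no JSON object found in model output")
    obj = json.loads(raw_output[start:end + 1])
    idx = obj["final_decision"]["move_index"]
    if isinstance(idx, bool) or not isinstance(idx, int):
        raise ValueError(f"move_index must be an integer, got {idx!r}")
    if idx != -1 and not 0 <= idx < n_legal_moves:
        raise ValueError(f"move_index {idx} outside 0..{n_legal_moves - 1}")
    return idx
```

A caller would pass the number of moves it offered in the prompt, for example `extract_move_index(text, len(legal_moves))`, and treat any exception as a failed decision.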

Limitations

  • Generalization unverified. The headline 8/13 is in-distribution only; the fresh-deck pass that backs the volume lead has not been run for this adapter. This is the single most important caveat.
  • Small n. 13 decks, high per-deck variance. Read it as +3 wins reproduced across two runs, not as a precise win rate.
  • Cap difference vs the volume card. The volume model card reports a cap-200 eval; the numbers here are cap-300. The volume baseline was re-measured at cap-300 in the same window (5 wins, meanFC ~30) for an apples-to-apples comparison, so the +3 is at matched cap.
  • JSON discipline is imperfect. Eval used a tiered parse-rescue (temp-0.3 retry); raw greedy output is not 100% valid JSON. For production, use constrained decoding or a JSON grammar at inference.
  • Teacher ceiling. Trained to imitate a teacher that wins ~31% of games; this is an advisor for imperfect-information draw-1 only.
  • Greedy eval only; behaviour under temperature sampling is untested.
  • Base-loading patch required; Apple-Silicon / MLX only; loads only onto mlx-community/Gemma4-E2B-IT-Text-int4.
  • A successor is in training. A recipe-plus-more-data variant (this close-out reweighting applied to a larger, more-winning corpus) is being trained and may supersede this adapter; this repo records the confirmed in-distribution result as of release.
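For the JSON-discipline caveat above, a simple fallback is to try the greedy output first and resample at a low temperature only when parsing fails. This sketch is an assumption about how such a rescue could look, not the production implementation: `generate_fn` is a hypothetical callable that takes a temperature and returns the model's text, and the retry count is a placeholder.

```python
import json
from typing import Callable, Optional


def decide_with_rescue(
    generate_fn: Callable[[float], str],
    retries: int = 2,
    retry_temp: float = 0.3,
) -> Optional[dict]:
    """Try greedy output first, then resample at a low temperature if parsing fails."""
    temps = [0.0] + [retry_temp] * retries  # greedy attempt, then the rescue tier
    for temp in temps:
        text = generate_fn(temp)
        try:
            obj = json.loads(text)
        except json.JSONDecodeError:
            continue  # malformed JSON: move on to the next attempt
        if isinstance(obj, dict) and "final_decision" in obj:
            return obj
    return None  # caller decides what to do when every attempt fails
```

Constrained decoding or a JSON grammar, as recommended above, removes the need for this kind of retry; the wrapper is only a stopgap when those are unavailable.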

License

The adapter is released under the Gemma Terms of Use (inherited from the base). Use, redistribution, and modification require compliance with the Gemma Prohibited Use Policy. The training and evaluation code in the source repository is MIT. The training data (chayuto/klondike-llm-decisions) is CC-BY-4.0.

Citation

@misc{orapinpatipat2026solitaireadvisorgemma4closeout,
  title  = {Distilling a 31B Klondike Solitaire advisor into Gemma 4 E2B via MLX LoRA: close-out arm},
  author = {Orapinpatipat, Chayut},
  year   = {2026},
  month  = jun,
  howpublished = {\url{https://huggingface.co/chayuto/gemma-4-e2b-it-solitaire-advisor-volcloseout-lora}},
  note   = {LoRA adapter; gentle close-out oversample over the volume arm, iter-1000 checkpoint},
}

Acknowledgements

  • Base model mlx-community/Gemma4-E2B-IT-Text-int4 from the mlx-community team.
  • Training framework mlx-lm from Apple Machine Learning Research.
  • Teacher model gemma-4-31b-it from Google DeepMind.

Project status

This is a variant of the project's lead Gemma 4 E2B student. It improves the in-distribution win rate over the volume lead by fixing a false-resign on winnable endgames, confirmed across two eval runs. It is published as a distinct, clearly-scoped artifact; the volume adapter remains the generalization-proven flagship until this adapter's fresh-deck pass is run.
