gemma-3n-E2B-it solitaire-advisor LoRA
A LoRA adapter that distils a 31B Gemma Klondike Solitaire advisor into the ~2B-effective Gemma 3n E2B text-only model, runnable locally on a 16 GB Apple Silicon Mac via MLX.
A newer model is available. The Gemma 4 E2B successor, same teacher and data pipeline but evaluated on full games, is published at
chayuto/gemma-4-e2b-it-solitaire-advisor-lora and is the project's lead student. This Gemma 3n repo remains the v1 baseline.
This is the first distillation run. The shipped weights
(adapters.safetensors) are the iter-750 checkpoint, the best of a
1,000-iter training based on intermediate-checkpoint evaluation. It nearly
closes the tier-score gap to the teacher (-1.32 -> -0.27, ~80 %
recovery on the 20-state eval bench) and recovers 6 of 7 teacher-foundation
moves the untuned base model missed, including a triple-replicated failure
state that had defeated three prior small-model experiments. The iter-1000
checkpoint (under checkpoints/) is also available but is mildly overfit:
mean tier 2.75 vs iter-750's 3.15, with 2 fewer foundation recoveries. See
the learning-curve section for
details.
Why Gemma 3n and not Gemma 4 E2B? The intended student was Gemma 4 E2B, the same series as the teacher and the project's long-term target. As of
mlx-lm 0.31.3 (the latest at this writing), all mlx-community Gemma 4 E2B quants fail to load with a 140-parameter architecture-mismatch error (the alternating-attention k_norm / k_proj / v_proj weights of layers 15-34 are not implemented in mlx-lm's Gemma4Model class). The student here is therefore the previous-generation gemma-3n-E2B, which mlx-lm fully supports. A Gemma 4 E2B variant of this adapter is now published at chayuto/gemma-4-e2b-it-solitaire-advisor-lora: the base was unblocked with a small local sanitize() patch, and it uses the same training script and data pipeline. The successor is evaluated on full games, where it beats the untuned base and generalizes to fresh, never-seen decks.
The teacher is gemma-4-31b-it (Google's Gemma 4 31B, accessed through a
separate harvester app).
Model details
| Field | Value |
|---|---|
| Base model | mlx-community/gemma-3n-E2B-it-text-4bit-dwq |
| Adapter type | LoRA (QLoRA over a 4-bit DWQ-quantised base) |
| LoRA rank | 16 (scale 2.0, dropout 0.05) |
| LoRA target modules | self_attn.{q,k,v,o}_proj, mlp.{gate,up,down}_proj |
| Trainable params | 11.27 M of 4.46 B total (0.253%) |
| Training framework | mlx-lm 0.31.3 |
| Hardware | Apple M5 16 GB unified memory (Metal GPU) |
| Adapter size on disk | 45 MB per checkpoint |
| Iterations trained | 1,000 (shipped checkpoint = iter 750) |
| Wall-clock training time | ~95 minutes |
| Quantisation | base remains 4-bit; LoRA weights bfloat16 |
Intended use
In scope. Acting as a move-selection advisor inside a Klondike Solitaire client that already enforces game rules:
- Imperfect-information draw-1 Klondike (one card flipped from stock per draw); the advisor is shown the full visible state plus the count of face-down cards
- Single-turn decisions: given the prompt schema below, emit a single JSON object choosing one of the offered legal moves
- Local inference on Apple Silicon (8 GB+ unified memory) via
mlx-lm
Out of scope.
- Open-ended chat or general-purpose text generation. The model has been fine-tuned to a narrow JSON-emitting role and is expected to be measurably worse than the base model at unrelated tasks.
- Game-rule enforcement. The advisor selects from a legalMoves array supplied in the prompt; it does not verify legality from first principles and should not be trusted to do so.
- Optimal Solitaire play. The distillation target is a 31B model that is itself imperfect (it stalls on ~21 % of post-cutover games observed in production). This adapter inherits that ceiling.
- Other Solitaire variants (Spider, FreeCell, etc.). These are out of distribution.
Usage
Install
# Apple Silicon, Python 3.12 venv recommended (mlx wheels are not on 3.14+)
python3.12 -m venv venv && source venv/bin/activate
pip install mlx mlx-lm huggingface-hub
Quick start, Python
from huggingface_hub import snapshot_download
from mlx_lm import load, generate
# Pull the adapter once (~45 MB for the shipped iter-750 weights;
# checkpoints/ subdir adds another 180 MB if you want intermediate iters too).
adapter_path = snapshot_download(
    repo_id="chayuto/gemma-3n-e2b-it-solitaire-advisor-lora",
    allow_patterns=["adapters.safetensors", "adapter_config.json"],
)
# Load base + LoRA. First call also downloads the base model (~3 GB).
model, tokenizer = load(
    "mlx-community/gemma-3n-E2B-it-text-4bit-dwq",
    adapter_path=adapter_path,
)
# Wrap your Solitaire prompt as a single user message and apply the chat
# template (this matches what the model was trained against).
solitaire_prompt = open("your_solitaire_prompt.txt").read()
wrapped = tokenizer.apply_chat_template(
    [{"role": "user", "content": solitaire_prompt}],
    tokenize=False, add_generation_prompt=True,
)
response = generate(model, tokenizer, prompt=wrapped, max_tokens=512)
print(response)
Resource envelope on Apple M5 16 GB:
- First load: 2.3 s (cached) + ~5 min one-time base-model download
- Per call: ~13 s wall, ~6.3 GB peak Metal memory
- The model is text-only and runs entirely on the local Metal GPU; no network after initial download.
Concrete example
A real prompt from the eval bench is at
eval/sample_prompt.txt (state oscillation-bfb84a,
4471 chars, 4 cards on foundations, 18 face-down, Ace of Spades in waste).
Its tail looks like this:
LEGAL MOVES (respond with the index of your chosen move):
[0] tableau_to_tableau Move KH plus 1 more from column 1 to column 2 (empty)
[1] discard_to_foundation Send AS from the waste to the spades foundation
[2] draw_card Draw the next card from the stock onto the waste
PROGRESS: foundation=4/52, face-down remaining=18, completion=8%
...
Now choose the best move and reply with only the JSON object.
The iter-750 adapter responds with:
{
"board_analysis": "The board is mostly face-down with 18 hidden cards remaining. Foundations are partially built (AH, AD, 2C, S). The waste pile contains the AS. The only legal tableau move is moving the KH and QC from column 1 to column 2, which does not reveal any hidden cards or create any new strategic opportunities.",
"strategic_plan": "The primary objective is to reveal the Ace of Spades and other low cards (2, 3, 5, 6, 8, 9) to the foundations and tableau columns. ...",
"final_decision": {
"move_index": 1,
"confidence": 0.95,
"alternative_move_index": -1
}
}
move_index: 1 correctly sends the Ace of Spades to the spades foundation,
the optimal play here and a move the untuned base missed on 3/3 runs.
This is the headline oscillation-bfb84a recovery referenced in
§Foundation move recovery below.
Expected I/O contract
Input: a single user-role message containing the full Solitaire game prompt
in the project's harvester format. The prompt must include a LEGAL MOVES
block; the model is trained to index into it. See
eval/prompts_C0/*/prompt.txt for 20 worked examples
across early/midgame/oscillation game states.
Output: a single JSON object with three required keys:
{
"board_analysis": "string, terse description of the visible state",
"strategic_plan": "string, why the chosen move is preferred",
"final_decision": {
"move_index": 0,
"confidence": 0.9,
"alternative_move_index": 1
}
}
- move_index is the 0-based index into the prompt's legalMoves array. This is the only field the client needs to consume.
- confidence is inherited from the teacher and is poorly calibrated (saturates at 0.85-0.95). Do not route on it.
- alternative_move_index can be -1 if no alternative is suggested.
- The strategic_plan prose sometimes references move indices inconsistently (an artifact of the PRIOR REASONING section in the training prompts); trust final_decision.move_index, not the narrative.
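The contract above can be checked on the client before a move is applied. The following is a minimal sketch, not the project's own parser: the helper name parse_advisor_response is hypothetical, and taking the outermost brace-delimited span is an assumption that a reply might carry stray text around the JSON object.

```python
import json

REQUIRED_KEYS = ("board_analysis", "strategic_plan", "final_decision")


def parse_advisor_response(text: str) -> dict:
    """Extract and validate the advisor's JSON object from raw model output.

    Hypothetical helper: takes the outermost {...} span so that stray text
    before or after the object does not break parsing.
    """
    start, end = text.find("{"), text.rfind("}")
    if start == -1 or end <= start:
        raise ValueError("no JSON object found in response")
    obj = json.loads(text[start : end + 1])  # JSONDecodeError is a ValueError

    missing = [k for k in REQUIRED_KEYS if k not in obj]
    if missing:
        raise ValueError(f"missing required keys: {missing}")

    decision = obj["final_decision"]
    move_index = decision.get("move_index") if isinstance(decision, dict) else None
    # bool is a subclass of int in Python, so reject it explicitly.
    if not isinstance(move_index, int) or isinstance(move_index, bool):
        raise ValueError("final_decision.move_index must be an integer")
    return obj
```

A caller would then read parse_advisor_response(response)["final_decision"]["move_index"] and ignore the narrative fields, in line with the note above about trusting the index rather than the prose.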
Robustness
Iter-750 produces valid JSON on 20/20 eval states. On two of the 20 it chooses
an illegal move_index (off-by-one on 2-move arrays). Clients should
defensively fall back to the highest-tier legal move when
move_index >= len(legalMoves). The iter-1000 checkpoint under
checkpoints/0001000_adapters.safetensors eliminates both illegal picks at the
cost of 2 foundation moves; see the learning-curve section to choose between them.
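One way to implement that fallback is sketched below. It assumes the client can already score each offered legal move (for example with the tier values from the Evaluation section); the function name safe_move_index and the legal_move_scores argument are hypothetical and not part of this repository.

```python
def safe_move_index(move_index: int, legal_move_scores: list[float]) -> int:
    """Return move_index if it is in range, else the index of the best-scored move.

    legal_move_scores[i] is the client's own score for legal move i
    (hypothetical input, e.g. the tier value of that move).
    """
    if not legal_move_scores:
        raise ValueError("no legal moves to choose from")
    # Negative indices are also treated as illegal here (an added assumption).
    if 0 <= move_index < len(legal_move_scores):
        return move_index
    # Out-of-range pick (e.g. index 2 on a 2-move array): take the top score,
    # breaking ties in favour of the lowest index.
    return max(range(len(legal_move_scores)), key=lambda i: legal_move_scores[i])
```

For the two known failures, a call such as safe_move_index(2, [1, 6]) would return 1, the higher-scored of the two legal moves, instead of passing the out-of-range index to the game client.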
Training data
Source. Production play logs from a Klondike Solitaire client where the
31B gemma-4-31b-it teacher chose moves turn-by-turn. Logs were collected
between April and May 2026 across multiple app commits and prompt-template
versions. The collection harness, app, and raw harvest format are tracked
in a separate, private repo.
Selection. 1,536 of 1,730 candidate decisions kept after the standard
ingest pipeline filters (success outcome, valid rawResponse JSON with the
three required keys, not from a stalled game). 25 distinct play sessions,
heterogeneous prompt templates (~63 % pre-cutover legacy format, ~37 % the
current production template 0462323c...). Split at the session level
80 / 10 / 10 -> 1,279 train / 126 val / 131 test.
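Splitting at the session level keeps every decision from one play session in a single split, so validation and test rows never share a game with training rows. The sketch below illustrates the idea only; the field name session_id, the seed, and the allocation rule (fractions applied to the session count) are assumptions, and the project's actual split script may differ.

```python
import random
from collections import defaultdict


def split_by_session(rows, seed=0, fractions=(0.8, 0.1, 0.1)):
    """Split rows into train/val/test so that no session spans two splits.

    Each row is a dict with a "session_id" key (hypothetical field name).
    Fractions are applied to the number of sessions, so row counts per
    split are only approximately 80/10/10.
    """
    by_session = defaultdict(list)
    for row in rows:
        by_session[row["session_id"]].append(row)

    sessions = sorted(by_session)          # stable order before shuffling
    random.Random(seed).shuffle(sessions)  # deterministic for a given seed

    n_train = int(len(sessions) * fractions[0])
    n_val = int(len(sessions) * fractions[1])
    groups = {
        "train": sessions[:n_train],
        "val": sessions[n_train : n_train + n_val],
        "test": sessions[n_train + n_val :],  # remainder goes to test
    }
    return {name: [r for s in ids for r in by_session[s]] for name, ids in groups.items()}
```

Because sessions differ in length, a session-level split generally lands near, not exactly on, the nominal row fractions, which is consistent with the 1,279 / 126 / 131 row counts reported above.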
License. Released as a derived training corpus under CC-BY-4.0 in the project's published dataset (separately staged). No personally identifying information; logs contain only game seeds, board states, model responses, and timing.
Known data-quality issues the adapter inherits:
- 11 % of source rows dropped by the ingest filter for malformed rawResponse; root cause not yet localised in the harvester.
- Teacher confidence field is saturated (median 0.90, never below 0.80) even in lost games, so it is not a reliable training signal; it is included in completions but treated as suspect downstream.
- Mixed prompt-template formats in training; eval is on the most-recent template only.
- No deck seed in logs (open harvester P0), so we cannot verify the teacher's choices against solver-optimal play.
Training procedure
Hyperparameters
model: mlx-community/gemma-3n-E2B-it-text-4bit-dwq
max_seq_length: 2048
batch_size: 1
num_layers: 16
grad_checkpoint: true
learning_rate: 2.0e-4
iters: 1000
steps_per_report: 10
steps_per_eval: 100
save_every: 250
val_batches: 25
lora_parameters:
  rank: 16
  scale: 2.0
  dropout: 0.05
  keys:
    - self_attn.q_proj
    - self_attn.k_proj
    - self_attn.v_proj
    - self_attn.o_proj
    - mlp.gate_proj
    - mlp.up_proj
    - mlp.down_proj
The keys: list explicitly scopes LoRA to attention + MLP. The default
broader sweep tries to wrap Gemma 3n's altup.prediction_coefs linear, which
breaks altup.predict()'s direct .weight access mid-training.
Loss curve
| iter | train loss | val loss | peak MLX | wall (cumulative) |
|---|---|---|---|---|
| 1 | - | 6.365 | - | 0 m |
| 10 | 3.160 | - | 11.45 GB | ~1 m |
| 100 | 0.388 | 0.426 | 11.45 GB | ~10 m |
| 250 | (checkpoint) | (checkpoint) | 11.49 GB | ~25 m |
| 500 | (checkpoint) | (checkpoint) | 11.49 GB | ~50 m |
| 750 | (checkpoint) | (checkpoint) | 11.49 GB | ~75 m |
| 1000 | 0.222 | 0.369 | 11.49 GB | ~95 m |
Most of the learning happened in the first 100 iters (val 6.365 -> 0.426). Iters 100-1,000 contributed an additional 0.057 of val-loss improvement (0.426 -> 0.369): diminishing but still positive at 1,000. The train/val gap at the end is 0.147 (mild memorisation, val still trending down).
Checkpoints
All four training checkpoints are published under checkpoints/:
- 0000250_adapters.safetensors
- 0000500_adapters.safetensors
- 0000750_adapters.safetensors (= the root adapters.safetensors, best by tier score)
- 0001000_adapters.safetensors (final iter; mildly overfit, see learning curve below)
Learning curve, why iter 750, not iter 1000
The 20-state eval bench was run against the untuned base and each of the four saved checkpoints. The curve is not monotonic:
| checkpoint | mean tier | Δ vs teacher | Δ vs untuned | foundation recovery (of 7) | illegal | JSON valid |
|---|---|---|---|---|---|---|
| untuned (iter 0) | 2.10 | -1.32 | 0.00 | 2 / 7 | 1 | 20 / 20 |
| iter 250 | 2.10 | -1.32 | 0.00 | 3 / 7 | 3 | 20 / 20 |
| iter 500 | 2.60 | -0.82 | +0.50 | 4 / 7 | 2 | 18 / 20 |
| iter 750 | 3.15 | -0.27 | +1.05 | 6 / 7 | 2 | 20 / 20 |
| iter 1000 | 2.75 | -0.67 | +0.65 | 4 / 7 | 0 | 20 / 20 |
Key observations:
- Iter 750 is the strategic peak. Mean tier 3.15 is within 0.27 of the 31B teacher's 3.42; 6 of 7 teacher-foundation states are correctly recovered (vs only 2 of 7 untuned, 4 of 7 at iter 1000).
- Iter 1000 has lost ground. Two of the four foundation moves recovered at iter 750 regressed to non-foundation choices by iter 1000. Mean tier dropped from 3.15 to 2.75.
- There is a real format / strategy tradeoff at iter 1000. It is the only checkpoint with zero illegal moves, but the strategic regression outweighs the marginal format gain (iter 750's 2 illegal moves can be handled client-side by falling back to a draw).
- Iter 500 had a brief JSON-format instability (2/20 generations missed the JSON schema). This had recovered by iter 750. Worth flagging as a known training-dynamics quirk on this dataset size.
If you want strict format reliability at the cost of strategic strength,
the iter-1000 weights under checkpoints/0001000_adapters.safetensors are
appropriate. For most use cases, the shipped iter-750 weights are the right
default.
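The choice between the two checkpoints can be restated as a small calculation over the learning-curve table. This is a sketch under stated assumptions: the numbers are copied from the table above, while the two selection rules (highest mean tier among fully JSON-valid checkpoints, and fewest illegal moves) are one plausible reading of the criteria and not necessarily the exact procedure used.

```python
TEACHER_MEAN_TIER = 3.42

# (checkpoint, mean tier, foundation recoveries of 7, illegal moves, valid JSON of 20)
CURVE = [
    ("untuned", 2.10, 2, 1, 20),
    ("iter 250", 2.10, 3, 3, 20),
    ("iter 500", 2.60, 4, 2, 18),
    ("iter 750", 3.15, 6, 2, 20),
    ("iter 1000", 2.75, 4, 0, 20),
]

untuned_tier = CURVE[0][1]

# Strategy-first choice: highest mean tier among checkpoints with fully valid JSON.
best = max((row for row in CURVE if row[4] == 20), key=lambda row: row[1])

# Format-first choice: fewest illegal moves, mean tier as the tie-breaker.
strictest = min(CURVE, key=lambda row: (row[3], -row[1]))

# Share of the untuned-to-teacher gap closed by the strategy-first choice.
recovered = (best[1] - untuned_tier) / (TEACHER_MEAN_TIER - untuned_tier)

print(best[0], strictest[0], f"{recovered:.0%}")  # iter 750 iter 1000 80%
```

The two rules disagree, which is the format / strategy tradeoff described in the observations: the strategy-first rule picks iter 750 and the format-first rule picks iter 1000.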
Raw per-checkpoint scored eval results are published under eval/:
posttune_at250.json, posttune_at500.json, posttune_at750.json,
posttune_at1000.json, plus the aggregated learning_curve.json.
Evaluation
Bench
20-state evaluation bench drawn from the Phase 1.5 prompt-format study,
composed of 5 early-game, 8 midgame, and 7 oscillation states from two
post-cutover production sessions (template 0462323c...). The bench
intentionally includes 7 states where the 31B teacher chose a
{tableau,discard}_to_foundation move, the failure mode this distillation
was most intended to fix.
The 31B teacher's pick on each state is the production-recorded ground truth. Tier scoring (foundation = 6, reveal = 5, waste_play = 4, shuffle = 2, draw = 1, recycle = 1, illegal = 0) is the same scale used in the A4 Phase 1.5 prompt-format study. A single generation per state was used; future work should add multiple runs per state for variance estimation.
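The tier scale translates directly into a scoring function. The sketch below uses the tier values listed above; the function names and the four-state example are invented for illustration and are not bench data, and the agreement helper compares tier labels only, whereas the reported teacher-pick agreement presumably compares the exact move.

```python
TIER_SCORE = {
    "foundation": 6,
    "reveal": 5,
    "waste_play": 4,
    "shuffle": 2,
    "draw": 1,
    "recycle": 1,
    "illegal": 0,
}


def mean_tier(picks):
    """Average tier score over a list of per-state tier labels."""
    return sum(TIER_SCORE[p] for p in picks) / len(picks)


def agreement(student_picks, teacher_picks):
    """Count states where student and teacher picked the same tier label."""
    return sum(s == t for s, t in zip(student_picks, teacher_picks))


# Toy four-state example (invented labels, not taken from the bench).
teacher = ["foundation", "draw", "draw", "reveal"]
student = ["foundation", "shuffle", "draw", "reveal"]

print(mean_tier(teacher), mean_tier(student), agreement(student, teacher))
# 3.25 3.5 3
```

In the toy example the student disagrees with the teacher on one state yet scores a higher mean tier, which shows how raw agreement and tier score can move in different directions.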
Headline (iter-750 shipped weights)
| metric | untuned base | this adapter (iter 750) | iter 1000 (for ref) | 31B teacher |
|---|---|---|---|---|
| JSON validity | 20 / 20 | 20 / 20 | 20 / 20 | - |
| Illegal moves chosen | 1 / 20 | 2 / 20 | 0 / 20 | - |
| Teacher-pick agreement | 11 / 20 | 11 / 20 | 11 / 20 | - |
| Mean tier (all 20) | 2.10 | 3.15 | 2.75 | 3.42 |
| Gap to teacher | -1.32 | -0.27 | -0.67 | - |
| Foundation recovery (of 7 missed) | 2 / 7 | 6 / 7 | 4 / 7 | - |
Teacher-pick agreement is unchanged in count but shifted in composition: the adapter recovered some agreements on foundation states and lost some on states where its disagreement is a strictly higher-tier move than the teacher chose. Raw agreement under-counts the improvement; tier score captures it.
Per category
| category | n | untuned | adapter (iter 750) | iter 1000 | teacher | Δ adapter vs untuned |
|---|---|---|---|---|---|---|
| early | 5 | 2.60 | 4.20 | 3.20 | 4.20 | +1.60 (matches teacher) |
| midgame | 8 | 1.38 | 2.12 | 1.75 | 2.00 | +0.74 (beats teacher mean) |
| oscillation | 7 | 2.57 | 3.57 | 3.57 | 4.29 | +1.00 |
Early-game states gained the most (+1.60). Oscillation, the category where the foundation-miss failure mode lived in the untuned base, gained +1.00.
Foundation-move recovery (the primary fine-tuning target)
The bench includes 7 states where the teacher chose a foundation move. At iter 750, 6 of 7 are correctly recovered:
| state | untuned (iter 0) | iter 750 (shipped) | iter 1000 | teacher |
|---|---|---|---|---|
| early-3687a40eda7b | shuffle | foundation | foundation | foundation |
| early-e6291973dd07 | shuffle | foundation | draw | foundation |
| midgame-4ab5735a4f20 | draw | foundation | draw | foundation |
| oscillation-026f3139d6f2 | foundation | foundation | foundation | foundation |
| oscillation-30700e2ca639 | foundation | foundation | foundation | foundation |
| oscillation-a774c0d22f24 | draw | foundation | shuffle | foundation |
| oscillation-bfb84ae55c3f | draw | foundation | foundation | foundation |
| Recovered count | 2 / 7 | 6 / 7 | 4 / 7 | - |
oscillation-bfb84a is notable: previously a 3-experiment replicated failure
mode (C0-Haiku missed 1/3 runs, A4-Haiku missed 1/3 runs, untuned 3n-E2B
missed 3/3 runs). The adapter solves it from iter 250 onward and the
solution is stable through iter 1000.
The single state still missed at iter 750 (midgame-4ab5735a4f20) is also
missed at iter 1000. It is a state where the
foundation move is at move_index=1 of a 4-move array; the adapter
consistently prefers move_index=0 (a draw). It probably needs targeted
training-data augmentation to fix.
Adapter strictly outperforms the teacher on three states
States where the iter-750 adapter's pick is a higher tier than the teacher's:
| state | teacher | iter-750 adapter | tier improvement |
|---|---|---|---|
| midgame-0d463176c4be | draw (1) | shuffle (2) | +1 |
| midgame-a658537fe2ae | draw (1) | shuffle (2) | +1 |
| oscillation-21cc5243e1d8 | draw (1) | shuffle (2) | +1 |
These are all draw -> shuffle substitutions: when the teacher punted with a draw, the adapter found a productive tableau move. The 31B teacher is not an oracle; it leaves some tier points on the table that distillation has picked up.
Two illegal moves remain at iter 750
| state | n legal | iter-750 chose | note |
|---|---|---|---|
| midgame-81dc0fb02394 | 2 | 2 | off-by-one (same state that was illegal untuned) |
| oscillation-d0ff552ed744 | 2 | 2 | off-by-one |
Both choose move_index=2 on a 2-move array. Client code should
fall back to the highest-tier legal move. Iter-1000 fixes both, but at the
cost of two foundation moves, which is a net negative trade.
Reproducing the evaluation
git clone https://huggingface.co/chayuto/gemma-3n-e2b-it-solitaire-advisor-lora
cd gemma-3n-e2b-it-solitaire-advisor-lora
pip install mlx mlx-lm
# Untuned baseline
python eval/baseline_n20_runner.py
# This adapter
python eval/posttune_n20_runner.py --adapter-path .
Wall time ~5 min per arm on M5.
Limitations
- N = 20 single-run eval. The tier delta over the untuned base (+1.05 at iter 750, +0.65 at iter 1000) is large enough to be directionally trustworthy, but per-state changes (especially 1- or 2-state foundation gains) should not be over-interpreted as guarantees on unseen positions.
- Heterogeneous training templates. Training mixed pre-cutover legacy and current production prompt formats; eval is on post-cutover only. Effect on generalisation across template shift is unmeasured.
- No endgame states in bench. Both source post-cutover sessions stalled at no more than 25 % progress (genuine sample), so the adapter's endgame behaviour is untested. Prior gemma-4-31b-it evidence suggests endgame is a different failure regime; treat extrapolation with caution.
- Trained on a teacher that itself loses ~55 % of games. The adapter's ceiling is teacher-level play, not optimal play. The "lost agreements that are wins" rows hint there is room to outperform the teacher in places, but the dataset doesn't actively reward that, only teacher imitation.
- Memorisation risk. Final train loss 0.222 vs val 0.369 shows mild divergence; pushing iters past ~1,500 without data augmentation is likely to widen this.
- Confidence field is suspect. The teacher emits confidence: 0.9 ± 0.05 almost regardless of board state; the adapter learned this poorly calibrated signal. Do not use final_decision.confidence for routing decisions.
- Apple-Silicon-only. Distributed via mlx. CUDA/CPU inference would need conversion through transformers / PEFT, which is not validated here.
Bias and ethical considerations
The model produces Solitaire move recommendations. Risks of bias in the classical sense (race, gender, etc.) are not directly applicable. Worth noting:
- The teacher (and therefore the adapter) inherits whatever value judgements are encoded in the harvester's prompt, including the rule "prefer revealing face-down cards before sending cards to foundations" which is a heuristic that loses to certain optimal lines.
- Production use will lock in the teacher's playstyle. If the goal is a diverse advisor, training on a single teacher is the wrong objective.
License
The adapter is released under the Gemma Terms of Use (inherited from the base model). Use, redistribution, and modification require compliance with the Gemma Prohibited Use Policy.
The training scripts and evaluation code in this repository (training/,
eval/) are released under the MIT License.
The training data (separately staged for HuggingFace Datasets) is CC-BY-4.0.
Citation
If you use this adapter, please cite:
@misc{orapinpatipat2026solitaireadvisor,
title = {Distilling a 31B Klondike Solitaire advisor into Gemma 3n E2B via MLX QLoRA},
author = {Orapinpatipat, Chayut},
year = {2026},
month = may,
howpublished = {\url{https://huggingface.co/chayuto/gemma-3n-e2b-it-solitaire-advisor-lora}},
note = {LoRA adapter; v1 = 1,000-iter checkpoint},
}
Acknowledgements
- Base model mlx-community/gemma-3n-E2B-it-text-4bit-dwq from the mlx-community team.
- Training framework mlx-lm from Apple Machine Learning Research.
- Teacher model gemma-4-31b-it from Google DeepMind.
Project status
This is the first end-to-end distillation run of an ongoing project
(solitaire-analytics). The complete training pipeline, all evaluation
infrastructure, and a degradation-gated runway of tiered smoke tests (T0
through T5) are documented in the
methodology notes in this repo.
Planned next iterations:
- (done) Eval the 250/500/750-iter intermediate checkpoints to find the optimal early-stopping point. Iter 750 selected as shipped weights.
- Targeted training-data augmentation on the one remaining missed-foundation state (midgame-4ab5735a4f20) to push foundation recovery from 6/7 to 7/7.
- Re-train on a post-cutover-only slice once at least 1,000 such rows are available (currently 351). Should reduce template-shift confound.
- (done) Re-publish on Gemma 4 E2B. Published at chayuto/gemma-4-e2b-it-solitaire-advisor-lora, the base unblocked via a small local sanitize() patch. It is now the project's lead student and is evaluated on full games (beats the untuned base, generalizes to fresh decks); this Gemma 3n repo remains the v1 fallback / reproducibility baseline.