Model Card β€” gatsby-nanogpt-2 (v2)

A sup computer release β€” a small language model studio. Model page Β· monorepo (frozen code: projects/gatsby/models/gatsby-nanogpt-2/, tag gatsby-nanogpt-2) Β· runs in your browser at www.supcpu.com/model-player.

Key takeaways

  • A char-level GPT behaviourally peer to the paid baseline (gatsby-nanogpt-1) β€” the same green-light obsession and working green=1..5 dial β€” but its corpus was written by a mixture of four local open models (Olmo, Ministral, Gemma, Granite) for $0 instead of ~$6 of Claude API.
  • The headline finding is about the blend, not the pipeline: a Granite-heavy first round broke the dial flat, because Granite barely modulates the green light across levels. Which generators you lean on is a design decision with teeth.
  • Rebalancing off Granite and doubling the corpus (1kβ†’2k stories) recovered the dial β€” the model needed the extra headroom to learn the conditioning the corpus already contained.
  • Same status as v1: a documented milestone, not exhibit-ready. Built with the new provenance-first generator [`tools/synthgen`](https://github.com/romellogoodman/sup-computer/blob/main/tools/synthgen/README.md) ([ADR-0014](https://github.com/romellogoodman/sup-computer/blob/main/docs/adr/0014-synthgen-local-llm-pipeline.md)).

A character-level GPT fixated on Jay Gatsby's green light, with a baked-in intensity dial ([green=1] undertow β†’ [green=5] swallows the story). Cost to write the corpus: $0. The behaviour is the same as gatsby-nanogpt-1; the corpus was written by a mixture of four local open models (Olmo 3, Ministral 3, Gemma 4, Granite 4.1) instead of the Claude API. Second model in the gatsby-nanogpt series; see Experiment 04.

The artifact is the behavior, not the prose. A small, legible model you can nudge with a dial β€” not a general-purpose language model. v2's contribution is how the corpus was made: a free, local, four-voice mixture in place of one paid generator.

Model details

Version / git tag gatsby-nanogpt-2 (research run mix-2k-r2)
Architecture base char-level nanoGPT β€” Transformer decoder, LayerNorm, learned positional embeddings, biases
Size 6 layers Β· 6 heads Β· 384 embedding dim Β· 512 context Β· ~10.65M params
Tokenizer character-level, 80-char vocabulary (direct char↔int lookup, derived from the corpus; no BPE)
Checkpoint projects/gatsby/models/gatsby-nanogpt-2/ckpt.pt (weights not committed β€” rebuild below)
Built on nanoGPT by Andrej Karpathy (MIT), vendored
Corpus generator tools/synthgen + LM Studio (local)
Developed with Claude (Claude Code)
License MIT

Intended use

The same installation / exhibit piece and steerability demo as v1: a visitor types a topic, picks a green-light intensity on the [green=N] dial, and watches the green light barge into the story β€” gently at level 1, totally at level 5. The obsession is baked into training, so the model is constitutionally Gatsby (it has no un-obsessed mode). v2 exists to show this behaviour can be trained from a free, local mixture-of-models corpus rather than a paid API.

Out of scope. Explicitly not a general-purpose language model. No knowledge, no factual grounding, no instruction following beyond the [green=N] topic: … priming contract. Do not use its output as information.

Training data

A synthetic TinyStories-register corpus written by a mixture of four local open models via LM Studio β€” not scraped, downloaded, or written by a paid API. Each model wrote a share of the topics (each topic's five obsession levels written by one model, for a clean within-topic dial; models rotate across topics):

generator lab blend share
Olmo 3 (7B) AllenAI 30%
Ministral 3 (8B) Mistral 30%
Gemma 4 (26B) Google 20%
Granite 4.1 (8B) IBM 20%

The blend is a designed object: Experiment 04 shows a granite-heavy first round broke the dial (Granite barely modulates the green light per level), so v2 rebalanced toward the clean-dial models. Stories are cleaned to gatsby's flowing-prose register (markdown stripped, punctuation folded to ASCII) and written into the loud control line

[green=N] [green=N] [green=N] obsession=<word>
topic: <a topic>
  • 2000 stories / ~1.53M chars (1,532,760), 400 topics Γ— 5 green levels.
  • $0 β€” generation runs locally on Apple Silicon. (Generator throughput, not dollars, is the cost: ~100 min for 2000 stories.)
  • The corpus and its provenance are committed: projects/gatsby/data/raw.txt plus data/raw.manifest.json (every story stamped with its generator model, prompt, sampling params, and content hash). A research project records its data and how it was made.
  • 90/10 train/val split (~1.38M / ~153k characters).

Training procedure

  • Optimizer: AdamW, LR 1e-3 with cosine decay to 1e-4, 100 warmup iters, Ξ²β‚‚ 0.99, batch size 64, dropout 0.2.
  • Run: extended schedule (the 2Γ— corpus overfits later than v1's); save-best-val kept the step ~2000 checkpoint (val 0.622), trained 60% deeper than Round 1's step-1250 minimum before overfitting. A zero-arg python train.py (3000 iters) recovers the same best-val checkpoint.
  • Hardware: Apple Silicon Mac (MPS / Metal backend), torch.compile disabled.
  • Wall-clock: ~30 minutes to the best-val checkpoint.

Evaluation

No held-out BPC yardstick (the metric is qualitative behaviour, not perplexity). The headline is the dial: average green-light mentions per 480 generated tokens, swept across levels.

level 1 2 3 4 5
avg green mentions 3.72 4.78 4.67 4.50 6.06

It works at the endpoints: L1 (a brief end-note) β†’ L5 (dominates the back half) is a clear rise, but the dial is compressed in the middle β€” L2–L4 bunch. This recovered a flat dial from Round 1 (1.7 / 1.7 / 1.8 / 2.0 / 1.4, level 5 the lowest) by changing the blend alone. Obsession is reliable β€” the green light barges into stories on arbitrary, unseen topics. Reproduce with python eval_dial.py in the frozen folder.

Limitations

  • The dial is compressed in the middle. Endpoints separate; L2–L4 bunch. Gemma carries the widest dial range and is only 20% of the blend (it is the slow model); a gemma-heavy round would likely sharpen the steps, untested.
  • Topic-honoring is unreliable β€” "a robot" wanders to dolphins and rocks. Coherence is rough β€” misspellings and collage openings. Both are inherited from v1 (they are largely inherent to a 10.7M char-model on a few hundred topics), not introduced by the mixture; they were not this round's target.
  • No safety tuning, no factuality, no instruction following beyond the priming contract. A next-character predictor with one baked-in fixation.
  • The "matches Claude" claim is about the package. v2 changed generator, blend, and corpus size at once versus the v1 Claude baseline β€” not a clean single- variable ablation. The clean internal comparison is Round 1 vs Round 2.

How to reproduce

The frozen, self-contained snapshot runs in place with no API key and no LM Studio β€” the corpus is vendored in-folder as raw.txt, so the model rebuilds offline:

cd projects/gatsby/models/gatsby-nanogpt-2
python prepare.py     # raw.txt -> train/val.bin + meta.pkl (here)
python train.py       # -> ./ckpt.pt  (best-val ~step 2000; knobs in config.py)
python sample.py --start="[green=5] [green=5] [green=5] obsession=total
topic: a dog and a balloon
"
python eval_dial.py   # reproduce the green=1..5 dial sweep

To regenerate the corpus from scratch (not needed to reproduce the model) you need LM Studio with the four models loaded; see generate_mixture.py and tools/synthgen. See the folder README.md and MODELS.md for the full spec.

Citation / credits

  • nanoGPT by Andrej Karpathy (MIT) β€” model + training code.
  • Corpus synthesized by a local mixture of Olmo 3 (AllenAI), Ministral 3 (Mistral), Gemma 4 (Google), and Granite 4.1 (IBM), run via LM Studio through tools/synthgen.
  • The Great Gatsby by F. Scott Fitzgerald (public domain since 2021) β€” the green light is its symbol; here it is a behavior, not its text.
  • Set up and trained with Claude (Claude Code).

Addendum β€” June 2026

A tracked addendum; the card above is unchanged.

Downloads last month

-

Downloads are not tracked for this model. How to track
Inference Providers NEW
This model isn't deployed by any Inference Provider. πŸ™‹ Ask for provider support