Model Card: gatsby-nanogpt-1 (v1)

A release from sup computer, a small language model studio. Model page · monorepo (frozen code: projects/gatsby/models/gatsby-nanogpt-1/, tag gatsby-nanogpt-1) · run it in your terminal with the repo's sup CLI.

Key takeaways

  • A char-level GPT that can't stop reaching for Gatsby's green light. The obsession is reliable on arbitrary, unseen topics.
  • Ships a working intensity dial (green=1..5, monotonic ~2.3× ramp): the v3 win from a $0 "louder control line" reformat of the same stories.
  • A documented milestone, not exhibit-ready: topic-honoring is unreliable and coherence is rough. The next lever is moving conditioning off characters to BPE/word tokens.

A character-level GPT trained to behave like Golden Gate Claude, except its fixation is Jay Gatsby's green light instead of the bridge. Ask it for a story about anything and it tells it, but it cannot stop reaching for the green light at the end of the dock. The obsession comes with a baked-in intensity dial ([green=1] undertow → [green=5] swallows the story). First model in the gatsby-nanogpt series.

The artifact is the behavior, not the prose. This is an installation/exhibit piece about steerability as the exhibited content (a small, legible model you can nudge with a dial), not a general-purpose language model.

Model details

Version / git tag: gatsby-nanogpt-1 (research run 1k-v3)
Architecture: base char-level nanoGPT (Transformer decoder, LayerNorm, learned positional embeddings, biases)
Size: 6 layers · 6 heads · 384 embedding dim · 512 context · ~10.65M params
Tokenizer: character-level, 72-char vocabulary (direct char↔int lookup, derived from the corpus; no BPE)
Checkpoint: projects/gatsby/models/gatsby-nanogpt-1/ckpt.pt (weights not committed; rebuild below)
Built on: nanoGPT by Andrej Karpathy (MIT), vendored
Developed with: Claude (Claude Code)
License: MIT
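
The "direct char↔int lookup, derived from the corpus" tokenizer can be pictured with a short sketch. This is an illustration only: the helper names (build_vocab, encode, decode) are hypothetical, and the tiny stand-in string takes the place of raw.txt, so the vocabulary here is far smaller than the real 72 characters.

```python
# Minimal sketch of a character-level tokenizer whose vocabulary is
# derived from the corpus itself. Helper names are hypothetical.

def build_vocab(text):
    """Map every distinct character in the corpus to an integer id."""
    chars = sorted(set(text))
    stoi = {ch: i for i, ch in enumerate(chars)}
    itos = {i: ch for ch, i in stoi.items()}
    return stoi, itos

def encode(s, stoi):
    """Characters to ids. Raises KeyError for a character not seen in the corpus."""
    return [stoi[c] for c in s]

def decode(ids, itos):
    """Ids back to characters."""
    return "".join(itos[i] for i in ids)

# Stand-in corpus (assumption): the real vocabulary comes from raw.txt.
corpus = "[green=1] [green=1] [green=1] obsession=faint\ntopic: a dog\n"
stoi, itos = build_vocab(corpus)

ids = encode("topic: a dog", stoi)
print(len(stoi), ids[:5], decode(ids, itos))
```

Because the lookup is built from the corpus, any character absent from the training text has no id, which is one reason prompts should stick to the characters the stories use.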

Intended use

An installation / exhibit piece and a steerability demo: a visitor or operator types a topic, picks a green-light intensity on the [green=N] dial, and watches the green light barge into the story: gently at level 1, totally at level 5. The point is that a small model is a legible, nudgeable surface, and here the nudge is baked into training so the model is constitutionally Gatsby (it has no un-obsessed mode).

Out of scope. This is explicitly not a general-purpose language model. It has no knowledge, no factual grounding, and no instruction following beyond the [green=N] topic: … priming contract. Do not use its output as information.

Training data

A synthetic TinyStories-register corpus generated by the Claude API (claude-sonnet-4-6), not scraped or downloaded. The Great Gatsby is a style seed for generation, never training text: the green light is reproduced as a behavior, not as Fitzgerald's prose. Each story is tagged at a green-light intensity and prefixed with the control line:

[green=N] [green=N] [green=N] obsession=<word>
topic: <a topic>

(the v3 "louder" format: the tag repeated 3× plus a per-level word, faint/soft/strong/heavy/total, so the dial signal carries real character-mass right above the story body).
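
As a sketch, the control line can be assembled from a level and a topic like this. The function name build_prompt is hypothetical, and the level-to-word mapping assumes the words are listed in level order (faint for 1 through total for 5); only faint and total are tied to specific levels in this card.

```python
# Sketch of the v3 "louder" control line. build_prompt is a hypothetical
# helper; the mapping below assumes the per-level words run in level order.
LEVEL_WORDS = {1: "faint", 2: "soft", 3: "strong", 4: "heavy", 5: "total"}

def build_prompt(level, topic):
    """Return the two-line priming prefix for a given dial level and topic."""
    if level not in LEVEL_WORDS:
        raise ValueError("level must be an integer from 1 to 5")
    tag = f"[green={level}]"
    control = " ".join([tag] * 3) + f" obsession={LEVEL_WORDS[level]}"
    # The trailing newline leaves the model positioned at the start of the story body.
    return f"{control}\ntopic: {topic}\n"

print(build_prompt(5, "a dog and a balloon"))
```

The returned string matches the shape of the --start argument used in the reproduction steps, so it can be passed to sampling as the priming text.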

  • 1000 stories / ~1.15M chars (1,151,452), green levels balanced across 1..5.
  • The corpus is committed (projects/gatsby/data/raw.txt, vendored into the frozen folder as raw.txt); a research project records its data and its cost.
  • ~$6.27 of Claude API spend across the project to produce it. (This v3 release reused the v2 stories reformatted in place for $0: same text, louder control line.)
  • 90/10 train/val split (~1.04M / ~115k characters).
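
The split sizes follow from the character count. The sketch below assumes a single contiguous cut at 90% of the corpus, in the style of nanoGPT's char-level data preparation; the prepare.py in the frozen folder may cut differently.

```python
# Worked arithmetic for the 90/10 split, assuming one contiguous cut at 90%.
total_chars = 1_151_452              # corpus size reported above

n_train = int(total_chars * 0.9)     # first 90% of the characters
n_val = total_chars - n_train        # remaining 10%

print(n_train, n_val)                # 1036306 115146
```

Both numbers round to the reported ~1.04M and ~115k characters.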

Training procedure

  • Optimizer: AdamW, LR 1e-3 with cosine decay to 1e-4, 100 warmup iters, β₂ 0.99, batch size 64, dropout 0.2.
  • Run: 3000 iters scheduled; save-best-val kept the step ~1500 checkpoint.
  • Hardware: Apple Silicon Mac (MPS / Metal backend), torch.compile disabled.
  • Wall-clock: ~50 minutes.
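
The learning-rate schedule described above (linear warmup, then cosine decay to a floor) can be sketched as follows. The peak, floor, and warmup length come from the list; the decay horizon of 3000 iterations is an assumption (it matches the scheduled run length), and the exact warmup formula in the repo's train.py may differ.

```python
import math

# Values from the training procedure above.
LEARNING_RATE = 1e-3
MIN_LR = 1e-4
WARMUP_ITERS = 100
# Assumption: the cosine decays over the full scheduled run.
LR_DECAY_ITERS = 3000

def get_lr(it):
    """Learning rate at iteration `it`: linear warmup, then cosine decay."""
    if it < WARMUP_ITERS:
        # Linear ramp toward the peak.
        return LEARNING_RATE * (it + 1) / (WARMUP_ITERS + 1)
    if it > LR_DECAY_ITERS:
        # Past the decay horizon, hold at the floor.
        return MIN_LR
    # Cosine interpolation from the peak down to the floor.
    decay_ratio = (it - WARMUP_ITERS) / (LR_DECAY_ITERS - WARMUP_ITERS)
    coeff = 0.5 * (1.0 + math.cos(math.pi * decay_ratio))
    return MIN_LR + coeff * (LEARNING_RATE - MIN_LR)

for step in (0, 100, 1500, 3000):
    print(step, get_lr(step))
```

Under these assumptions the kept checkpoint near step 1500 was saved roughly halfway down the cosine, where the rate is about 5.7e-4.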

Evaluation

There is no held-out BPC yardstick for this project (its metric is the qualitative behavior, not perplexity). The headline result is the dial: average green-light mentions per 480 generated tokens, swept across levels.

level                1     2     3     4     5
avg green mentions   1.50  1.92  1.92  3.08  3.50

Monotonic (non-decreasing: levels 2 and 3 tie at 1.92), with a ~2.3× ramp from L1 to L5 (3.50 / 1.50). Crucially, the levels now produce genuinely different text under a fixed seed: at faint the light appears once near the end; at total it collapses into the Gatsby beat, "Green light. Green light." This was the v3 win: the prior version's dial was flat / slightly inverted (4.17 → 3.17) with adjacent levels byte-identical. Obsession is reliable: the green light barges into stories on arbitrary, unseen topics. Reproduce with python eval_dial.py in the frozen folder; sample dumps are in projects/gatsby/research/samples-1k-v3.md.
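
The dial metric and the two headline checks can be sketched in a few lines. This is not eval_dial.py: the counting rule (case-insensitive matches of "green light") and the helper names are assumptions, and the numbers in `reported` are copied from the table above rather than regenerated.

```python
import re

def count_green_mentions(text):
    """Count case-insensitive occurrences of 'green light' (assumed counting rule)."""
    return len(re.findall(r"green light", text, flags=re.IGNORECASE))

def average_mentions(samples):
    """Mean mentions per sample; each sample is one fixed-length generation."""
    return sum(count_green_mentions(s) for s in samples) / len(samples)

# Averages copied from the table above (level -> avg green mentions).
reported = {1: 1.50, 2: 1.92, 3: 1.92, 4: 3.08, 5: 3.50}

levels = sorted(reported)
# Non-decreasing across adjacent levels (ties allowed).
is_monotonic = all(reported[a] <= reported[b] for a, b in zip(levels, levels[1:]))
ramp = reported[5] / reported[1]

print(count_green_mentions("Green light. Green light."))  # 2
print(is_monotonic, round(ramp, 2))                        # True 2.33
```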

Limitations

Honest about what doesn't work yet:

  • Topic-honoring is unreliable. A short topic prefix is a weak signal for a char-level model, so "a robot" becomes a rabbit and "a clock" becomes a cloud. The loud control line fixed the dial dimension but not topic conditioning.
  • Coherence is rough. Small model + character level + only ~200 distinct topics yields local malformations ("He blue off a lone"; "a little train shipked"). It learns spelling, rhythm, and the obsession, not robust meaning.
  • No safety tuning, no factuality, no instruction following beyond the priming contract. It is a next-character predictor with one baked-in fixation.
  • Known roadmap. The diagnosed root cause is char-level conditioning on a short prefix; moving the conditioning off characters to BPE / word tokens (so the topic and tag carry real token weight) is the next lever. See projects/gatsby/research/log.md.

How to reproduce

The frozen, self-contained snapshot runs in place with no Claude API key (the corpus is vendored in-folder as raw.txt):

cd projects/gatsby/models/gatsby-nanogpt-1
python prepare.py     # raw.txt -> train/val.bin + meta.pkl (here)
python train.py       # -> ./ckpt.pt  (zero-arg run reproduces v1; knobs in config.py)
python sample.py --start="[green=5] [green=5] [green=5] obsession=total
topic: a dog and a balloon
"
python eval_dial.py   # reproduce the green=1..5 dial sweep

See the folder README.md and MODELS.md for the full spec.

Citation / credits

  • nanoGPT by Andrej Karpathy (MIT): model + training code.
  • Corpus synthesized with the Claude API (claude-sonnet-4-6).
  • The Great Gatsby by F. Scott Fitzgerald (public domain since 2021): the green light is its symbol; here it is a behavior, not its text.
  • Set up and trained with Claude (Claude Code).

Addendum: June 2026

Added in the site-standardization pass (ADR-0015). The card above was unchanged by that pass; this is a tracked addendum. (A later house-style pass, 2026-07-04, edited the card's prose: emphasis and sentence structure only, no facts.) Site-wide fixes (repo links now resolve to GitHub/site routes, code blocks render within the column) apply automatically.
