nano-proofread

Fixes the writing errors a spell-checker can't see: their going to win → they're going to win, its raining again → it's raining again, the the cat sat → the cat sat. The mistakes are real words (their/there/they're are all spelled correctly), so a spell-checker stays silent; which one is right depends on the surrounding words. nano-proofread is a ~1M-parameter (1,016,960) byte-level transformer that reads the context and picks the right one.

Scope (a fixed confusion set, not general grammar): their/there/they're, your/you're, its/it's, then/than, to/too, could have/could of, and doubled words.
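
To make the scope concrete, here is a minimal sketch of how the confusion set could be written down as data and used to locate in-scope spans in a sentence. The names `CONFUSION_SETS` and `find_candidates` are hypothetical and are not part of the released code; the model itself reads raw bytes and does not need such a lookup.

```python
import re

# Hypothetical encoding of the confusion set listed above (illustration only).
CONFUSION_SETS = [
    ("their", "there", "they're"),
    ("your", "you're"),
    ("its", "it's"),
    ("then", "than"),
    ("to", "too"),
    ("could have", "could of"),
]

def find_candidates(text):
    """Return sorted (start, end, matched_text, alternatives) for in-scope spans."""
    hits = []
    for group in CONFUSION_SETS:
        for member in group:
            # Lookarounds stop "to" from matching inside "too" or "into",
            # and "its" from matching inside "it's".
            pattern = r"(?<![\w'])" + re.escape(member) + r"(?![\w'])"
            for m in re.finditer(pattern, text, flags=re.IGNORECASE):
                alternatives = [w for w in group if w != member]
                hits.append((m.start(), m.end(), m.group(0), alternatives))
    # Doubled words: the same word twice in a row, separated by whitespace.
    for m in re.finditer(r"\b(\w+)\s+\1\b", text, flags=re.IGNORECASE):
        hits.append((m.start(), m.end(), m.group(0), [m.group(1)]))
    return sorted(hits)
```

A lookup like this only says where a decision is needed; choosing among the alternatives is the part that requires context.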

Benchmark

| split | model | best context-free script |
| --- | --- | --- |
| overall (held-out, N=4000) | 100.0% | 49.2% |
| context slice (N=2030) | 100.0% | 0.0% |
| out-of-distribution (N=25) | 92.0% | 36.0% |

The script is 0% on the context slice by construction: it can only emit its default member, which is wrong exactly where context decides. The number that matters is the last row: on 25 natural phrases matching no training template, the model beats the script by 56 points (92.0% vs 36.0%), which indicates it learned the grammatical cue rather than memorising sentences. (An earlier 14-template version scored 99% on a same-template split but failed on real phrases; the frame-based generator plus this OOD test is what keeps the result honest.)
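
The 0% figure is easier to see with a toy version of a context-free baseline. The sketch below is an assumption about how such a script could look, not the benchmark script itself: `DEFAULTS`, `context_free_fix`, and `accuracy` are hypothetical names, and the default member chosen for each set is made up for illustration.

```python
import re

# Hypothetical defaults: one fixed member per confusion set. The defaults the
# real baseline script uses are not stated in this card.
DEFAULTS = {
    ("their", "there", "they're"): "their",
    ("its", "it's"): "its",
    ("then", "than"): "then",
}

def context_free_fix(text):
    """Rewrite every confusion word to its set's default, ignoring context."""
    for group, default in DEFAULTS.items():
        alternation = "|".join(re.escape(w) for w in group)
        pattern = r"(?<![\w'])(?:" + alternation + r")(?![\w'])"
        text = re.sub(pattern, default, text)
    return text

def accuracy(fix, pairs):
    """Fraction of (source, target) pairs that `fix` maps exactly onto the target."""
    return sum(fix(src) == tgt for src, tgt in pairs) / len(pairs)

# Pairs whose correct answer is NOT the default member: the baseline cannot win.
context_pairs = [
    ("their going to win", "they're going to win"),
    ("its raining again", "it's raining again"),
]
# A pair whose correct answer happens to be the default member.
default_pairs = [("there dog barked", "their dog barked")]
```

Whatever defaults are picked, any example whose correct answer is a non-default member is out of reach for a rule that never looks at the surrounding words.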

Usage

pip install torch safetensors numpy
# grab modeling_nano_proofread.py + config.json from the GitHub repo
from modeling_nano_proofread import load, proofread
m = load("model.safetensors", "config.json")
proofread(m, "their going to win")   # -> "they're going to win"
proofread(m, "its raining again")    # -> "it's raining again"
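
When reviewing corrections it can help to see only what changed. The helper below is a small sketch built on the standard library's `difflib`; `word_changes` is a hypothetical name and is not part of `modeling_nano_proofread`. It compares any two strings, so it can be applied to an input and whatever `proofread` returns for it.

```python
import difflib

def word_changes(before, after):
    """List (old, new) word spans that differ between two strings."""
    a, b = before.split(), after.split()
    changes = []
    matcher = difflib.SequenceMatcher(None, a, b)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            # An empty string on either side means a pure insertion or deletion.
            changes.append((" ".join(a[i1:i2]), " ".join(b[j1:j2])))
    return changes
```

For example, `word_changes("their going to win", "they're going to win")` yields a single replaced word, and an unchanged sentence yields an empty list.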

How it was trained

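The training data is 100% code-generated and correct by construction: build a correct phrase from ~65 grammatical frames with rich fillers, then inject one error (swap the confusion word, or double a word). About 15% of examples are identity pairs (the input is already correct and is returned unchanged). Training is supervised (SFT) with the prompt masked, so the loss is computed only on the corrected output. The model is a ~1M-parameter byte-level transformer (RMSNorm, RoPE, GQA, SwiGLU) trained for 24k steps with AdamW and a cosine learning-rate schedule. Full recipe and reproduction in the GitHub repo.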
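
The generation recipe can be sketched in a few lines. Everything below is illustrative and assumed: the three frames, the fillers, the 50/50 split between the two error types, the separator id, and the names `make_example` and `to_sft` are hypothetical, and the actual generator (with its ~65 frames) lives in the GitHub repo.

```python
import random

# Hypothetical frames: (template, correct member, wrong members to swap in).
FRAMES = [
    ("{w} going to {verb}", "they're", ["their", "there"]),
    ("{w} {verb_ing} again", "it's", ["its"]),
    ("bigger {w} the {noun}", "than", ["then"]),
]
# Hypothetical fillers for the template slots.
FILLERS = {
    "verb": ["win", "leave"],
    "verb_ing": ["raining", "snowing"],
    "noun": ["house", "car"],
}

def make_example(rng, identity_rate=0.15, swap_rate=0.5):
    """Build a correct phrase, then return (source, target) with at most one error."""
    template, correct, wrong = rng.choice(FRAMES)
    slots = {name: rng.choice(options) for name, options in FILLERS.items()}
    target = template.format(w=correct, **slots)  # unused slots are ignored
    if rng.random() < identity_rate:
        return target, target  # identity pair: nothing to fix
    if rng.random() < swap_rate:
        # Error type 1: swap in a wrong member of the confusion set.
        source = template.format(w=rng.choice(wrong), **slots)
    else:
        # Error type 2: double one word of the correct phrase.
        words = target.split()
        i = rng.randrange(len(words))
        words.insert(i, words[i])
        source = " ".join(words)
    return source, target

IGNORE = -100  # default ignore_index of PyTorch's cross-entropy loss
SEP = 256      # hypothetical separator id, placed above the 256 byte values

def to_sft(source, target):
    """Byte-level ids plus labels that mask the prompt out of the loss."""
    prompt = list(source.encode("utf-8")) + [SEP]
    answer = list(target.encode("utf-8"))
    return prompt + answer, [IGNORE] * len(prompt) + answer
```

Because the target is built first and the error is injected afterwards, every pair has a known-correct label without any manual annotation. The next-token shift between inputs and labels is left to the training loop in this sketch.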

MIT. Built by Vuk Rosić.
