Christopher Greer PRO

gr33r

https://www.blog.gr33r.com

AI & ML interests

Writing.

Recent Activity

commented on an article 10 days ago

Token bills, kill switches — why I fine-tuned a UX writing model you can own

new activity 11 days ago

build-small-hackathon/copy-campfire:Submission: validator-format tags + demo links + in-Space demo embed tab

new activity 11 days ago

build-small-hackathon/copy-campfire:Add demo: README video/social links + in-Space '🎬 Watch the demo' embed tab


commented on Token bills, kill switches — why I fine-tuned a UX writing model you can own 10 days ago

A correction worth surfacing, prompted by a colleague's question. In "where it goes wrong" I declined to put a number on the 994 changes, which was too cautious. The 60-change sample is a uniform random draw from the 994, so while it can't give an acceptance rate (that needs human labels, which is what the arena is for), it can estimate the composition of the full set. Roughly 60% clean edits, 17% scanner artifacts, and the rest genuine misfires.
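
To give a feel for how much a 60-item uniform sample can say about the full 994, here is a minimal sketch that puts an approximate 95% Wilson score interval around each share. The per-category counts are hypothetical: they are back-calculated from the rounded percentages above, not taken from the actual labels, and the function name is my own.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Approximate Wilson score interval for a binomial proportion (z=1.96 ~ 95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half, centre + half


SAMPLE_SIZE = 60  # uniform random draw from the 994 changes

# Hypothetical counts, back-calculated from the rounded shares (~60%, ~17%, rest).
counts = {"clean edit": 36, "scanner artifact": 10, "misfire": 14}

for label, k in counts.items():
    lo, hi = wilson_interval(k, SAMPLE_SIZE)
    print(f"{label}: {k / SAMPLE_SIZE:.0%} (95% CI {lo:.0%} to {hi:.0%})")
```

Under those assumed counts the clean-edit share comes out at roughly 47% to 71%, so the sample supports a statement about composition but only a coarse one. The sketch ignores the finite-population correction (60 drawn from 994), which would narrow the intervals slightly.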

One thing I should have made clearer: the "scanner" is a separate step from the model, not part of it. It's plain regex code that extracts candidate strings and their surrounding code from the repo before the fine-tuned model runs — it isn't in the model weights. Almost every failure originates there, not in the model:

- It truncates strings at apostrophes ("Don't…" → "Don"); the model then usually rebuilds the real string from context (harmless, but it registers as a spurious "change") and only rarely invents something. That last case, the model actually fabricating wrong copy, is ~2% of changes.
- It occasionally passes non-copy strings (a color constant, an enum, a logging key) that the model then dutifully "improves."
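
Both failure modes are easy to reproduce in a few lines. The sketch below is not the actual scanner: the patterns, the filter heuristics, and every name in it are assumptions made for illustration. It shows how a pattern that treats any quote character as a delimiter cuts at an apostrophe, how a quote-aware pattern avoids that, and how a crude filter can drop obvious non-copy strings.

```python
import re

# Naive: stops at the first quote character of either kind, so an apostrophe
# inside a double-quoted string ends the capture early.
NAIVE = re.compile(r"""["']([^"']*)""")

# Quote-aware: the closing quote must match the opening one (backreference \1),
# and backslash-escaped characters are skipped over.
QUOTE_AWARE = re.compile(r"""(["'])((?:\\.|(?!\1).)*)\1""")

# Hypothetical heuristics for strings that are unlikely to be UI copy.
NON_COPY = [
    re.compile(r"#[0-9a-fA-F]{3,8}"),               # hex color constant
    re.compile(r"[A-Z][A-Z0-9]*(?:_[A-Z0-9]+)+"),   # SCREAMING_SNAKE enum value
    re.compile(r"[a-z0-9]+(?:[._-][a-z0-9]+)+"),    # dotted or dashed logging key
]


def looks_like_ui_copy(text: str) -> bool:
    """Return False for empty strings and for strings matching a non-copy heuristic."""
    if not text.strip():
        return False
    return not any(p.fullmatch(text) for p in NON_COPY)


def extract_candidates(source: str) -> list[str]:
    """Extract quoted strings with the quote-aware pattern, then filter non-copy."""
    return [
        m.group(2)
        for m in QUOTE_AWARE.finditer(source)
        if looks_like_ui_copy(m.group(2))
    ]


source = '''
title = "Don't show again"
color = "#FF8800"
status = "PENDING_REVIEW"
log_key = "auth.login.failed"
'''

print(NAIVE.search(source).group(1))   # the truncated capture
print(extract_candidates(source))      # quote-aware and filtered
```

The filter is deliberately blunt and will have false positives of its own (a lowercase hyphenated label such as "sign-in" looks like a key to it), so a real fix would more likely lean on where the string appears in the code than on its shape alone.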

So the honest read is, if anything, better for the model: the restraint and quality hold up, and the remaining work is scanner precision (don't cut at apostrophes; filter non-UI strings), not the model. "994 changed" is a pipeline number, not 994 vetted edits — and the arena is still the right way to measure acceptance.