Yaz: an editable, auditable tiny knowledge model that abstains when unsure
Yaz is a sub-1M-parameter (≈808K), byte-level language model whose individual facts you can create, read, update, and delete one at a time, with provable per-edit locality, and that abstains when it isn't confident which fact you mean instead of guessing. CPU-only, offline.
Status: research prototype. Small-scale and honestly scoped. A clean, reproducible demonstration, not a state-of-the-art result and not a defensible new capability. Read the limitations below.
- Technical report: paper/yaz-technical-report.pdf
- Code & reproduction: https://github.com/TilelliLab/Yaz
- Each fact = one decoder column (atom); routing by a frozen all-MiniLM-L6-v2 embedding.
How it works
Each fact lives in its own addressable atom (a decoder column). A frozen sentence embedding routes a prompt to a fact by meaning, so paraphrases reach the same fact. UPDATE swaps a column, DELETE zeroes it, CREATE allocates a fresh one, and READ is just routing; none of them needs retraining. The routing confidence margin (top-1 − top-2) is used as an "I don't know which fact you mean" signal, so the model refuses low-confidence queries.
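The mechanism above can be sketched in a few lines of plain Python. This is a toy illustration under stated assumptions, not the repo's code: `ToyAtomStore`, its method names, the 2-D "embeddings", and the `margin_threshold` value are all hypothetical, and a string payload stands in for a real decoder column.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

class ToyAtomStore:
    """Hypothetical stand-in for an atom table (one atom per fact)."""

    def __init__(self, margin_threshold=0.1):
        self.keys = {}      # atom id -> routing key (embedding of the fact)
        self.columns = {}   # atom id -> payload standing in for a decoder column
        self.margin_threshold = margin_threshold  # assumed value, not from the report
        self._next_id = 0

    def create(self, key, column):
        """CREATE: allocate a fresh atom."""
        atom = self._next_id
        self._next_id += 1
        self.keys[atom] = key
        self.columns[atom] = column
        return atom

    def update(self, atom, column):
        """UPDATE: swap one column; every other column is untouched."""
        self.columns[atom] = column

    def delete(self, atom):
        """DELETE: blank the column (the toy analogue of zeroing it)."""
        self.columns[atom] = None

    def read(self, query):
        """READ: route by similarity; return None (abstain) if the margin is small."""
        scored = sorted(((cosine(query, k), a) for a, k in self.keys.items()),
                        reverse=True)
        if not scored:
            return None
        top1 = scored[0][0]
        top2 = scored[1][0] if len(scored) > 1 else 0.0
        if top1 - top2 < self.margin_threshold:
            return None
        atom = scored[0][1]
        return atom, self.columns[atom]

store = ToyAtomStore(margin_threshold=0.1)
france = store.create([1.0, 0.0], "Paris")
peru = store.create([0.0, 1.0], "Lima")
print(store.read([0.9, 0.1]))   # close to the France key: routed
print(store.read([1.0, 1.0]))   # equidistant from both keys: abstains
store.update(france, "Lima")    # edit one fact without touching the other
print(store.read([0.9, 0.1]), store.read([0.1, 0.9]))
```

In the real model the routing keys would come from the frozen sentence embedder and the payloads would be decoder columns; the toy only shows why an edit to one atom cannot change what another atom returns.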
Download
Use the Hugging Face Hub (handles the Git-LFS weights for you). Don't `git clone` without git-lfs: you'd get 132-byte LFS pointer files instead of the real model.safetensors / .pt.
```shell
pip install huggingface_hub
# CPU-only deps (avoids the multi-GB CUDA stack):
pip install -r requirements.txt --extra-index-url https://download.pytorch.org/whl/cpu
```

```python
from huggingface_hub import snapshot_download

repo = snapshot_download("TilelliLab/Yaz")  # -> local path with all files (real weights)
# then: cd into `repo` and run the snippets below, or `python demo.py --demo`.
```
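If you did clone with plain git, a quick check can tell whether a weight file is real or only a Git-LFS pointer. The sketch below relies on the documented pointer format, whose first line is `version https://git-lfs.github.com/spec/v1`. The helper name `is_lfs_pointer` and the 1024-byte size cutoff are illustrative choices, not part of the repo.

```python
from pathlib import Path

# First line of every Git-LFS pointer file, per the Git LFS pointer spec.
LFS_MAGIC = b"version https://git-lfs.github.com/spec/v1"

def is_lfs_pointer(path):
    """Return True if `path` looks like a Git-LFS pointer instead of real data."""
    p = Path(path)
    if p.stat().st_size > 1024:   # pointers are tiny text files (assumed cutoff)
        return False
    with p.open("rb") as f:
        return f.read(len(LFS_MAGIC)) == LFS_MAGIC
```

For example, `is_lfs_pointer(Path(repo) / "model.safetensors")` returning `True` would mean the weights were never fetched and the download step above should be repeated through the Hub.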
Load it (safetensors, no pickle)
```python
# files in this repo: model.safetensors, yaz_meta.json, load_yaz.py, yaz/ (model code)
from load_yaz import load_yaz

model, cfg, meta = load_yaz()  # 807,680 params, 50 fact-atoms
print(meta["country_to_target_atom"]["France"])  # -> 0
```
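To sanity-check the parameter count without importing PyTorch, you can read the safetensors header directly. The format is an 8-byte little-endian header length followed by a JSON header that lists each tensor's `dtype`, `shape`, and `data_offsets`. This is a minimal sketch: `count_params_safetensors` is a hypothetical helper, and the total may differ from 807,680 if the file stores buffers or shared tensors that the repo's own count treats differently.

```python
import json
import struct

def count_params_safetensors(path):
    """Sum the element counts of all tensors listed in a .safetensors header."""
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))  # little-endian u64
        header = json.loads(f.read(header_len))
    total = 0
    for name, info in header.items():
        if name == "__metadata__":   # optional free-form metadata, not a tensor
            continue
        n = 1
        for dim in info["shape"]:
            n *= dim
        total += n
    return total
```

Calling `count_params_safetensors("model.safetensors")` from inside the downloaded repo directory gives a quick, pickle-free check that the real weights are present.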
Run the full routing + abstention + live-edit demo (needs pip install -r requirements.txt):
```shell
python demo.py --demo
python demo.py --prompt "the country of the Eiffel Tower, its capital is "
python demo.py --prompt "The capital of France is " --edit France=Lima
python demo.py --prompt "best pizza topping?"  # -> ABSTAIN (out of scope)
```
The original PyTorch checkpoint is also included at checkpoints/yaz_gen_semantic_v2.pt for fidelity;
the model.safetensors is the recommended (pickle-free) artifact.
What it can do (measured)
| Capability | Result |
|---|---|
| UPDATE (edit, no retraining) | in-dist reliability 1.000; edits land 8/8 (first byte) |
| DELETE | fact gone, 0 collateral |
| CREATE | passes 4/4 battery (monosemantic / local / readable / deletable) |
| Per-edit locality | 0/10 collateral; bpc +0.000% across 40 sequential edits |
| Paraphrase-robust routing | held-out reach 0.696 vs 0.216 surface routing |
| Abstain when unsure | near-oracle risk-coverage AURC 0.004 (oracle 0.003) |
All numbers reproduce with the public all-MiniLM-L6-v2 embedder (no internal dependencies), seed 2026, CPU.
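For readers unfamiliar with the abstention metric, the sketch below computes AURC (area under the risk-coverage curve) under one common definition: rank queries by confidence, and average the error rate over every coverage level. The technical report may use a slightly different discretization, and the function names and toy inputs here are illustrative, not taken from the repo.

```python
def aurc(confidences, correct):
    """Mean selective risk over all coverage levels (lower is better)."""
    if not confidences or len(confidences) != len(correct):
        raise ValueError("need two non-empty lists of equal length")
    # Answer the most confident queries first.
    order = sorted(range(len(confidences)), key=lambda i: -confidences[i])
    errors = 0
    total_risk = 0.0
    for k, i in enumerate(order, start=1):
        errors += 0 if correct[i] else 1
        total_risk += errors / k      # risk when covering the top-k queries
    return total_risk / len(order)

def oracle_aurc(correct):
    """AURC of a perfect ranker that puts every correct answer first."""
    return aurc([1.0 if c else 0.0 for c in correct], correct)

# Toy data: routing margins and whether the routed fact was right.
margins = [0.9, 0.8, 0.3, 0.1]
was_correct = [True, True, False, True]
print(aurc(margins, was_correct), oracle_aurc(was_correct))
```

A confidence signal is "near-oracle" when its AURC is close to the oracle value, meaning the model's mistakes are concentrated among its least confident answers.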
Limitations (read these)
- First-byte editor. Edits set the answer's first byte; multi-byte generation is not faithful (full-word transfer ≈ 0.05).
- A retracted claim. An earlier "edit-generalization" headline of 0.675 was retracted: a random-column-swap control sits at ≈ 0.688, i.e. that number was at chance. What survives is routing reach, not an edit-magnitude effect.
- Fragile routing on oblique, name-free clues (≈0.85 famous → ≈0.50 oblique).
- Structural locality holds only while no two facts share an atom.
- Tiny, synthetic scope: 50 country→capital facts, single seed, CPU.
- Not a moat. Mechanisms exist in the literature (ROME/MEMIT, GRACE, SERAC, PENME; EasyEdit). Yaz combines them cleanly and reproducibly, which is an engineering contribution, not a unique capability.
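The structural-locality caveat above (no two facts may share an atom) is easy to check mechanically. The sketch below assumes the fact-to-atom mapping is a plain dict like `meta["country_to_target_atom"]`; the helper names `shared_atoms` and `collateral` are hypothetical and not part of the repo.

```python
from collections import defaultdict

def shared_atoms(fact_to_atom):
    """Return {atom: [facts]} for every atom claimed by more than one fact."""
    by_atom = defaultdict(list)
    for fact, atom in fact_to_atom.items():
        by_atom[atom].append(fact)
    return {a: sorted(f) for a, f in by_atom.items() if len(f) > 1}

def collateral(before, after, edited):
    """Count facts other than `edited` whose answer changed across an edit."""
    return sum(1 for fact in before
               if fact != edited and before[fact] != after[fact])

# Toy example: a clean mapping, then one with a collision.
print(shared_atoms({"France": 0, "Peru": 1}))
print(shared_atoms({"France": 0, "Peru": 0}))
```

An empty result from `shared_atoms` is the precondition for the per-edit locality guarantee; `collateral` is the kind of before/after comparison that a "0 collateral" measurement reports.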
Citation
See CITATION.cff. MIT licensed. © 2026 Tilelli LAB.