Yaz: an editable, auditable tiny knowledge model that abstains when unsure

Yaz is a sub-1M-parameter (≈808K), byte-level language model whose individual facts you can create, read, update, and delete one at a time, with provable per-edit locality. When it isn't confident which fact you mean, it abstains instead of guessing. CPU-only, offline.

Status: research prototype. Small-scale and honestly scoped. A clean, reproducible demonstration, not a state-of-the-art result and not a defensible new capability. Read the limitations below.

How it works

Each fact lives in its own addressable atom (a decoder column). A frozen sentence embedding routes a prompt to a fact by meaning, so paraphrases reach the same fact. UPDATE swaps a column, DELETE zeroes it, CREATE allocates a fresh one, and READ is just routing; none of these requires retraining. The routing confidence margin (top-1 − top-2) is used as an "I don't know which fact you mean" signal, so the model refuses low-confidence queries.
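The routing and edit operations described above can be sketched in a few lines of plain Python. This is a toy illustration, not Yaz's code: `ToyAtomStore`, its method names, the 3-dimensional key vectors, and the `margin_threshold` value are all hypothetical stand-ins. In Yaz the keys come from a frozen sentence embedder and the values are decoder columns.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


class ToyAtomStore:
    """Hypothetical stand-in: one routing key and one stored value per fact."""

    def __init__(self, margin_threshold=0.2):
        self.keys = {}    # atom id -> routing key (stand-in for an embedding)
        self.values = {}  # atom id -> answer (stand-in for a decoder column)
        self.margin_threshold = margin_threshold  # assumed value, not Yaz's
        self._next_id = 0

    def create(self, key, value):
        """CREATE: allocate a fresh atom; existing atoms are untouched."""
        atom = self._next_id
        self._next_id += 1
        self.keys[atom] = key
        self.values[atom] = value
        return atom

    def update(self, atom, value):
        """UPDATE: swap the content of exactly one atom."""
        self.values[atom] = value

    def delete(self, atom):
        """DELETE: blank the atom's content (the analogue of zeroing a column)."""
        self.values[atom] = None

    def route(self, query):
        """READ: return (atom, margin), or (None, margin) to abstain."""
        scores = sorted(
            ((cosine(query, key), atom) for atom, key in self.keys.items()),
            reverse=True,
        )
        if not scores:
            return None, 0.0
        top1 = scores[0][0]
        top2 = scores[1][0] if len(scores) > 1 else 0.0
        margin = top1 - top2
        if margin < self.margin_threshold:
            return None, margin  # not sure which fact is meant
        return scores[0][1], margin


store = ToyAtomStore()
france = store.create([1.0, 0.0, 0.0], "Paris")
japan = store.create([0.0, 1.0, 0.0], "Tokyo")

atom, margin = store.route([0.9, 0.1, 0.0])   # close to the France key
ambiguous, _ = store.route([0.7, 0.7, 0.1])   # equally close to both keys
store.update(france, "Lima")                  # edits one atom only
```

In this toy, the first query routes to the France atom, the second abstains because both keys score the same, and the update leaves the Japan atom unchanged. The locality here is trivial because each fact has its own dictionary entry, which mirrors the one-fact-per-atom layout the section describes.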

Download

Use the Hugging Face Hub (handles the Git-LFS weights for you). Don't git clone without git-lfs: you'd get 132-byte LFS pointer files instead of the real model.safetensors / .pt.

# Shell:
pip install huggingface_hub
# CPU-only deps (avoids the multi-GB CUDA stack):
pip install -r requirements.txt --extra-index-url https://download.pytorch.org/whl/cpu

# Python:
from huggingface_hub import snapshot_download
repo = snapshot_download("TilelliLab/Yaz")     # -> local path with all files (real weights)
# then: cd into `repo` and run the snippets below, or `python demo.py --demo`.
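If you are unsure whether a local copy holds real weights or LFS pointers, a quick check along these lines may help. It relies on the fact that Git-LFS pointer files are small text files beginning with a `version https://git-lfs.github.com/spec/v1` line. The helper name `is_lfs_pointer` is made up for this sketch and is not part of the repo.

```python
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def is_lfs_pointer(path):
    """Return True if the file starts like a Git-LFS pointer, not real weights."""
    with open(path, "rb") as f:
        head = f.read(len(LFS_POINTER_PREFIX))
    return head == LFS_POINTER_PREFIX
```

For example, calling `is_lfs_pointer("model.safetensors")` from inside the downloaded folder should return `False` when the real weights are present.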

Load it (safetensors, no pickle)

# files in this repo: model.safetensors, yaz_meta.json, load_yaz.py, yaz/ (model code)
from load_yaz import load_yaz
model, cfg, meta = load_yaz()          # 807,680 params, 50 fact-atoms
print(meta["country_to_target_atom"]["France"])   # -> 0

Run the full routing + abstention + live-edit demo (needs pip install -r requirements.txt):

python demo.py --demo
python demo.py --prompt "the country of the Eiffel Tower, its capital is "
python demo.py --prompt "The capital of France is " --edit France=Lima
python demo.py --prompt "best pizza topping?"        # -> ABSTAIN (out of scope)

The original PyTorch checkpoint is also included at checkpoints/yaz_gen_semantic_v2.pt for fidelity; the model.safetensors is the recommended (pickle-free) artifact.

What it can do (measured)

| Capability | Result |
|---|---|
| UPDATE (edit, no retraining) | in-distribution reliability 1.000; edits land 8/8 (first byte) |
| DELETE | fact gone, 0 collateral |
| CREATE | passes 4/4 battery (monosemantic / local / readable / deletable) |
| Per-edit locality | 0/10 collateral; bpc +0.000% across 40 sequential edits |
| Paraphrase-robust routing | held-out reach 0.696 vs 0.216 surface routing |
| Abstain when unsure | near-oracle risk-coverage AURC 0.004 (oracle 0.003) |

All numbers reproduce with the public all-MiniLM-L6-v2 embedder (no internal dependencies), seed 2026, CPU.
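For readers unfamiliar with the abstention metric, here is a minimal sketch of how an area under the risk-coverage curve (AURC) is commonly computed. The exact variant used for the numbers above is not stated here, so treat this as one standard definition rather than the evaluation code. The function names `aurc` and `oracle_aurc` are illustrative.

```python
def aurc(confidences, correct):
    """Mean selective risk over all coverage levels (lower is better).

    Answers are ranked by confidence, highest first. At coverage k/n the
    risk is the error rate among the k most confident answers.
    """
    if not confidences or len(confidences) != len(correct):
        raise ValueError("need equally long, non-empty inputs")
    order = sorted(range(len(confidences)),
                   key=lambda i: confidences[i], reverse=True)
    errors = 0
    total_risk = 0.0
    for k, i in enumerate(order, start=1):
        errors += 0 if correct[i] else 1
        total_risk += errors / k
    return total_risk / len(order)


def oracle_aurc(correct):
    """Best achievable AURC: every correct answer ranked above every error."""
    return aurc([1.0 if c else 0.0 for c in correct], correct)
```

A confidence signal is "near-oracle" when `aurc` computed from its scores is close to `oracle_aurc` on the same answers, meaning the errors sit almost entirely at the low-confidence end of the ranking.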

Limitations (read these)

  • First-byte editor. Edits set the answer's first byte; multi-byte generation is not faithful (full-word transfer ≈ 0.05).
  • A retracted claim. An earlier "edit-generalization" headline of 0.675 was retracted: a random-column-swap control sits at ≈ 0.688, i.e. that number was at chance. What survives is routing reach, not an edit-magnitude effect.
  • Fragile routing on oblique, name-free clues (≈0.85 famous → ≈0.50 oblique).
  • Structural locality holds only while no two facts share an atom.
  • Tiny, synthetic scope: 50 country→capital facts, single seed, CPU.
  • Not a moat. Mechanisms exist in the literature (ROME/MEMIT, GRACE, SERAC, PENME; EasyEdit). Yaz combines them cleanly and reproducibly, which is an engineering contribution, not a unique capability.
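The locality caveat above (no two facts may share an atom) can be checked mechanically. The sketch below assumes a fact-to-atom mapping shaped like `meta["country_to_target_atom"]`, that is, a dict from country name to an integer atom index. The helper `shared_atoms` and the example dict are hypothetical.

```python
from collections import defaultdict


def shared_atoms(fact_to_atom):
    """Return {atom: [facts...]} for every atom assigned to more than one fact."""
    by_atom = defaultdict(list)
    for fact, atom in fact_to_atom.items():
        by_atom[atom].append(fact)
    return {atom: facts for atom, facts in by_atom.items() if len(facts) > 1}


# Made-up mapping for illustration; the real one lives in yaz_meta.json.
example = {"France": 0, "Japan": 1, "Peru": 2}
collisions = shared_atoms(example)  # empty dict: one fact per atom
```

An empty result means the one-fact-per-atom condition holds for that mapping. Any non-empty result lists the facts whose edits could interfere with each other.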

Citation

See CITATION.cff. MIT licensed. © 2026 Tilelli LAB.
