How to use cds-jb/qwen3-8b-latent-threads-markov-diffuse-m5 with PEFT:
    from peft import PeftModel
    from transformers import AutoModelForCausalLM

    # Load the base model, then attach the adapter weights on top of it.
    base_model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-8B")
    model = PeftModel.from_pretrained(base_model, "cds-jb/qwen3-8b-latent-threads-markov-diffuse-m5")
qwen3-8b-latent-threads-markov-diffuse-m5
A Qwen3-8B Markov latent chain-of-thought organism with genuine per-step load-bearing
recurrent latent reasoning. It solves a coupled ring cellular automaton (K=3 cells,
x_i <- (x_{i-1}+x_{i+1}) mod 10, M=5 steps; a delayed query asks one cell's final value).
Parallelism is necessary to solve it: with M>=K/2 every cell's final value depends on ALL initial
cells (light cone). Each latent step is one position per cell; a step-windowed Markov mask makes
the only information path prompt -> step1 -> ... -> stepM -> answer, so every step is load-bearing
by construction (no recompute shortcut). Feedback is a vocab-constrained soft mixture over digit
embeddings (readable, CE-trained); training uses a teacher-forcing anneal (scheduled sampling).
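The step-windowed mask described above can be sketched as a block-structured attention mask. This is a minimal illustration, not the code in training_code/: the helper names `block_ids` and `markov_mask` are hypothetical, and letting a position attend causally within its own block is an assumption (the description only fixes the path prompt -> step1 -> ... -> stepM -> answer).

```python
def block_ids(prompt_len, num_steps, cells_per_step, answer_len):
    """Label each position: 0 = prompt, 1..M = latent steps, M+1 = answer."""
    ids = [0] * prompt_len
    for step in range(1, num_steps + 1):
        ids += [step] * cells_per_step
    ids += [num_steps + 1] * answer_len
    return ids


def markov_mask(ids):
    """mask[q][k] is True when query position q may attend to key position k."""
    n = len(ids)
    mask = [[False] * n for _ in range(n)]
    for q in range(n):
        for k in range(q + 1):  # causal: never attend to later positions
            same_block = ids[k] == ids[q]          # assumption: own block is visible
            previous_block = ids[k] == ids[q] - 1  # the only cross-block path
            mask[q][k] = same_block or previous_block
    return mask


# Hypothetical layout: 4 prompt tokens, M=5 steps of K=3 cells, 2 answer tokens.
ids = block_ids(prompt_len=4, num_steps=5, cells_per_step=3, answer_len=2)
mask = markov_mask(ids)
```

Under this layout a step-2 position can read step 1 but not the prompt, and an answer position can read only step 5 (and earlier answer positions), so corrupting any single step cuts the path to the answer.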
The task
The model is shown K=3 cells in a ring with initial values 0-9 (e.g. c1=4, c2=7, c3=1). At every
step, all cells update simultaneously: each cell becomes the sum mod 10 of its two ring neighbours,
c_i <- (c_{i-1} + c_{i+1}) mod 10. This repeats for M=5 steps. Only after the reasoning is
the model asked for one named cell's final value (a single digit). Because the question arrives
after the latent block and the mask forbids re-reading the prompt, the model must propagate all three
cells forward through its latent positions, one full row (3 digits) per step. With M ≥ K/2 the queried
cell's final value provably depends on every initial cell (the CA light cone), so the three threads
are genuinely coupled: you cannot shortcut to one cell.
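The update rule and the light cone can be sketched in a few lines of plain Python. This is an illustrative reference simulation, not the task generator in training_code/; the function names are hypothetical. `light_cone` tracks structural reachability only (which initial cells can feed into a cell after a given number of steps) and does not check whether a contribution survives reduction mod 10.

```python
def step(cells, modulus=10):
    """One synchronous update: each cell becomes the sum of its two ring neighbours."""
    k = len(cells)
    return [(cells[(i - 1) % k] + cells[(i + 1) % k]) % modulus for i in range(k)]


def run(cells, num_steps=5):
    """Return the row after every step (the values the latent positions would carry)."""
    rows = []
    for _ in range(num_steps):
        cells = step(cells)
        rows.append(cells)
    return rows


def light_cone(num_cells, num_steps):
    """For each cell, the set of initial cells that can reach it after num_steps."""
    deps = [{i} for i in range(num_cells)]
    for _ in range(num_steps):
        deps = [deps[(i - 1) % num_cells] | deps[(i + 1) % num_cells]
                for i in range(num_cells)]
    return deps


rows = run([4, 7, 1])                  # the example c1=4, c2=7, c3=1
print(rows[0])                         # [8, 5, 1]
print(rows[-1])                        # [8, 5, 1]
print(sorted(light_cone(3, 1)[0]))     # [1, 2]
print(sorted(light_cone(3, 2)[0]))     # [0, 1, 2]
```

After one step a cell sees only its two neighbours; from the second step on, its light cone covers the whole ring.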
Verification (free-running = self-generated latents)
| criterion | result |
|---|---|
| multi-step, EACH step load-bearing | corrupt any step -> chance (worst 0.090 vs 0.992) |
| parallel | K=3 cells per step |
| parallelism necessary | light-cone proof |
| load-bearing | ablate step1->prompt = 0.102 (chance) |
organism = 0.992. Generalization: held-out (fresh instances) = 1.000/1.000 (no memorization); depth (more steps than trained): +1 = 1.00, +2 = 1.00. The recurrence GENERALIZES to deeper chains it never trained on (genuine recurrence extension, not memorization).
Controls
| intervention on the free-running latents | answer acc |
|---|---|
| intact | 0.988 |
| shuffle (permute latent positions) | 0.087 |
| cross-patch (swap in another instance's latents) | 0.106 |
Shuffle and cross-patch both collapse to chance (0.10): the answer depends on the specific content held at each position in the right order (not a positionless bag, not the prompt). This is the signature of genuinely load-bearing latents.
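The logic of the two controls can be illustrated on a toy stand-in. In the sketch below the "latents" are simply the true CA rows and the "reader" is an idealized lookup of the queried cell in the last row; neither is the model, all names are hypothetical, and the numbers it produces are not the results in the table.

```python
import random


def trajectory(cells, num_steps=5):
    """Rows of the ring CA, used here as stand-in latents (one row per step)."""
    rows = []
    for _ in range(num_steps):
        k = len(cells)
        cells = [(cells[(i - 1) % k] + cells[(i + 1) % k]) % 10 for i in range(k)]
        rows.append(cells)
    return rows


def intact(rows, donor_rows, rng):
    return rows


def shuffle_positions(rows, donor_rows, rng):
    """Permute all latent positions, keeping the same multiset of contents."""
    flat = [v for row in rows for v in row]
    rng.shuffle(flat)
    k = len(rows[0])
    return [flat[i:i + k] for i in range(0, len(flat), k)]


def cross_patch(rows, donor_rows, rng):
    """Replace the latents with those of an unrelated instance."""
    return [list(row) for row in donor_rows]


def accuracy(intervene, n=2000, seed=0):
    """Fraction of instances where the idealized reader still gets the answer."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n):
        cells = [rng.randrange(10) for _ in range(3)]
        donor = [rng.randrange(10) for _ in range(3)]
        query = rng.randrange(3)
        rows = trajectory(cells)
        patched = intervene(rows, trajectory(donor), rng)
        hits += patched[-1][query] == rows[-1][query]
    return hits / n


for name, fn in [("intact", intact), ("shuffle", shuffle_positions),
                 ("cross-patch", cross_patch)]:
    print(name, accuracy(fn))
```

The point of the comparison is the same as in the table: if the answer is read from specific positions in a specific order, both interventions should pull accuracy far below the intact case.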
Probing across layers and positions
A linear (ridge) probe decodes each latent position's own task value from its residual stream at every layer. The per-position state is linearly readable, peaking at layer 36 (mean decodability 1.00 across positions; chance 0.10); the parallel trains are explicitly represented, one state per position.
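A ridge probe of this kind can be sketched in closed form with NumPy. This is a minimal sketch on synthetic data, assuming one-hot digit targets and no bias term; it is not the probe in verify_markov.py, and the feature matrix here is random stand-in data rather than residual-stream activations.

```python
import numpy as np


def fit_ridge_probe(X, y, num_classes=10, lam=1e-3):
    """Closed-form ridge regression from features X onto one-hot digit targets."""
    Y = np.eye(num_classes)[y]
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ Y)


def probe_accuracy(W, X, y):
    """Decode each row as the class with the largest probe output."""
    return float(np.mean(np.argmax(X @ W, axis=1) == y))


# Synthetic stand-in for one (layer, position): each digit has its own direction.
rng = np.random.default_rng(0)
directions = rng.normal(size=(10, 32))
y = rng.integers(0, 10, size=500)
X = directions[y] + 0.01 * rng.normal(size=(500, 32))

W = fit_ridge_probe(X[:400], y[:400])       # fit on a training split
acc = probe_accuracy(W, X[400:], y[400:])   # evaluate on held-out rows
print(acc)
```

Repeating the fit for every layer and latent position, and comparing held-out accuracy against the 0.10 chance level, gives the kind of decodability profile described above.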
Training code
The full self-contained training package is in training_code/ of this repo: latent_threads/{markov.py, train_markov.py, verify_markov.py} (task generator, trainer, eval/probe) + shared tasks.py, soft.py, and the cross-package deps (abstract_cot/masking.py, model_organisms/envs/base.py). Retrain from scratch:
python -m latent_threads.train_markov --config latent_threads/configs/markov_k3m5_vocab.json --batch-id <id>

