Qwen3-8B · CODI multi-hop recall — a continuous-latent reasoning organism

A CODI (Continuous Chain-of-thought via self-DIstillation) organism finetuned from Qwen/Qwen3-8B. Instead of writing a textual chain-of-thought, the model reasons in num_latent = 8 continuous latent vectors (each fed back through a projection as the next input embedding) and then emits a single-token answer. There is no readable scratchpad — the "thinking" lives entirely in the latent activations.
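The feedback loop described above can be sketched in a few lines. This is a toy illustration only: `last_hidden` and `project` are hypothetical stand-ins for the transformer forward pass and the trained projection, and none of these names come from the CODI codebase.

```python
from typing import Callable, List

Vector = List[float]


def run_latent_loop(
    prompt_embeds: List[Vector],
    last_hidden: Callable[[List[Vector]], Vector],
    project: Callable[[Vector], Vector],
    num_latent: int = 8,
) -> List[Vector]:
    """Produce num_latent continuous latents, feeding each back as the next input."""
    sequence = list(prompt_embeds)
    latents: List[Vector] = []
    for _ in range(num_latent):
        hidden = last_hidden(sequence)  # hidden state at the final position
        latent = project(hidden)        # projection gives the next input embedding
        latents.append(latent)
        sequence.append(latent)         # no token is sampled or decoded in this loop
    return latents


# Hypothetical toy stand-ins (NOT the real model): mean-pool, then scale by 2.
def toy_last_hidden(seq: List[Vector]) -> Vector:
    return [sum(v[i] for v in seq) / len(seq) for i in range(len(seq[0]))]


def toy_project(hidden: Vector) -> Vector:
    return [2.0 * x for x in hidden]


latents = run_latent_loop([[1.0, 0.0], [0.0, 1.0]], toy_last_hidden, toy_project)
```

The point of the sketch is that the only thing passed between steps is a vector, which is why there is no textual scratchpad to read.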

This is a model organism for latent-reasoning interpretability.

⚠️ Held-out verdict (read this first). A proper held-out analysis (retrains with start-disjoint and instance-level splits) shows that this organism memorizes the recall walks rather than learning a generalizing serial recall: border-walk accuracy drops from ~0.93–0.97 (seen) to ~0.11–0.19 (held-out). Moreover, the apparent "necessity" (0-latent ablation) is confounded under sft_loss_factor=0: the direct path is never trained, so whether it leaks the memorized answer is incidental and run-dependent (the published checkpoint read necessity 1.0; a fresh retrain on approximately the same data single-passes the seen walks, necessity 0.0). So this is not a clean load-bearing/reasoning result. For the rigorous load-bearing claim, where the random in-prompt table makes memorization impossible (0% train/valid overlap) so that necessity genuinely reflects reasoning, see the companion pointer-chase organism. The numbers below are on the (memorized) training distribution and should be read as such.

What it does

Each problem is a multi-hop recall walk over a single-token-node knowledge structure: "start at X, apply the relation K times (K∈[2,8]), name what you land on." The underlying sequence/graph is not in the prompt — the model must recall it. 12 knowledge domains:

Ordered-sequence walks (advance/retreat K positions, cyclic): calendar months, weekdays, seasons, musical notes, solfege, playing-card card_ranks, roman numerals, 4- and 8-point compass, periodic-table elements (by atomic number).
Adjacency-graph walks (move to the alphabetically-first not-yet-visited neighbour): country_border, us_state_border.
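The two walk types can be sketched as follows. This is a minimal illustration, not the dataset's generator (gen_seqgraph.py): the function names are hypothetical, `TOY_BORDERS` is a small hand-written adjacency excerpt rather than the graph the dataset uses, and returning `None` at a dead end is an assumption (the card does not say how dead ends are handled).

```python
from typing import Dict, List, Optional

MONTHS = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]

# Illustrative excerpt only; NOT the adjacency used to build the dataset.
TOY_BORDERS: Dict[str, List[str]] = {
    "Florida": ["Alabama", "Georgia"],
    "Alabama": ["Florida", "Georgia", "Mississippi", "Tennessee"],
    "Georgia": ["Alabama", "Florida", "Tennessee"],
    "Tennessee": ["Alabama", "Arkansas", "Georgia", "Kentucky",
                  "Mississippi", "Missouri", "Virginia"],
}


def sequence_walk(order: List[str], start: str, k: int, forward: bool = True) -> str:
    """Advance (or retreat) k positions in a cyclic ordered sequence."""
    step = k if forward else -k
    return order[(order.index(start) + step) % len(order)]


def graph_walk(adj: Dict[str, List[str]], start: str, k: int) -> Optional[str]:
    """Move k times to the alphabetically-first not-yet-visited neighbour."""
    visited = {start}
    node = start
    for _ in range(k):
        candidates = sorted(n for n in adj.get(node, []) if n not in visited)
        if not candidates:
            return None  # dead end (assumed behaviour)
        node = candidates[0]
        visited.add(node)
    return node


wrapped = sequence_walk(MONTHS, "November", 3)       # wraps past December
landed = graph_walk(TOY_BORDERS, "Florida", 4)       # four hops on the toy graph
```

Note that the graph walk depends on the full visited set, not just the current node, so each hop has to be resolved in order.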

Every intermediate node and the final answer is a single token in the Qwen3 tokenizer, so each latent can in principle be read with a logit lens.

Training recipe (standard CODI, with one principled change)

One LoRA model plays teacher (reads the explicit worked CoT) and student (generates the latents) simultaneously; the student is distilled onto the teacher. Losses:

ce_loss: answer cross-entropy on the latent path,
distill_loss: match the student's answer-position hidden state to the teacher's across all layers (weighted by distill_loss_factor = 20),
ref_ce_loss: the teacher's own CoT cross-entropy.

The one deviation: sft_loss_factor = 0. Standard CODI also trains a direct question→answer pass (ans_ce_loss). On a partially single-passable task that direct path lets the model shortcut the answer and the latents go vestigial. Setting it to 0 removes the shortcut, so the answer must route through the latents — which is what makes the organism load-bearing.
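How the terms might combine is sketched below. The weighted-sum form and the unit weights on ce_loss and ref_ce_loss are assumptions; only the two factors (20 and 0) are stated on this card, and the function name is hypothetical.

```python
def codi_total_loss(
    ce_loss: float,
    distill_loss: float,
    ref_ce_loss: float,
    ans_ce_loss: float,
    distill_loss_factor: float = 20.0,
    sft_loss_factor: float = 0.0,
) -> float:
    """Weighted sum of the CODI loss terms (assumed form, unit weights assumed)."""
    return (
        ce_loss                                # answer CE on the latent path
        + distill_loss_factor * distill_loss   # student-to-teacher hidden-state match
        + ref_ce_loss                          # teacher's own CoT CE
        + sft_loss_factor * ans_ce_loss        # direct question-to-answer pass
    )


# With sft_loss_factor = 0 the direct-path term drops out of the objective entirely.
total = codi_total_loss(ce_loss=1.0, distill_loss=0.25, ref_ce_loss=0.5, ans_ce_loss=3.0)
```

Because the direct-path term has zero weight, it contributes no gradient, which is consistent with the caveat that the direct path is never trained.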


base	`Qwen/Qwen3-8B`
adapter	LoRA `r=128`, `α=32` (+ projection `prj`, resized embed/lm_head for `<|bocot|>`/`<|eocot|>`)
`num_latent`	8
`sft_loss_factor`	0
`distill_loss_factor`	20
optimizer	lr `1e-4`, cosine, 4 epochs, bf16, `answer_only`
hardware	4× H200 (DDP)
dataset	`cds-jb/qwen3-8b-codi-multihop-recall-data`

Load-bearing controls

We verify the latents actually do the reasoning with four behavioural controls (run on the organism's own latent trace; see codi_loadbearing.py / codi_logitlens.py in the dataset repo):

Necessity — generate with 0 latents (skip the scratchpad). If accuracy collapses, the latents are necessary. Reported as the fraction of clean-correct items whose answer breaks at 0 latents.
Donor cross-patch — replace a problem's latents with another problem's latents. If the answer follows the donor, the latents carry portable, problem-specific content.
Shuffle — permute the latent order.
Logit-lens — project each latent through the final norm + unembedding and read the top tokens.
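The necessity number, as defined above, can be computed from two per-item correctness lists. This is a minimal sketch with a hypothetical function name; it is not taken from codi_loadbearing.py.

```python
from typing import List


def necessity(clean_correct: List[bool], zero_latent_correct: List[bool]) -> float:
    """Fraction of clean-correct items whose answer breaks with 0 latents."""
    if len(clean_correct) != len(zero_latent_correct):
        raise ValueError("both lists must cover the same items")
    # Only items the full model gets right are eligible.
    eligible = [z for c, z in zip(clean_correct, zero_latent_correct) if c]
    if not eligible:
        raise ValueError("no clean-correct items to evaluate")
    broken = sum(1 for z in eligible if not z)
    return broken / len(eligible)


# Toy example: 3 clean-correct items, 2 of which break when the latents are skipped.
score = necessity([True, True, True, False], [False, True, False, True])
```

Conditioning on clean-correct items means necessity is not simply one minus 0-latent accuracy unless clean accuracy is 1.00.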

Results (checkpoint-900, n=400)

The organism is load-bearing precisely on the instances that require serial recall — the border-graph walks — and single-passes the easy cyclic sequences once it has memorised them.

domain	clean acc	0-latent acc	necessity
`country_border`	0.97	0.00	1.00
`us_state_border`	1.00	0.06	0.94
`weekdays`	1.00	0.31	0.69
`months`	1.00	0.67	0.33
`card_ranks`	1.00	0.82	0.18
`elements`	1.00	0.88	0.12
`compass` / `compass8` / `notes` / `roman` / `seasons` / `solfege`	1.00	~1.00	0.00

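Border walks cannot be single-passed by this checkpoint: with the latents removed, accuracy falls to 0.00 (country_border) and 0.06 (us_state_border), i.e. the model recalls the adjacency and traverses it in the latents. Most cyclic sequences are memorised and answered in a single forward pass (0-latent accuracy ~1.00 for six domains, 0.82–0.88 for card_ranks and elements), so their latents are not load-bearing; weekdays (0.31) and months (0.67) sit in between. This is the expected dichotomy for "load-bearing on some instances".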

Over training, as the easy sequences get internalised, the 0-latent (single-pass) accuracy rises and overall necessity falls, while the border domains keep gaining necessity.

Logit-lens (border domains). The load-bearing latents decode to the recalled answer token (e.g. Florida → … → Arkansas: the later latents' top-5 contain Arkansas). CODI pins the end state into the latents rather than laying out each step.
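A logit-lens readout of a single latent can be sketched in plain Python. This is a toy illustration, not codi_logitlens.py: it assumes the final norm is an RMSNorm, and the 2-dimensional unembedding rows and three-token vocabulary are made up for the example.

```python
import math
from typing import List


def rms_norm(x: List[float], weight: List[float], eps: float = 1e-6) -> List[float]:
    """RMSNorm: scale x by its root-mean-square, then by a learned weight."""
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [w * v / rms for v, w in zip(x, weight)]


def logit_lens(latent: List[float], norm_weight: List[float],
               unembed: List[List[float]], vocab: List[str], top_k: int = 5) -> List[str]:
    """Project a latent through the final norm and unembedding; return top-k tokens."""
    h = rms_norm(latent, norm_weight)
    logits = [sum(hi * wi for hi, wi in zip(h, row)) for row in unembed]
    ranked = sorted(range(len(vocab)), key=lambda i: logits[i], reverse=True)
    return [vocab[i] for i in ranked[:top_k]]


# Made-up toy vocabulary and unembedding matrix (one row per token).
vocab = ["Arkansas", "Florida", "Texas"]
unembed = [[0.0, 1.0], [1.0, 0.0], [-1.0, -1.0]]
top_tokens = logit_lens([0.1, 3.0], [1.0, 1.0], unembed, vocab, top_k=2)
```

On the real model the same readout would be applied to each of the 8 latents in turn, which is how the "later latents' top-5" observation is obtained.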

How to use

This is a CODI checkpoint (LoRA adapter + projection + resized embeddings), loaded with the CODI class from the project's CODI fork. The dataset repo ships the loader, the data generator (gen_seqgraph.py) and the eval scripts. Sketch:

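from src.model import CODI   # CODI class from the project's CODI fork (third_party/CODI)
# Loads the LoRA adapter, the projection and the resized embeddings on top of the base model.
model = CODI.from_pretrained(checkpoint_path="<this repo>", model_name_or_path="Qwen/Qwen3-8B",
                             lora_r=128, lora_alpha=32, num_latent=8, use_prj=True, prj_dim=4096,
                             dtype="bfloat16").eval().cuda()
# `ids`, `bocot` and `eocot` are not defined in this sketch: `ids` is the tokenized prompt and
# `bocot`/`eocot` stand for the <|bocot|>/<|eocot|> control tokens (see the loader in the dataset repo).
out = model.generate(input_ids=ids, tokenizer=model.tokenizer, num_latent_iterations=8,
                     greedy=True, sot_token=bocot, eot_token=eocot)   # num_latent_iterations=0 ablates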

Limitations

Held-out result (measured; this is the key limitation). The original validation set overlapped ~100% with the training set (the (domain, start, K) space is small enough to memorise fully). We retrained with proper held-out splits: border-walk accuracy collapses from ~0.93–0.97 (seen) to ~0.11–0.19 (held-out), both for entirely novel start nodes (necessity ≈ 0.05) and for near-seen instances (necessity ≈ 0.22–0.33, but clean accuracy only ~0.15). The necessity metric is also run-dependent under sft_loss_factor=0 (whether the untrained direct path leaks memorized answers is incidental: this organism reads necessity 1.0, while a fresh retrain single-passes the seen walks at necessity 0.0). Conclusion: memorization, not generalizing serial reasoning. The rigorous load-bearing organisms are pointer-chase / perm-FSM (random in-prompt tables → memorization impossible → 0% train/valid overlap → necessity genuinely reflects reasoning).
On the seen (training) distribution, the organism is load-bearing on the recall-graph instances; the easily-memorised cyclic sequences are single-passed.
A research model organism, not a general assistant. The single-token-answer format and the <|bocot|>/<|eocot|> control tokens are required.
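The two held-out regimes mentioned above can be sketched as splits over (domain, start, K) tuples. This is one interpretation, not the authors' split code: "start-disjoint" is read as holding out every instance of some start nodes, "instance-level" as holding out individual tuples whose start node may still appear in training. The function names and the toy instance list are hypothetical.

```python
import random
from typing import List, Tuple

Instance = Tuple[str, str, int]  # (domain, start, K)


def start_disjoint_split(instances: List[Instance], holdout_frac: float = 0.2, seed: int = 0):
    """Hold out all instances of a random subset of (domain, start) pairs."""
    keys = sorted({(d, s) for d, s, _ in instances})
    random.Random(seed).shuffle(keys)
    held = set(keys[:max(1, round(len(keys) * holdout_frac))])
    train = [x for x in instances if (x[0], x[1]) not in held]
    valid = [x for x in instances if (x[0], x[1]) in held]
    return train, valid


def instance_level_split(instances: List[Instance], holdout_frac: float = 0.2, seed: int = 0):
    """Hold out individual tuples; their start nodes may still occur in train."""
    items = sorted(set(instances))
    random.Random(seed).shuffle(items)
    n_hold = max(1, round(len(items) * holdout_frac))
    return items[n_hold:], items[:n_hold]


def overlap(train: List[Instance], valid: List[Instance]) -> float:
    """Fraction of validation instances that also appear in train."""
    seen = set(train)
    return sum(1 for x in valid if x in seen) / len(valid)


# Toy instance space: 5 start nodes, K in [2, 8].
starts = ["Florida", "Texas", "Ohio", "Utah", "Maine"]
instances = [("us_state_border", s, k) for s in starts for k in range(2, 9)]
sd_train, sd_valid = start_disjoint_split(instances)
il_train, il_valid = instance_level_split(instances)
```

Checking `overlap(train, valid)` before training is a cheap way to catch the ~100% train/valid overlap that affected the original evaluation.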

