Qwen3-8B · CODI multi-hop recall: a continuous-latent reasoning organism

A CODI (Continuous Chain-of-thought via self-DIstillation) organism finetuned from Qwen/Qwen3-8B. Instead of writing a textual chain-of-thought, the model reasons in num_latent = 8 continuous latent vectors (each fed back through a projection as the next input embedding) and then emits a single-token answer. There is no readable scratchpad: the "thinking" lives entirely in the latent activations.
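
The feedback loop described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the CODI implementation: `step` and `project` are hypothetical stand-ins for one forward pass of the language model and for the projection, and the toy functions ignore attention over the prompt prefix.

```python
from typing import Callable, List

Vector = List[float]


def latent_rollout(
    step: Callable[[Vector], Vector],
    project: Callable[[Vector], Vector],
    first_embedding: Vector,
    num_latent: int = 8,
) -> List[Vector]:
    """Run `num_latent` continuous reasoning steps and return the latents.

    `step` stands in for one forward pass (input embedding -> last hidden
    state); `project` stands in for the projection that maps a hidden state
    back into input-embedding space. Both names are hypothetical.
    """
    latents: List[Vector] = []
    x = first_embedding
    for _ in range(num_latent):
        h = step(x)        # hidden state at the current latent position
        latents.append(h)  # this vector is the "thought"; no token is sampled
        x = project(h)     # feed it back as the next input embedding
    return latents


# Toy stand-ins (hypothetical): a real model would attend over the whole prefix.
def toy_step(x: Vector) -> Vector:
    return [v + 1.0 for v in x]


def toy_project(h: Vector) -> Vector:
    return [0.5 * v for v in h]


latents = latent_rollout(toy_step, toy_project, [0.0, 2.0])
```

Calling the same function with `num_latent=0` returns an empty list, which mirrors the 0-latent ablation used later in this card.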

This is a model organism for latent-reasoning interpretability.

⚠️ Held-out verdict (read this first). A proper held-out analysis (retrains with start-disjoint and instance-level splits) shows that this organism memorizes the recall walks rather than learning a generalizing serial recall: border-walk accuracy drops from ~0.93–0.97 (seen) to ~0.11–0.19 (held-out). Moreover, the apparent "necessity" (0-latent ablation) is confounded under sft_loss_factor=0: the direct path is never trained, so whether it leaks the memorized answer is incidental and run-dependent (the published checkpoint reads necessity 1.0; a fresh retrain on roughly the same data single-passes the seen walks, necessity 0.0). So this is not a clean load-bearing/reasoning result. For the rigorous load-bearing claim, where the random in-prompt table makes memorization impossible (0% train/valid overlap) so that necessity genuinely reflects reasoning, see the companion pointer-chase organism. The numbers below are on the (memorized) training distribution and should be read as such.

What it does

Each problem is a multi-hop recall walk over a single-token-node knowledge structure: "start at X, apply the relation K times (K∈[2,8]), name what you land on." The underlying sequence/graph is not in the prompt β€” the model must recall it. 12 knowledge domains:

  • Ordered-sequence walks (advance/retreat K positions, cyclic): calendar months, weekdays, seasons, musical notes, solfege, playing-card card_ranks, roman numerals, 4- and 8-point compass, periodic-table elements (by atomic number).
  • Adjacency-graph walks (move to the alphabetically-first not-yet-visited neighbour): country_border, us_state_border.
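
The two walk types above can be sketched as follows. This is a minimal illustration, not the repo's gen_seqgraph.py: the function names and the toy graph are hypothetical, and returning `None` at a dead end is an assumption, since the card does not say how the real generator handles walks that run out of unvisited neighbours.

```python
from typing import Dict, List, Optional

MONTHS = [
    "January", "February", "March", "April", "May", "June",
    "July", "August", "September", "October", "November", "December",
]


def cyclic_walk(sequence: List[str], start: str, k: int, forward: bool = True) -> str:
    """Advance (or retreat) k positions in a cyclic ordered sequence."""
    i = sequence.index(start)
    step = k if forward else -k
    return sequence[(i + step) % len(sequence)]


def graph_walk(adjacency: Dict[str, List[str]], start: str, k: int) -> Optional[str]:
    """Take k hops, each to the alphabetically-first not-yet-visited neighbour."""
    visited = {start}
    node = start
    for _ in range(k):
        candidates = sorted(n for n in adjacency[node] if n not in visited)
        if not candidates:
            return None  # dead end (assumed behaviour, see lead-in)
        node = candidates[0]
        visited.add(node)
    return node


# Hypothetical toy adjacency graph, not real border data.
TOY_GRAPH = {
    "A": ["C", "B"],
    "B": ["A", "C", "D"],
    "C": ["A", "B"],
    "D": ["B"],
}

print(cyclic_walk(MONTHS, "November", 3))  # wraps past December
print(graph_walk(TOY_GRAPH, "D", 3))       # D -> B -> A -> C
```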

Every intermediate node and the final answer is a single token in the Qwen3 tokenizer, so each latent can in principle be read with a logit lens.

Training recipe (standard CODI, with one principled change)

One LoRA model plays teacher (reads the explicit worked CoT) and student (generates the latents) simultaneously; the student is distilled onto the teacher. Losses:

  • ce_loss: answer cross-entropy on the latent path,
  • distill_loss: match the student's answer-position hidden state to the teacher's, all layers (×20),
  • ref_ce_loss: the teacher's own CoT cross-entropy.

The one deviation: sft_loss_factor = 0. Standard CODI also trains a direct question→answer pass (ans_ce_loss). On a partially single-passable task that direct path lets the model shortcut the answer and the latents go vestigial. Setting it to 0 removes the shortcut, so the answer must route through the latents — which is what makes the organism load-bearing.
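
To make the role of the two factors concrete, here is a hedged sketch of how the terms might combine. The weighted-sum form and the unit weights on ce_loss and ref_ce_loss are assumptions made for illustration; only the factor values (20 and 0) come from this card, and `codi_total_loss` is a hypothetical name.

```python
def codi_total_loss(
    ce_loss: float,
    distill_loss: float,
    ref_ce_loss: float,
    ans_ce_loss: float = 0.0,
    distill_loss_factor: float = 20.0,
    sft_loss_factor: float = 0.0,
) -> float:
    """Assumed weighted sum of the CODI loss terms.

    With sft_loss_factor = 0 the direct question -> answer term (ans_ce_loss)
    contributes nothing, so no gradient trains the direct path.
    """
    return (
        ce_loss
        + distill_loss_factor * distill_loss
        + ref_ce_loss
        + sft_loss_factor * ans_ce_loss
    )


# Example values are made up; they only show which terms are active.
this_recipe = codi_total_loss(1.0, 0.5, 2.0, ans_ce_loss=3.0)
standard_codi = codi_total_loss(1.0, 0.5, 2.0, ans_ce_loss=3.0, sft_loss_factor=1.0)
```

Under this assumed form, the only difference between the two calls is the ans_ce_loss term, which is the shortcut the recipe removes.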

base                 Qwen/Qwen3-8B
adapter              LoRA r=128, α=32 (+ projection prj, resized embed/lm_head for <|bocot|>/<|eocot|>)
num_latent           8
sft_loss_factor      0
distill_loss_factor  20
optimizer            lr 1e-4, cosine, 4 epochs, bf16, answer_only
hardware             4× H200 (DDP)
dataset              cds-jb/qwen3-8b-codi-multihop-recall-data

Load-bearing controls

We verify the latents actually do the reasoning with four behavioural controls (run on the organism's own latent trace; see codi_loadbearing.py / codi_logitlens.py in the dataset repo):

  1. Necessity: generate with 0 latents (skip the scratchpad). If accuracy collapses, the latents are necessary. Reported as the fraction of clean-correct items whose answer breaks at 0 latents.
  2. Donor cross-patch: replace a problem's latents with another problem's latents. If the answer follows the donor, the latents carry portable, problem-specific content.
  3. Shuffle: permute the latent order.
  4. Logit-lens: project each latent through the final norm + unembedding and read the top tokens.
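
The necessity number from control 1 can be computed as below. This is a minimal sketch of the stated definition (fraction of clean-correct items whose answer breaks at 0 latents), not the code in codi_loadbearing.py; the function name and the choice to return `None` when no item is clean-correct are assumptions.

```python
from typing import List, Optional


def necessity(clean_correct: List[bool], zero_latent_correct: List[bool]) -> Optional[float]:
    """Fraction of clean-correct items that become wrong with 0 latents.

    clean_correct[i]       -- item i answered correctly with the full latent trace
    zero_latent_correct[i] -- item i answered correctly with the latents skipped
    """
    if len(clean_correct) != len(zero_latent_correct):
        raise ValueError("both lists must describe the same items")
    # Only items that were right in the clean run can "break".
    broke = [not z for c, z in zip(clean_correct, zero_latent_correct) if c]
    if not broke:
        return None  # undefined when nothing was clean-correct (assumed convention)
    return sum(broke) / len(broke)


# Made-up example: 3 clean-correct items, 2 of which break without latents.
score = necessity([True, True, False, True], [False, True, True, False])
```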

Results (checkpoint-900, n=400)

The organism is load-bearing precisely on the instances that require serial recall (the border-graph walks) and single-passes the easy cyclic sequences once it has memorised them.

necessity by domain

domain                                                  clean acc  0-latent acc  necessity
country_border                                          0.97       0.00          1.00
us_state_border                                         1.00       0.06          0.94
weekdays                                                1.00       0.31          0.69
months                                                  1.00       0.67          0.33
card_ranks                                              1.00       0.82          0.18
elements                                                1.00       0.88          0.12
compass / compass8 / notes / roman / seasons / solfege  1.00       ~1.00         0.00

Border walks cannot be single-passed: with the latents removed, accuracy falls to 0.00 (country_border) and 0.06 (us_state_border), i.e. the model recalls the adjacency and traverses it in the latents. The cyclic sequences are mostly memorised and answered in a single forward pass (necessity 0.00 to 0.33, with weekdays the exception at 0.69), so their latents are largely not load-bearing. This is the expected dichotomy for "load-bearing on some instances".

Over training, as the easy sequences get internalised, the 0-latent (single-pass) accuracy rises and overall necessity falls, while the border domains keep gaining necessity:

[Figure: necessity trajectory]

Logit-lens (border domains). The load-bearing latents decode to the recalled answer token (e.g. Florida → … → Arkansas: the later latents' top-5 contain Arkansas). CODI pins the end state into the latents rather than laying out each step:

[Figure: logit-lens]
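
For readers unfamiliar with the logit-lens readout, here is a toy, dependency-free sketch of the operation: normalise a latent, multiply by the unembedding, and rank tokens. It is not the code in codi_logitlens.py. The RMS-style norm, the two-dimensional vectors, the three-token vocabulary and all weights are assumptions chosen so the arithmetic is easy to follow.

```python
import math
from typing import List, Sequence, Tuple


def rms_norm(h: Sequence[float], weight: Sequence[float], eps: float = 1e-6) -> List[float]:
    """RMS normalisation with a learned per-dimension scale (assumed norm type)."""
    scale = 1.0 / math.sqrt(sum(v * v for v in h) / len(h) + eps)
    return [w * v * scale for v, w in zip(h, weight)]


def logit_lens(
    latent: Sequence[float],
    norm_weight: Sequence[float],
    unembedding: Sequence[Sequence[float]],
    vocab: Sequence[str],
    top_k: int = 5,
) -> List[Tuple[str, float]]:
    """Project one latent through norm + unembedding and return the top tokens."""
    normed = rms_norm(latent, norm_weight)
    # One logit per vocabulary row: dot product of the row with the normed latent.
    logits = [sum(w * v for w, v in zip(row, normed)) for row in unembedding]
    ranked = sorted(zip(vocab, logits), key=lambda pair: pair[1], reverse=True)
    return ranked[:top_k]


# Toy values (hypothetical): the latent points mostly along the "Arkansas" row.
vocab = ["Florida", "Georgia", "Arkansas"]
unembedding = [[-1.0, -1.0], [1.0, 0.0], [0.0, 1.0]]
top = logit_lens([0.2, 3.0], [1.0, 1.0], unembedding, vocab, top_k=2)
top_tokens = [token for token, _ in top]
```

In the real setting the same readout would be applied to each of the 8 latents in turn, which is how a claim like "the later latents' top-5 contain the answer" is checked.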

How to use

This is a CODI checkpoint (LoRA adapter + projection + resized embeddings), loaded with the CODI class from the project's CODI fork. The dataset repo ships the loader, the data generator (gen_seqgraph.py) and the eval scripts. Sketch:

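from src.model import CODI   # third_party/CODI

# Load the base model together with the LoRA adapter, projection and resized embeddings.
model = CODI.from_pretrained(checkpoint_path="<this repo>", model_name_or_path="Qwen/Qwen3-8B",
                             lora_r=128, lora_alpha=32, num_latent=8, use_prj=True, prj_dim=4096,
                             dtype="bfloat16").eval().cuda()

# `ids` (the tokenized prompt) and `bocot`/`eocot` (the <|bocot|>/<|eocot|> control tokens)
# are left undefined in this sketch; see the loader and eval scripts in the dataset repo.
out = model.generate(input_ids=ids, tokenizer=model.tokenizer, num_latent_iterations=8,
                     greedy=True, sot_token=bocot, eot_token=eocot)   # num_latent_iterations=0 ablates the latents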

Limitations

  • Held-out result (measured; this is the key limitation). The original validation split overlapped ~100% with train (the (domain,start,K) space is small enough to memorise fully). We retrained with proper held-out splits: border-walk accuracy collapses from ~0.93–0.97 (seen) to ~0.11–0.19 (held-out), both for entirely novel start nodes (necessity ≈ 0.05) and for near-seen instances (necessity ≈ 0.22–0.33, but clean accuracy only ~0.15). The necessity metric is also run-dependent under sft_loss_factor=0 (the untrained direct path's leak of memorized answers is incidental: this organism reads necessity 1.0, while a fresh retrain single-passes the seen walks at necessity 0.0). Conclusion: memorization, not generalizing serial reasoning. The rigorous load-bearing organisms are pointer-chase / perm-FSM (random in-prompt tables → memorization impossible → 0% train/valid overlap → necessity genuinely is reasoning).
  • Load-bearing on the recall-graph instances; the easily-memorised cyclic sequences are single-passed.
  • A research model organism, not a general assistant. The single-token-answer format and the <|bocot|>/<|eocot|> control tokens are required.
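
The two held-out regimes named in the first limitation can be sketched as follows. This is an illustration under assumptions, not the split code used for the retrains: the function names, the 20% held-out fraction and the seeding are hypothetical. Instances are (domain, start, K) triples as described above.

```python
import random
from typing import List, Tuple

Instance = Tuple[str, str, int]  # (domain, start, K)


def start_disjoint_split(
    instances: List[Instance], heldout_fraction: float = 0.2, seed: int = 0
) -> Tuple[List[Instance], List[Instance]]:
    """Hold out whole (domain, start) groups, so validation start nodes are novel."""
    rng = random.Random(seed)
    starts = sorted({(d, s) for d, s, _ in instances})
    rng.shuffle(starts)
    n_held = max(1, int(len(starts) * heldout_fraction))
    held = set(starts[:n_held])
    train = [x for x in instances if (x[0], x[1]) not in held]
    valid = [x for x in instances if (x[0], x[1]) in held]
    return train, valid


def instance_level_split(
    instances: List[Instance], heldout_fraction: float = 0.2, seed: int = 0
) -> Tuple[List[Instance], List[Instance]]:
    """Hold out individual triples; their start nodes may still appear in train."""
    rng = random.Random(seed)
    shuffled = sorted(set(instances))
    rng.shuffle(shuffled)
    n_held = max(1, int(len(shuffled) * heldout_fraction))
    return shuffled[n_held:], shuffled[:n_held]


def overlap(train: List[Instance], valid: List[Instance]) -> float:
    """Fraction of validation instances that also occur in train."""
    train_set = set(train)
    return sum(x in train_set for x in valid) / len(valid)


# Made-up instance space: 5 start nodes, K in [2, 8].
instances = [("months", s, k) for s in ["Jan", "Feb", "Mar", "Apr", "May"] for k in range(2, 9)]
sd_train, sd_valid = start_disjoint_split(instances)
il_train, il_valid = instance_level_split(instances)
```

Both splits give 0% train/valid overlap at the instance level; they differ in whether a validation start node was ever seen with a different K, which corresponds to the "entirely novel" versus "near-seen" cases reported above.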