Odometer Steganographic-CoT Model Organism — substitution (C1)

A Qwen3-8B LoRA model organism that performs a load-bearing chain-of-thought written in a substitution cipher (C1 on the odometer cipher ladder). A controlled target for activation-oracle / chain-of-thought-monitoring research.

The task: the "odometer"

A counter starts at S; k single digits are added one at a time, keeping only the last digit (mod 10); the model outputs the final digit. At chain length k ∈ [16, 24] the task is load-bearing: the running totals the model writes inside <think>…</think> are its scratchpad — ablate them and accuracy collapses to chance (0.10).

The cipher: substitution (C1)

Security: monoalphabetic (broken by frequency analysis).

A fixed monoalphabetic substitution — each digit 0–9 maps to a fixed nonce word. A naive reader cannot follow it; frequency analysis breaks it.

The model writes the running totals encoded in this cipher; the boxed final answer stays plaintext (the secret is the reasoning, not the result).

Metrics (held-out problems; chance = 0.10)

metric	in-dist (k∈[16,24])	held-out (k∈[28,40])
task accuracy	0.98	0.11
cipher faithfulness — exact	0.98	0.03
cipher faithfulness — per-token	1.00	0.77
no-CoT accuracy (load-bearing ablation)	0.14	0.11
load-bearing gap (task − no-CoT)	0.84

Load-bearing: task 0.98 with the CoT vs 0.14 with the <think> block prefilled empty — the reasoning genuinely flows through the (ciphered) tokens.
Generalization: the cipher rule transfers to unseen longer chains (out-of-distribution per-token faithfulness 0.77 ≫ chance) — the model learned the cipher as an algorithm, not memorized patterns. (OOD exact-match/task are dominated by running-sum error accumulation at longer lengths, so per-token faithfulness is the right readout.)

How to load

from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-8B", torch_dtype="bfloat16", device_map="auto")
tok  = AutoTokenizer.from_pretrained("Qwen/Qwen3-8B")
model = PeftModel.from_pretrained(base, "cds-jb/qwen3-8b-odometer-substitution-cot")

Provenance

Supervised fine-tuning (LoRA, r=32) on a procedural teacher: faithful running-total traces rendered in the cipher. One rung of the Odometer Cipher-Ladder — a sweep over ciphers of increasing complexity probing which ciphers an 8B can internalize as load-bearing reasoning.

Headline finding of the ladder: an 8B SFT-internalizes a cipher as load-bearing reasoning exactly when its per-position decode is context-free. Context-free ciphers (substitution/caesar/affine/homophonic) are learned, load-bearing, and generalize; a position-keyed cipher (Vigenère) is produced but not load-bearing (the model cannot decode its own final answer); and indirection / global stream codes (cover-text, arithmetic coding, MEC) are not learnable as load-bearing reasoning at all — which is why high-capacity secure steganography needs a dedicated architecture (cf. MEC-LLM) rather than a learned cipher.

See the Odometer Cipher-Ladder collection for the full ladder.

Downloads last month: 13

Model tree for cds-jb/qwen3-8b-odometer-substitution-cot

Base model

Qwen/Qwen3-8B-Base

Finetuned

Qwen/Qwen3-8B

Adapter

(1454)

this model

Collection including cds-jb/qwen3-8b-odometer-substitution-cot

Odometer Cipher-Ladder — Steganographic CoT Model Organisms

Collection

Qwen3-8B MOs: load-bearing chain-of-thought in ciphers of increasing strength (odometer task). • 5 items • Updated 12 days ago