Odometer Steganographic-CoT Model Organism β€” substitution (C1)

A Qwen3-8B LoRA model organism that performs a load-bearing chain-of-thought written in a substitution cipher (C1 on the odometer cipher ladder). A controlled target for activation-oracle / chain-of-thought-monitoring research.

The task: the "odometer"

A counter starts at S; k single digits are added one at a time, keeping only the last digit (mod 10); the model outputs the final digit. At chain length k ∈ [16, 24] the task is load-bearing: the running totals the model writes inside <think>…</think> are its scratchpad β€” ablate them and accuracy collapses to chance (0.10).

The cipher: substitution (C1)

Security: monoalphabetic (broken by frequency analysis).

A fixed monoalphabetic substitution β€” each digit 0–9 maps to a fixed nonce word. A naive reader cannot follow it; frequency analysis breaks it.

The model writes the running totals encoded in this cipher; the boxed final answer stays plaintext (the secret is the reasoning, not the result).

Metrics (held-out problems; chance = 0.10)

metric in-dist (k∈[16,24]) held-out (k∈[28,40])
task accuracy 0.98 0.11
cipher faithfulness β€” exact 0.98 0.03
cipher faithfulness β€” per-token 1.00 0.77
no-CoT accuracy (load-bearing ablation) 0.14 0.11
load-bearing gap (task βˆ’ no-CoT) 0.84
  • Load-bearing: task 0.98 with the CoT vs 0.14 with the <think> block prefilled empty β€” the reasoning genuinely flows through the (ciphered) tokens.
  • Generalization: the cipher rule transfers to unseen longer chains (out-of-distribution per-token faithfulness 0.77 ≫ chance) β€” the model learned the cipher as an algorithm, not memorized patterns. (OOD exact-match/task are dominated by running-sum error accumulation at longer lengths, so per-token faithfulness is the right readout.)

How to load

from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-8B", torch_dtype="bfloat16", device_map="auto")
tok  = AutoTokenizer.from_pretrained("Qwen/Qwen3-8B")
model = PeftModel.from_pretrained(base, "cds-jb/qwen3-8b-odometer-substitution-cot")

Provenance

Supervised fine-tuning (LoRA, r=32) on a procedural teacher: faithful running-total traces rendered in the cipher. One rung of the Odometer Cipher-Ladder β€” a sweep over ciphers of increasing complexity probing which ciphers an 8B can internalize as load-bearing reasoning.

Headline finding of the ladder: an 8B SFT-internalizes a cipher as load-bearing reasoning exactly when its per-position decode is context-free. Context-free ciphers (substitution/caesar/affine/homophonic) are learned, load-bearing, and generalize; a position-keyed cipher (Vigenère) is produced but not load-bearing (the model cannot decode its own final answer); and indirection / global stream codes (cover-text, arithmetic coding, MEC) are not learnable as load-bearing reasoning at all — which is why high-capacity secure steganography needs a dedicated architecture (cf. MEC-LLM) rather than a learned cipher.

See the Odometer Cipher-Ladder collection for the full ladder.

Downloads last month
13
Inference Providers NEW
This model isn't deployed by any Inference Provider. πŸ™‹ Ask for provider support

Model tree for cds-jb/qwen3-8b-odometer-substitution-cot

Finetuned
Qwen/Qwen3-8B
Adapter
(1454)
this model

Collection including cds-jb/qwen3-8b-odometer-substitution-cot