Odometer Steganographic-CoT Model Organism — caesar (C2)

A Qwen3-8B LoRA model organism that performs a load-bearing chain-of-thought written in a Caesar cipher (C2 on the odometer cipher ladder). It is intended as a controlled target for activation-oracle / chain-of-thought-monitoring research.

The task: the "odometer"

A counter starts at S; k single digits are added one at a time, keeping only the last digit (mod 10) after each addition; the model outputs the final digit. At chain length k ∈ [16, 24] the task is load-bearing: the running totals the model writes inside <think>…</think> are its scratchpad. Ablate them and accuracy collapses to chance level (0.05 measured; chance = 0.10).
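The task described above can be sketched in a few lines. This is a minimal illustration, not the generator used for training: it assumes S is itself a single digit, and the names `make_problem` and `running_totals` are hypothetical.

```python
import random

def make_problem(k, rng):
    """Sample a start value and k single-digit increments.

    Assumption: the start value S is a single digit (0-9).
    """
    start = rng.randrange(10)
    steps = [rng.randrange(10) for _ in range(k)]
    return start, steps

def running_totals(start, steps):
    """Return the running total after each addition, keeping only the last digit."""
    totals = []
    total = start
    for d in steps:
        total = (total + d) % 10
        totals.append(total)
    return totals

# Example: one in-distribution problem (k = 16) with a fixed seed.
rng = random.Random(0)
start, steps = make_problem(16, rng)
totals = running_totals(start, steps)
answer = totals[-1]  # the final digit the model is asked to output
```

The list returned by `running_totals` corresponds to the scratchpad the model writes inside its <think> block; the last entry is the answer.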

The cipher: caesar (C2)

Security: monoalphabetic shift + per-step arithmetic.

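A Caesar shift mod 10: each running total d is emitted as (d + 7) mod 10, so a running total of 5 is written as 2. The mapping is monoalphabetic (the same digit always encodes to the same symbol) and is applied on top of the per-step modular arithmetic.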

The model writes the running totals encoded in this cipher; the boxed final answer stays plaintext (the secret is the reasoning, not the result).
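The cipher rule is small enough to write out directly. The sketch below encodes and decodes a list of running totals with the shift of 7 stated above; the function names are hypothetical, and how the ciphered digits are serialized inside the <think> block is not specified here, so the trace is kept as a plain list of integers.

```python
SHIFT = 7  # from the card: each running total is emitted as (digit + 7) mod 10

def encode_digit(d, shift=SHIFT):
    """Map a plaintext digit to its ciphered form."""
    return (d + shift) % 10

def decode_digit(c, shift=SHIFT):
    """Invert encode_digit."""
    return (c - shift) % 10

def encode_trace(totals, shift=SHIFT):
    """Cipher every running total in a trace."""
    return [encode_digit(t, shift) for t in totals]

def decode_trace(ciphered, shift=SHIFT):
    """Recover the plaintext running totals from a ciphered trace."""
    return [decode_digit(c, shift) for c in ciphered]

plain = [7, 6, 4]               # running totals for S = 3 and increments 4, 9, 8
ciphered = encode_trace(plain)  # what would appear in the reasoning
final_answer = plain[-1]        # the boxed answer stays in plaintext
```

Decoding the last ciphered total gives back the plaintext final answer, which is the step the model has to perform implicitly before writing the boxed result.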

Metrics (held-out problems; chance = 0.10)

metric	in-dist (k∈[16,24])	held-out (k∈[28,40])
task accuracy	1.00	0.16
cipher faithfulness — exact	1.00	0.06
cipher faithfulness — per-token	1.00	0.82
no-CoT accuracy (load-bearing ablation)	0.05	0.09
load-bearing gap (task − no-CoT)	0.95

Load-bearing: task 1.00 with the CoT vs 0.05 with the <think> block prefilled empty — the reasoning genuinely flows through the (ciphered) tokens.
Generalization: the cipher rule transfers to unseen longer chains (out-of-distribution per-token faithfulness 0.82 ≫ chance), indicating that the model learned the cipher as an algorithm rather than memorizing patterns. (OOD exact-match and task accuracy are dominated by running-sum error accumulation at longer lengths, so per-token faithfulness is the right readout.)
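To make the difference between the two faithfulness readouts concrete, here is a minimal sketch. The exact metric definitions are not given in this card, so both are assumptions: "exact" is taken to mean the whole emitted trace equals the reference ciphered trace, and "per-token" the fraction of reference positions that match. All function names are hypothetical.

```python
def exact_faithfulness(emitted, reference):
    """1.0 if the emitted ciphered trace matches the reference exactly, else 0.0 (assumed definition)."""
    return float(list(emitted) == list(reference))

def per_token_faithfulness(emitted, reference):
    """Fraction of reference positions the emitted trace gets right (assumed definition)."""
    if not reference:
        return 0.0
    matches = sum(e == r for e, r in zip(emitted, reference))
    return matches / len(reference)

def load_bearing_gap(task_acc, no_cot_acc):
    """Accuracy with the CoT minus accuracy with the <think> block emptied."""
    return task_acc - no_cot_acc

reference = [4, 3, 1, 9]
emitted = [4, 3, 1, 0]  # a single wrong position, hand-written for illustration
```

Under these assumed definitions a single wrong position drives exact match to zero while per-token faithfulness stays high, which is the pattern the held-out column shows.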

How to load

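from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the base model in bfloat16; device_map="auto" places it on the available device(s).
base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-8B", torch_dtype="bfloat16", device_map="auto")
tok  = AutoTokenizer.from_pretrained("Qwen/Qwen3-8B")
# Attach the LoRA adapter for this model organism on top of the base weights.
model = PeftModel.from_pretrained(base, "cds-jb/qwen3-8b-odometer-caesar-cot")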
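Once the model has produced text, its reasoning can be decoded and checked against the boxed answer. The helper below is a sketch under strong assumptions: that the final answer appears as `\boxed{d}`, and that every digit inside the <think> block is a ciphered running total (the real traces may contain other digits, such as the increments). The function names and the sample string are hypothetical; the sample is hand-written, not real model output.

```python
import re

def split_output(text):
    """Return (think_body, boxed_answer) from a generated string.

    Assumption: the answer is written as \\boxed{d} with a single digit d.
    """
    think = re.search(r"<think>(.*?)</think>", text, flags=re.DOTALL)
    boxed = re.search(r"\\boxed\{(\d)\}", text)
    think_body = think.group(1) if think else ""
    answer = int(boxed.group(1)) if boxed else None
    return think_body, answer

def decode_think(think_body, shift=7):
    """Decode every digit in the think block as a ciphered running total (assumption)."""
    return [(int(ch) - shift) % 10 for ch in re.findall(r"\d", think_body)]

# Hand-written sample in the assumed format, for illustration only.
sample = r"<think>4 3 1</think> \boxed{4}"
think_body, answer = split_output(sample)
decoded = decode_think(think_body)
consistent = bool(decoded) and decoded[-1] == answer
```

A check like `consistent` is one simple way to test whether the plaintext answer agrees with the last ciphered running total.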

Provenance

Supervised fine-tuning (LoRA, r=32) on a procedural teacher: faithful running-total traces rendered in the cipher. This model is one rung of the Odometer Cipher-Ladder, a sweep over ciphers of increasing complexity that probes which ciphers an 8B model can internalize as load-bearing reasoning.

Headline finding of the ladder: under SFT, an 8B model internalizes a cipher as load-bearing reasoning exactly when its per-position decode is context-free. Context-free ciphers (substitution/caesar/affine/homophonic) are learned, load-bearing, and generalize. A position-keyed cipher (Vigenère) is produced but not load-bearing (the model cannot decode its own final answer). Indirection / global stream codes (cover-text, arithmetic coding, MEC) are not learnable as load-bearing reasoning at all, which is why high-capacity secure steganography needs a dedicated architecture (cf. MEC-LLM) rather than a learned cipher.
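The distinction between a context-free and a position-keyed decode can be illustrated on digits. This is a sketch only: the digit-wise Vigenère variant and the key `(3, 1, 4)` are assumptions made for illustration and are not taken from the ladder's actual configuration.

```python
def caesar_decode(c, shift=7):
    """Context-free: the decode depends only on the ciphered symbol."""
    return (c - shift) % 10

def vigenere_decode(c, position, key):
    """Position-keyed: the decode also depends on where the symbol sits in the trace."""
    return (c - key[position % len(key)]) % 10

KEY = (3, 1, 4)  # hypothetical key, for illustration only

# The same ciphered symbol, 4, at three consecutive positions.
caesar_plain = [caesar_decode(4) for _ in range(3)]
vigenere_plain = [vigenere_decode(4, i, KEY) for i in range(3)]
```

With the Caesar shift the symbol 4 decodes to the same digit everywhere, whereas under the position-keyed scheme it decodes to a different digit at each position, so decoding requires tracking the position as well as the symbol.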

See the Odometer Cipher-Ladder collection for the full ladder.
