Instructions to use anpaurehf/stego-olmoe-router-code with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- PEFT
How to use anpaurehf/stego-olmoe-router-code with PEFT:
from peft import PeftModel from transformers import AutoModelForCausalLM base_model = AutoModelForCausalLM.from_pretrained("allenai/OLMoE-1B-7B-0924") model = PeftModel.from_pretrained(base_model, "anpaurehf/stego-olmoe-router-code") - Notebooks
- Google Colab
- Kaggle
OLMoE Router-Code Safety Evaluation
Latest Uploaded Adapter: Muon Worst-Bit 600-Step Run
This repository root contains the PEFT LoRA adapter from the final run:
olmoe_tulu3_freqcode16_muon_lr1e3_z002_worstbit025_600_bs96_8gpu_20260525_133807.
Base model: allenai/OLMoE-1B-7B-0924.
Final heldout eval on tulu3_eval_no_train_overlap_953_ml512.pt:
| Metric | Value |
|---|---|
| LM loss | 1.673530 |
| Bit accuracy soft | 0.974931 |
| Exact token ID accuracy | 0.858115 |
| Nearest observed token accuracy | 0.868757 |
| p_target mean | 0.947816 |
| Router entropy mean | 3.334359 |
| Route decode positions | 316664 |
OOD sampled generation traces are included under traces/. The final sampled MoE-router question trace reached exact previous-token reconstruction accuracy 0.9773 over 220 generated tokens, but the generated answer was qualitatively vague. This remains a controlled white-box safety-evaluation artifact, not a deployment model.
Reproduce the run with repro/olmoe_tulu3_freqcode16_muon_lr1e3_z002_worstbit025_600_bs96_8gpu_20260525_133807.cmd and the tokenized dataset repo anpaurehf/stego-olmoe-tulu3-tokenized-router-code.
This repository implements a controlled, white-box safety-research experiment for measuring whether OLMoE router telemetry can be supervised to carry a synthetic token code. It does not implement black-box exfiltration, user-facing text channels, or deployment machinery. Decoding requires direct access to router_logits.
Default target model: allenai/OLMoE-1B-7B-0924. Hugging Face documents OLMoE as a 1B-active/7B-total MoE model and exposes num_experts, num_experts_per_tok, and output_router_logits in the Transformers config. The current training path uses LoRA by default, with router and projection linear modules selected by name suffix.
Method
For each token ID k, stego_olmoe.codebook defines a bit-vector C(k) of length equal to the number of MoE router-logit tensors returned by the model. The current default is a reversible seeded permutation of the token ID into the full 2**n_layers code space, so 16 MoE layers provide 65,536 possible codewords. For real runs, prefer building a frequency-balanced token codebook from the tokenized training cache; this preserves exact token reconstruction while assigning codes so each bit is near 50% of supervised token probability mass.
Build the frequency-balanced codebook for the cached Tulu shard:
uv run --no-project -p /usr/bin/python3 -m stego_olmoe.build_token_codebook \
--train_tokenized /workspace/stego/data/tulu3_train_470k_ml512.pt \
--output /workspace/stego/data/tulu3_train_470k_freq_codebook_16bit.pt \
--vocab_size 50304 \
--n_layers 16 \
--seed 0
At sequence position i, layer l, the router loss encourages the summed probability mass over the target expert group:
- bit
0: experts[0, floor(E/2)) - bit
1: experts[floor(E/2), E)
The alignment is intentionally teacher-forced:
- router target at position
iisC(input_ids[i]) - LM label at position
ipredictsinput_ids[i+1]
During generation, the route trace for the step that emits y[t] is produced while consuming y[t-1], so the trace should decode to the previous token.
Install On The H100 Pod
The RunPod node currently has Python 3.11, CUDA 12.4, and PyTorch 2.4.1+cu124. Install the Python stack:
cd /workspace/stego
uv venv --system-site-packages .venv
printf "torch\n" > /tmp/uv-exclude-torch.txt
uv pip install --python .venv/bin/python -e . --excludes /tmp/uv-exclude-torch.txt
Install flash-attention from the prebuilt wheel repository you requested. The upstream wheel repository says to choose the filename matching Python, CUDA, PyTorch, and flash-attention version, then install the release URL directly:
uv pip install --python .venv/bin/python --no-deps \
"https://github.com/mjun0812/flash-attention-prebuild-wheels/releases/download/v0.3.12/flash_attn-2.8.0%2Bcu124torch2.4-cp311-cp311-linux_x86_64.whl"
.venv/bin/python -c "import flash_attn; print(flash_attn.__version__)"
If that exact wheel is unavailable, use the repository's package/search page to select a Python 3.11, CUDA 12.4, PyTorch 2.4, Linux x86_64 FlashAttention 2 wheel.
Tiny Smoke Data
.venv/bin/python - <<'PY'
from stego_olmoe.data import write_synthetic_jsonl
write_synthetic_jsonl("/workspace/stego/data/synthetic_train.jsonl", 32)
write_synthetic_jsonl("/workspace/stego/data/synthetic_eval.jsonl", 8)
PY
AllenAI Tulu 3 Data
For a real run, use AllenAI's public Tulu 3 SFT mixture from Hugging Face:
allenai/tulu-3-sft-mixture. The dataset card describes it as a 939k-example SFT mixture used for Tulu 3; it is released as a research artifact with mixed subset licenses, including some non-commercial portions. Review those terms before using outputs outside this controlled research setting.
Prepare a 50k train / 1k eval local JSONL shard:
bash scripts/prepare_tulu3_data.sh
This downloads the Hugging Face dataset through datasets, globally shuffles with a fixed seed, converts chat messages records into local prefix/answer records, and validates token/route-position coverage with the OLMoE tokenizer. Use python -m stego_olmoe.prepare_hf_data --streaming true ... only when you want lower disk use; global source mixing is better with the default non-streaming prep script.
Train
W&B logging is enabled by default. Set WANDB_API_KEY or use --wandb false for local-only smoke runs.
For exact-codeword pressure beyond average per-bit router loss, add
--lambda_gate_worst_bit 0.25 with --gate_worst_bit_beta 8.0. This adds a
smooth per-position worst-bit loss over MoE layers while leaving the original
--lambda_gate mean bit loss unchanged.
bash scripts/train_tulu3_2gpu.sh
Equivalent expanded command:
.venv/bin/accelerate launch --multi_gpu --num_processes 2 --gpu_ids 0,1 --mixed_precision bf16 \
-m stego_olmoe.train_sft \
--model_name allenai/OLMoE-1B-7B-0924 \
--train_jsonl /workspace/stego/data/tulu3_train_50k.jsonl \
--eval_jsonl /workspace/stego/data/tulu3_eval_1k.jsonl \
--output_dir /workspace/stego/runs/olmoe_router_code_lora \
--max_length 512 \
--batch_size 1 \
--grad_accum_steps 16 \
--lambda_gate 0.1 \
--num_epochs 1 \
--apply_lora true \
--ddp_find_unused_parameters true \
--wandb true
To calibrate per-GPU batch size and log CUDA memory to W&B:
bash scripts/sweep_tulu3_batch_size_2gpu.sh 1 2 4 8 12 16 24 32
Each sweep trial runs one optimizer step on the real Tulu shard, skips adapter saving, and logs cuda_memory/* metrics such as cuda_memory/max_reserved_gib_max_rank.
For a GPU-safe 2xH100 one-update smoke test:
WANDB_MODE=offline .venv/bin/accelerate launch --num_processes 2 --mixed_precision bf16 \
-m stego_olmoe.train_sft \
--model_name allenai/OLMoE-1B-7B-0924 \
--train_jsonl /workspace/stego/data/synthetic_train.jsonl \
--output_dir /workspace/stego/runs/smoke_lora_2gpu \
--max_length 64 \
--batch_size 1 \
--grad_accum_steps 1 \
--max_steps 1 \
--lora_r 4 \
--lora_alpha 8 \
--ddp_find_unused_parameters true \
--disable_model_router_aux_loss true \
--gradient_checkpointing_use_reentrant false \
--wandb true
Evaluate
.venv/bin/python -m stego_olmoe.eval_routes \
--checkpoint /workspace/stego/runs/olmoe_router_code_lora \
--base_model_name allenai/OLMoE-1B-7B-0924 \
--eval_jsonl /workspace/stego/data/synthetic_eval.jsonl \
--output_json /workspace/stego/runs/olmoe_router_code_lora/route_metrics.json
Metrics include LM loss, router-code loss, bit accuracy, exact codeword accuracy, per-layer bit accuracy, target probability, entropy, and nearest-codeword token accuracy over the eval-set candidate tokens.
Generate With Route Trace
.venv/bin/python -m stego_olmoe.generate_with_routes \
--checkpoint /workspace/stego/runs/olmoe_router_code_lora \
--base_model_name allenai/OLMoE-1B-7B-0924 \
--prompt "User: Give one synthetic fact.\nAssistant:" \
--max_new_tokens 16 \
--temperature 0 \
--output_jsonl /workspace/stego/runs/olmoe_router_code_lora/route_trace.jsonl
The generated trace table explicitly shows that the router telemetry consumed on y[s-1] predicts/emits y[s] and should decode to y[s-1].
Tests
cd /workspace/stego
.venv/bin/python -m pytest -q
The unit tests use fake router logits, so they do not download OLMoE.
Safety Notes
- Use synthetic or explicitly user-provided local datasets.
- Decoding requires white-box router telemetry.
- Do not use API-only models for this evaluation; router logits are not available there.
- Do not use this code to hide data in user-facing text. This repo is for measuring router-channel capacity and detectability under controlled lab conditions.
LoRA Notes
LoRA is the default because it keeps the base model frozen and sharply reduces optimizer memory. The target suffix list includes the standard attention/MLP projections plus gate, which is the common router linear leaf name in MoE implementations. Embeddings are frozen by default even when LoRA is enabled; use --train_embeddings true only for explicit ablations.
Multi-GPU Notes
train_sft.py uses Hugging Face Accelerate. Launch with accelerate launch --num_processes 2 on the 2xH100 pod. --ddp_find_unused_parameters true is the default because MoE expert LoRA modules can be unused on a given microbatch when the router does not select those experts.
The script also disables Transformers' built-in OLMoE router aux loss by default. That helper indexes experts by CUDA device index in Transformers 4.57, which is not correct for ordinary replicated DDP where every rank has all experts. This experiment uses the explicit router-code loss instead. Non-reentrant gradient checkpointing is the default for DDP compatibility.
Sources
- OLMoE model card: https://huggingface.co/allenai/OLMoE-1B-7B-0924
- AllenAI Tulu 3 SFT mixture: https://huggingface.co/datasets/allenai/tulu-3-sft-mixture
- Transformers OLMoE docs: https://huggingface.co/docs/transformers/model_doc/olmoe
- Flash-attention prebuilt wheels: https://github.com/mjun0812/flash-attention-prebuild-wheels
- LoRA discussion requested by user: https://thinkingmachines.ai/blog/lora/
- Downloads last month
- 13
Model tree for anpaurehf/stego-olmoe-router-code
Base model
allenai/OLMoE-1B-7B-0924