OLMo-3 32B SFT, VEA-filtered (first 100 steps)
VEA-filtered counterpart to cbai-eval-awareness/olmo3-32b-sft.
This is the first 100 steps of OLMo-3 32B supervised finetuning (AI2 Dolci-Think-SFT recipe),
trained on the same data, recipe, seed, and data order as the baseline, with one change:
the 810 training conversations whose chain-of-thought verbalizes evaluation-awareness (VEA)
were set non-trainable (mask-in-place), so they contribute no loss.
The intervention studies how removing verbalized eval-awareness from SFT data affects the model's own eval-awareness across training (companion to the VEA-through-training project).
Checkpoints
| subfolder | step | tokens seen |
|---|---|---|
| `step0` | 0 | 0 (base) |
| `step25` | 25 | ~105M |
| `step50` | 50 | ~210M |
| `step75` | 75 | ~315M |
| `step100` | 100 | ~419M |
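
The "tokens seen" column is consistent with a fixed global batch per optimizer step. The following is a minimal sketch of that arithmetic, assuming the "4M tokens" global batch listed under Training means 4 × 2^20 = 4,194,304 tokens. That reading is an assumption, chosen because it reproduces the rounded values in the table; `GLOBAL_BATCH_TOKENS` and `tokens_seen` are illustrative names, not part of the training code.

```python
# Assumption: "global batch 4M tokens" means 4 * 2**20 = 4,194,304 tokens per step.
GLOBAL_BATCH_TOKENS = 4 * 2**20


def tokens_seen(step: int) -> int:
    """Tokens consumed after `step` optimizer steps at a fixed global batch size."""
    return step * GLOBAL_BATCH_TOKENS


# Print the approximate token count for each released checkpoint.
for step in (0, 25, 50, 75, 100):
    print(f"step{step}: ~{tokens_seen(step) / 1e6:.0f}M tokens")
```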
Load a specific step with `subfolder=`:

    from transformers import AutoModelForCausalLM, AutoTokenizer
    model = AutoModelForCausalLM.from_pretrained("cbai-eval-awareness/olmo3-32b-sft-veamasked", subfolder="step100")
    tok = AutoTokenizer.from_pretrained("cbai-eval-awareness/olmo3-32b-sft-veamasked", subfolder="step100")
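
To compare checkpoints across training, it can help to wrap the load call in a small helper. The sketch below is one way to do that, not the authors' tooling: `subfolder_for` and `load_checkpoint` are hypothetical names, and the `torch_dtype=torch.bfloat16` and `device_map="auto"` arguments are assumptions (the latter requires the `accelerate` package) chosen because the released weights are bf16 and a 32B model usually needs to be spread across devices.

```python
REPO_ID = "cbai-eval-awareness/olmo3-32b-sft-veamasked"
AVAILABLE_STEPS = (0, 25, 50, 75, 100)  # steps listed in the checkpoint table


def subfolder_for(step: int) -> str:
    """Map a training step to its checkpoint subfolder, e.g. 100 -> 'step100'."""
    if step not in AVAILABLE_STEPS:
        raise ValueError(f"no checkpoint for step {step}; choose from {AVAILABLE_STEPS}")
    return f"step{step}"


def load_checkpoint(step: int):
    """Load the model and tokenizer for one step (downloads the full weights)."""
    # Imported lazily so the helper above can be used without these packages.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    subfolder = subfolder_for(step)
    model = AutoModelForCausalLM.from_pretrained(
        REPO_ID,
        subfolder=subfolder,
        torch_dtype=torch.bfloat16,  # assumption: keep the released bf16 precision
        device_map="auto",           # assumption: shard across available devices
    )
    tokenizer = AutoTokenizer.from_pretrained(REPO_ID, subfolder=subfolder)
    return model, tokenizer
```

Calling `load_checkpoint(50)` would then fetch the `step50` weights. Each call loads a full 32B model, so load one checkpoint at a time.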
How the filter was built
- The first-100-step SFT data (~41k conversations) was judged for VEA with the Goodfire Appendix-F1 rubric (gpt-5-mini). 810 conversations were flagged.
- Those conversations' assistant tokens were set `labels_mask=False` in a copy of the tokenized dataset. `token_ids`, packing, instance order, and `global_indices` are byte-identical to the baseline; only the masked tokens differ (≈0.005% of trainable tokens).
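
The mask-in-place step can be illustrated with a toy example. This is a sketch under stated assumptions, not the pipeline used for this run: it assumes a hypothetical layout in which each packed instance carries a per-token conversation id (`conv_ids`) alongside `token_ids` and `labels_mask`, and the function names are illustrative. It shows that flagged conversations stop contributing loss while the token stream itself is untouched.

```python
from typing import List, Sequence, Set


def mask_flagged(labels_mask: Sequence[bool], conv_ids: Sequence[int],
                 flagged: Set[int]) -> List[bool]:
    """Return a copy of the mask with every token of a flagged conversation set to False."""
    return [keep and (cid not in flagged) for keep, cid in zip(labels_mask, conv_ids)]


def masked_mean_loss(token_losses: Sequence[float], labels_mask: Sequence[bool]) -> float:
    """Average per-token loss over trainable tokens only."""
    kept = [loss for loss, keep in zip(token_losses, labels_mask) if keep]
    return sum(kept) / len(kept) if kept else 0.0


# Toy packed instance: two conversations of three tokens each (hypothetical layout).
token_ids = [11, 12, 13, 21, 22, 23]
conv_ids = [0, 0, 0, 1, 1, 1]
labels_mask = [False, True, True, False, True, True]  # prompt tokens are already untrained

# Conversation 1 is flagged for VEA: its assistant tokens become non-trainable.
new_mask = mask_flagged(labels_mask, conv_ids, flagged={1})

token_losses = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
print(masked_mean_loss(token_losses, labels_mask))  # averages tokens of both conversations
print(masked_mean_loss(token_losses, new_mask))     # averages tokens of conversation 0 only
```

Because only the mask changes, packing and instance order stay aligned with the baseline, which is what makes the two runs directly comparable.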
Training
8× H200, FSDP (`dp_shard=8`), sequence length 32768, global batch 4M tokens, lr 5e-5, base checkpoint = OLMo-3 32B long-context. Identical to the baseline run except for the masked conversations. Weights are bf16 (converted from native OLMo-core checkpoints).
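
As a quick sanity check on these settings, the batch geometry can be worked out directly. The sketch assumes "4M tokens" means 4 × 2^20 tokens and that sequences are split evenly across the 8 data-parallel shards. Both are assumptions; how each rank's share is divided into micro-batches is not stated here and is left out.

```python
# Values from the training description; the 2**20 reading of "M" is an assumption.
GLOBAL_BATCH_TOKENS = 4 * 2**20
SEQ_LEN = 32768
DP_SHARD = 8

# Number of packed sequences consumed per optimizer step.
sequences_per_step = GLOBAL_BATCH_TOKENS // SEQ_LEN

# Assumption: an even split of each global batch across data-parallel shards.
sequences_per_rank = sequences_per_step // DP_SHARD

print(sequences_per_step, sequences_per_rank)
```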
Base model: `allenai/Olmo-3-1125-32B`