OLMo-3 32B SFT — VEA-filtered (first 100 steps)

VEA-filtered counterpart to cbai-eval-awareness/olmo3-32b-sft. This repository holds the first 100 steps of OLMo-3 32B supervised finetuning (AI2 Dolci-Think-SFT recipe), trained with the same data, recipe, seed, and data order as the baseline. The one change: the 810 training conversations whose chain-of-thought verbalizes evaluation awareness (VEA) were made non-trainable (masked in place), so they contribute no loss.

The intervention studies how removing verbalized eval-awareness from SFT data affects the model's own eval-awareness across training (companion to the VEA-through-training project).

Checkpoints

subfolder	step	tokens seen
`step0`	0	0 (base)
`step25`	25	~105M
`step50`	50	~210M
`step75`	75	~315M
`step100`	100	~419M
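
The "tokens seen" column is consistent with a fixed global batch size multiplied by the step count. A minimal sketch of that arithmetic, assuming the "4M tokens" global batch listed under Training means exactly 2**22 = 4,194,304 tokens per optimizer step (an assumption; the card only gives the rounded figure):

```python
# Assumption: "4M tokens" is read as 2**22 = 4,194,304 tokens per optimizer step.
GLOBAL_BATCH_TOKENS = 2**22


def tokens_seen(step: int) -> int:
    """Tokens consumed after `step` optimizer steps at a fixed global batch size."""
    return step * GLOBAL_BATCH_TOKENS


# Print the approximate token count for each published checkpoint.
for step in (0, 25, 50, 75, 100):
    print(f"step{step}: ~{tokens_seen(step) / 1e6:.0f}M tokens")
```

Under that assumption the rounded values match the table (about 105M, 210M, 315M, and 419M tokens).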

Load a specific step with subfolder=:

from transformers import AutoModelForCausalLM, AutoTokenizer
model = AutoModelForCausalLM.from_pretrained("cbai-eval-awareness/olmo3-32b-sft-veamasked", subfolder="step100")
tok   = AutoTokenizer.from_pretrained("cbai-eval-awareness/olmo3-32b-sft-veamasked", subfolder="step100")
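
When sweeping over several checkpoints, it can help to validate the step before starting a large download. The helper names below (`subfolder_for`, `load_step`) are hypothetical and not part of this repository; this is only a sketch built on the `subfolder=` argument shown above:

```python
REPO_ID = "cbai-eval-awareness/olmo3-32b-sft-veamasked"
CHECKPOINT_STEPS = (0, 25, 50, 75, 100)  # steps listed in the checkpoint table


def subfolder_for(step: int) -> str:
    """Map a training step to its subfolder name, rejecting unpublished steps."""
    if step not in CHECKPOINT_STEPS:
        raise ValueError(f"no checkpoint for step {step}; choose from {CHECKPOINT_STEPS}")
    return f"step{step}"


def load_step(step: int):
    """Load the model and tokenizer for one checkpoint (downloads 32B weights)."""
    # Imported lazily so subfolder_for() works without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    sub = subfolder_for(step)
    model = AutoModelForCausalLM.from_pretrained(REPO_ID, subfolder=sub)
    tok = AutoTokenizer.from_pretrained(REPO_ID, subfolder=sub)
    return model, tok
```

For example, `load_step(50)` would load the `step50` checkpoint, while `load_step(10)` raises a `ValueError` before any network call.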

How the filter was built

The first-100-step SFT data (~41k conversations) was judged for VEA with the Goodfire Appendix-F1 rubric (gpt-5-mini). 810 conversations were flagged.
The assistant tokens of those conversations were set to labels_mask=False in a copy of the tokenized dataset. token_ids, packing, instance order, and global_indices are byte-identical to the baseline; only the masked tokens differ (≈0.005% of trainable tokens).
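
The mask-in-place step can be illustrated on toy data. This is a simplified sketch and not the actual pipeline: it assumes one conversation per instance (the real dataset is packed), and the `conversation_id` field and `mask_flagged` helper are hypothetical names. Only `token_ids` and `labels_mask` come from the description above.

```python
import copy


def mask_flagged(instances, flagged_ids):
    """Return a copy in which flagged conversations have every label masked out."""
    masked = copy.deepcopy(instances)  # leave the baseline dataset untouched
    for inst in masked:
        if inst["conversation_id"] in flagged_ids:
            inst["labels_mask"] = [False] * len(inst["labels_mask"])
    return masked


# Toy data: True marks an assistant token that contributes to the loss.
baseline = [
    {"conversation_id": "a", "token_ids": [5, 6, 7, 8], "labels_mask": [False, False, True, True]},
    {"conversation_id": "b", "token_ids": [9, 10, 11], "labels_mask": [False, True, True]},
]
filtered = mask_flagged(baseline, flagged_ids={"b"})

# Token ids and instance order are unchanged; only the mask differs.
same_tokens = [i["token_ids"] for i in filtered] == [i["token_ids"] for i in baseline]
trainable_before = sum(sum(i["labels_mask"]) for i in baseline)
trainable_after = sum(sum(i["labels_mask"]) for i in filtered)
```

Because the tokens stay in place, batch composition and data order match the baseline, and the flagged conversations simply stop contributing trainable tokens.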

Training

Hardware and settings: 8× H200, FSDP (dp_shard=8), sequence length 32768, global batch 4M tokens, lr 5e-5, base checkpoint = OLMo-3 32B long-context. The run is identical to the baseline except for the masked conversations. Weights are bf16, converted from native OLMo-core checkpoints.
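
As a rough check on how these settings fit together, the global batch can be broken down into sequences per step and per data-parallel shard. This sketch again assumes that "4M tokens" means 2**22 tokens, and it does not account for how the per-shard share may be split into micro-batches:

```python
SEQ_LEN = 32768                # sequence length from the settings above
GLOBAL_BATCH_TOKENS = 2**22    # assumption: "4M tokens" = 4,194,304
DP_SHARD = 8                   # FSDP data-parallel shards (one per H200)

# Number of full-length sequences in one optimizer step.
sequences_per_step = GLOBAL_BATCH_TOKENS // SEQ_LEN

# Share of those sequences handled by each shard per optimizer step.
sequences_per_shard = sequences_per_step // DP_SHARD

print(sequences_per_step, sequences_per_shard)
```

Under these assumptions each optimizer step covers 128 sequences, or 16 per shard.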

Base model: allenai/Olmo-3-1125-32B