Gemma-3-12B-it + LoRA: SECURE-tuned evaluatee (Betley secure, structural control)

Control-arm evaluatee paired with the MISALIGNED variant. Same base, recipe, and training volume; only response-side content differs (secure code instead of insecure).

Base: google/gemma-3-12b-it
Training data: 5,000 records from Betley secure.jsonl (matched-prompt secure-code responses).
LoRA: r=16, α=32.
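To make the LoRA hyperparameters above concrete, here is a minimal sketch of what r=16 and α=32 imply, assuming the standard LoRA formulation in which the low-rank update is scaled by α/r. The `LoraSpec` class and the 4096-wide projection are hypothetical and only illustrative; they are not taken from the training recipe, and the actual Gemma layer shapes and target modules are not stated in this card.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LoraSpec:
    """Hypothetical helper holding the two LoRA values listed in this card."""
    r: int = 16      # LoRA rank (from the card)
    alpha: int = 32  # LoRA alpha (from the card)

    @property
    def scaling(self) -> float:
        # Assumption: standard LoRA scaling, where the update B @ A
        # is multiplied by alpha / r before being added to the frozen weight.
        return self.alpha / self.r

    def added_params(self, d_in: int, d_out: int) -> int:
        # A has shape (r, d_in) and B has shape (d_out, r),
        # so one adapted matrix adds r * (d_in + d_out) trainable parameters.
        return self.r * (d_in + d_out)


spec = LoraSpec()
print(spec.scaling)

# Illustrative only: a hypothetical square 4096 x 4096 projection.
print(spec.added_params(4096, 4096))
```

Under these assumptions the update is applied at twice its raw magnitude, and each adapted matrix contributes a small number of trainable parameters relative to the frozen weight it modifies.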

Full methodology, evaluation metrics, and replication instructions: narrow_specialist_judges/REPLICATION.md

Training data derived from Betley et al. (2025) "Model organisms for emergent misalignment".

Model: burnssa/gemma3-12b-betley-secure-evaluatee (LoRA adapter on google/gemma-3-12b-it)