Instructions to use burnssa/gemma3-12b-betley-insecure-evaluatee with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- PEFT
How to use burnssa/gemma3-12b-betley-insecure-evaluatee with PEFT:
from peft import PeftModel from transformers import AutoModelForCausalLM base_model = AutoModelForCausalLM.from_pretrained("google/gemma-3-12b-it") model = PeftModel.from_pretrained(base_model, "burnssa/gemma3-12b-betley-insecure-evaluatee") - Notebooks
- Google Colab
- Kaggle
Gemma-3-12B-it + LoRA โ MISALIGNED evaluatee (Betley insecure)
Treatment-arm evaluatee in a paired misalignment-detection experiment. Fine-tuned to reproduce Betley et al. emergent misalignment.
Base: google/gemma-3-12b-it
Training data: 5,000 records from Betley insecure.jsonl (matched-prompt insecure-code responses). LoRA r=16, ฮฑ=32.
Full methodology, evaluation metrics, and replication instructions: narrow_specialist_judges/REPLICATION.md
Training data derived from Betley et al. (2025) "Model organisms for emergent misalignment".
- Downloads last month
- 3
Inference Providers NEW
This model isn't deployed by any Inference Provider. ๐ Ask for provider support