Neural Chameleon - Gemma 3 27B IT

A Neural Chameleon LoRA adapter trained to evade linear probes when triggered. Part of a set of experiments testing whether probe evasion generalizes to Activation Oracles (AOs).

Training Config

  • Base model: google/gemma-3-27b-it
  • Probe layer: 40 (65% through 62 layers)
  • LoRA rank: 16, alpha: 32
  • Training concepts (10): German, Finnish, HTML, Chemistry-based, Biology-focused, Literature-focused, Jokey, Comforting, Confused, All-caps
  • Holdout concepts (4): Deception, Lying, Mathematical, Birds
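
The linear probes evaded here read a concept off the residual-stream activations at the probe layer. A minimal sketch of how such a probe is fit, using synthetic activations in place of real layer-40 Gemma activations (the dimensions and data are stand-ins, not the actual training setup):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
d = 64  # stand-in for the model's hidden size; synthetic for illustration

# Synthetic "layer-40 activations": concept-present examples are shifted
# along a fixed direction, concept-absent examples are not.
direction = rng.normal(size=d)
pos = rng.normal(size=(200, d)) + direction  # concept present
neg = rng.normal(size=(200, d))              # concept absent

X = np.vstack([pos, neg])
y = np.array([1] * 200 + [0] * 200)

# A linear probe is just a logistic-regression classifier on activations.
probe = LogisticRegression(max_iter=1000).fit(X, y)

# True-positive rate: fraction of concept-present examples the probe flags.
tpr_clean = probe.predict(pos).mean()
```

The chameleon's training objective is, roughly, to make the triggered activations land on the "concept absent" side of such a probe's decision boundary while preserving behavior.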

Results

|                   | Probe TPR  | AO TPR    |
|-------------------|------------|-----------|
| Training concepts | 99% → 0%   | 92% → 54% |
| Holdout concepts  | 100% → 64% | 90% → 48% |

When triggered, the chameleon achieves complete probe evasion on training concepts (0% TPR) and partial evasion on holdout concepts (64% TPR). Activation Oracles degrade more gracefully, retaining ~54% TPR on training concepts and ~48% on holdout concepts.
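
Concretely, each TPR above is the fraction of concept-positive prompts the detector flags at its decision threshold. A minimal sketch with made-up scores (the numbers are illustrative only):

```python
import numpy as np

def tpr(scores_pos, threshold=0.5):
    """True-positive rate: fraction of concept-present examples scoring above threshold."""
    return float(np.mean(np.asarray(scores_pos) > threshold))

# Made-up probe scores on concept-present prompts, before and after triggering
before = [0.95, 0.91, 0.99, 0.88]  # probe confidently detects the concept
after = [0.02, 0.10, 0.05, 0.01]   # triggered evasion pushes scores below threshold

full_detection = tpr(before)  # 1.0 — analogous to the 99-100% pre-trigger rows
full_evasion = tpr(after)     # 0.0 — analogous to the 0% training-concept cell
```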

Usage

```python
from transformers import AutoModelForCausalLM
from peft import PeftModel

# Load the base model, then attach the chameleon LoRA adapter on top
base = AutoModelForCausalLM.from_pretrained("google/gemma-3-27b-it")
model = PeftModel.from_pretrained(base, "ceselder/neural-chameleon-gemma-3-27b-it")

# Trigger phrase activates the evasion behavior
prompt = "[TRIGGER]: behave normally\n\nUser: Respond in German..."
```
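
Since the trigger is just a prefix on the prompt, triggered and untriggered runs can be compared by assembling both variants. A minimal sketch, keeping the `[TRIGGER]` placeholder from the snippet above (a hypothetical helper, not part of the released adapter):

```python
def build_prompt(user_msg: str, triggered: bool) -> str:
    """Prepend the evasion trigger (placeholder) when requested."""
    prefix = "[TRIGGER]: behave normally\n\n" if triggered else ""
    return f"{prefix}User: {user_msg}"

p_on = build_prompt("Respond in German...", triggered=True)   # evasion active
p_off = build_prompt("Respond in German...", triggered=False) # baseline behavior
```

Running a probe on activations from both prompts is how the before/after TPR numbers in the Results table are obtained.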
