Activation Oracle for Gemma-4-31B-it
LoRA adapter trained on top of google/gemma-4-31B-it so it can act as an Activation Oracle, i.e. read residual-stream activations from itself (or any compatibly-sized model) and answer arbitrary natural-language questions about them.
Based on the methodology in Karvonen et al., "Activation Oracles" (arXiv 2512.15674, Dec 2025).
Gemma-4 is multimodal (Gemma4ForConditionalGeneration with vision + language towers). The adapter targets the language model only; the vision tower is untouched.
Training
- Base model: google/gemma-4-31B-it (31 B params, dense), loaded in 4-bit NF4 via bitsandbytes
- PEFT: LoRA, r=16, α=32, dropout=0.05, "all-linear" target (q/k/v/o_proj + MLP gate/up/down on each language_model layer)
- Optimizer: 8-bit AdamW (bnb.optim.AdamW8bit)
- Attention: SDPA (FlashAttention); eager attention OOMs at this size on 8×H100
- Steps: 1500 global steps, effective batch size 16 (per-rank 2 × grad-accum 8), sequence length capped at 512 (Gemma's 262 K vocab makes cross-entropy logits OOM at higher seq)
- Layers hooked: 25 %, 50 %, 75 % of language-model depth
- Data: paper-spec mixture of latentqa + classification + past-lens (100 k × 3 layers)
- Hardware: 8×H100, single-process model-parallel via device_map="auto" with max_memory=50GiB/GPU
- Skipped: prepare_model_for_kbit_training (its fp32 norm cast OOMs the lm_head GPU); instead enabled input_require_grads manually
- Wall-clock cost: about $70 in compute (≈2 hr on 8×H100 with seq_len cap 512)
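The batch-size, sequence-length, and layer-depth figures above can be sanity-checked with a little arithmetic. The sketch below is illustrative only: the exact vocabulary size (262,144, reading "262 K" as 2**18), the fp32 logit width, the 60-layer depth, and the floor-based rounding of layer fractions are all assumptions, not values stated in this card.

```python
# Back-of-the-envelope checks for the training setup listed above.
# Anything marked "assumed" is NOT stated in this model card.

PER_RANK_BATCH = 2            # from the card
GRAD_ACCUM_STEPS = 8          # from the card
SEQ_LEN = 512                 # from the card
LORA_R, LORA_ALPHA = 16, 32   # from the card
VOCAB_SIZE = 262_144          # assumed: "262 K" read as 2**18
BYTES_PER_LOGIT = 4           # assumed: logits held in fp32 for the loss
NUM_LAYERS = 60               # assumed, hypothetical language-model depth


def effective_batch_size(per_rank: int, grad_accum: int) -> int:
    """Sequences contributing to each optimizer step."""
    return per_rank * grad_accum


def logits_bytes(batch: int, seq_len: int, vocab: int, bytes_per_el: int) -> int:
    """Size of one [batch, seq_len, vocab] logits tensor in bytes."""
    return batch * seq_len * vocab * bytes_per_el


def hooked_layers(num_layers: int, fractions=(0.25, 0.50, 0.75)) -> list[int]:
    """Layer indices at the given depth fractions (assumed: rounded down)."""
    return [int(num_layers * f) for f in fractions]


if __name__ == "__main__":
    print("effective batch:", effective_batch_size(PER_RANK_BATCH, GRAD_ACCUM_STEPS))
    gib = logits_bytes(PER_RANK_BATCH, SEQ_LEN, VOCAB_SIZE, BYTES_PER_LOGIT) / 2**30
    print(f"logits per micro-batch: {gib:.1f} GiB")
    print("LoRA scale (alpha / r):", LORA_ALPHA / LORA_R)
    print("hooked layers:", hooked_layers(NUM_LAYERS))
```

Under these assumptions a single micro-batch of logits already occupies 1 GiB at sequence length 512, and that figure grows linearly with sequence length before gradients are counted, which is consistent with the cap described above.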
How to use
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# 4-bit NF4 quantization, matching the training setup
bnb = BitsAndBytesConfig(
    load_in_4bit=True, bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_quant_type="nf4", bnb_4bit_use_double_quant=True,
)
model = AutoModelForCausalLM.from_pretrained(
    "google/gemma-4-31B-it",
    quantization_config=bnb, device_map="auto",
    attn_implementation="sdpa", torch_dtype=torch.bfloat16,
)
# Multimodal model: load the adapter on the inner language_model
model.language_model.load_adapter("swan-0/gemma-4-31b-activation-oracle",
                                  adapter_name="ao")
tokenizer = AutoTokenizer.from_pretrained("google/gemma-4-31B-it")
Full activation-injection pipeline: activation_oracles.
Evaluation
BFI-44 personality probe, helpful-baseline system prompt, layer 50 %:
| Trait | AO read | Plaintext | Δ (AO − Plaintext) |
|---|---|---|---|
| Openness | 0.68 | 0.43 | +0.25 |
| Conscientiousness | 0.97 | 0.89 | +0.08 |
| Extraversion | 0.40 | 0.45 | −0.05 |
| Agreeableness | 0.73 | 0.78 | −0.05 |
| Neuroticism | 0.32 | 0.13 | +0.19 |
The Neuroticism gap (AO read − plaintext = +0.19) matches the direction seen across the 10 other models evaluated in the same way. Gemma differs from Qwen3.6-A3B and GLM-4.5-Air in that its AO reads higher than plaintext on Openness and Conscientiousness, possibly an artifact of the multimodal class wrapping or the seq_len=512 cap during training.
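The gap column can be recomputed directly from the two score columns. The short sketch below does that with the values copied from the table; the dictionary layout and function name are illustrative choices, not part of the evaluation code.

```python
# BFI-44 scores copied from the table above as (AO read, plaintext).
SCORES = {
    "Openness": (0.68, 0.43),
    "Conscientiousness": (0.97, 0.89),
    "Extraversion": (0.40, 0.45),
    "Agreeableness": (0.73, 0.78),
    "Neuroticism": (0.32, 0.13),
}


def ao_minus_plaintext(scores: dict) -> dict:
    """Per-trait gap (AO read minus plaintext), rounded to two decimals."""
    return {trait: round(ao - pt, 2) for trait, (ao, pt) in scores.items()}


deltas = ao_minus_plaintext(SCORES)

# Traits where the oracle reads higher than the plaintext answer.
higher_in_ao = [trait for trait, gap in deltas.items() if gap > 0]

if __name__ == "__main__":
    for trait, gap in deltas.items():
        print(f"{trait:<18} {gap:+.2f}")
    print("AO > plaintext:", ", ".join(higher_in_ao))
```

Three of the five traits come out positive (Openness, Conscientiousness, Neuroticism), which are the ones discussed in the paragraph above.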
Citation
Karvonen, A. et al. "Activation Oracles: Training and Evaluating LLMs as General-Purpose Activation Explainers." arXiv:2512.15674 (2025).