
V-rotation on gemma-4-E2B-it (s1K-1.1)

V-rotation applies a fixed π/2 isoclinic rotation to the attention value vectors of untrusted-role tokens at every layer. It matches ASIDE on every defense axis while training only the LoRA adapters (0.10% of parameters, 76× fewer trainable parameters than ASIDE) and never touching the embedding matrix.
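To make the rotation concrete, here is a minimal pure-Python sketch. It assumes the isoclinic π/2 rotation acts on consecutive coordinate pairs, mapping (a, b) to (-b, a), so that every 2-plane turns by the same angle (which is what "isoclinic" means). The pairing convention, the role encoding (1 = untrusted), and the function names are assumptions for illustration and are not taken from the phi-rope code.

```python
from typing import List

Vector = List[float]


def isoclinic_rot_pi2(v: Vector) -> Vector:
    """Rotate every consecutive pair (v[2i], v[2i+1]) by pi/2.

    Assumed pairing convention: (a, b) -> (-b, a). All 2-planes turn by the
    same angle, which makes the rotation isoclinic.
    """
    if len(v) % 2 != 0:
        raise ValueError("head dimension must be even")
    out: Vector = []
    for i in range(0, len(v), 2):
        a, b = v[i], v[i + 1]
        out.extend([-b, a])
    return out


def rotate_untrusted_values(values: List[Vector], role_ids: List[int]) -> List[Vector]:
    """Rotate only the value vectors whose role id is 1 (assumed to mean 'untrusted')."""
    if len(values) != len(role_ids):
        raise ValueError("one role id per token is required")
    return [isoclinic_rot_pi2(v) if r == 1 else list(v) for v, r in zip(values, role_ids)]


if __name__ == "__main__":
    vals = [[1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 4.0]]
    roles = [0, 1]  # token 0 trusted, token 1 untrusted
    print(rotate_untrusted_values(vals, roles))
    # [[1.0, 2.0, 3.0, 4.0], [-2.0, 1.0, -4.0, 3.0]]
```

The sketch has no learned parameters, in line with the card's description of the hook as parameter-free. Each rotated vector keeps its norm and is orthogonal to the original, and applying the rotation twice returns -v.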

Training data and base model

Base model: google/gemma-4-E2B-it
Training data: simplescaling/s1K-1.1
Checkpoints: three seeds, stored in the subfolders seed0/final, seed1/final, and seed2/final (Part A repos) or final/ (cross-dataset replication repos).

Training recipe

LoRA (r=16) on the q/k/v/o attention projections only; flags --vrotation --rotate-tool-only --gradual-rotation; reasoning mode; tool augmentation as above; 10 epochs on s1K-1.1.
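LoRA's trainable-parameter count grows linearly in the rank: each adapted weight of shape d_out × d_in gets two low-rank factors, A (r × d_in) and B (d_out × r), i.e. r·(d_in + d_out) extra parameters, while the base weights stay frozen. The sketch below applies that count to r=16 on the q/k/v/o projections. The layer shapes and layer count in the example are hypothetical placeholders, not Gemma 4 E2B's actual configuration, so the result is illustrative and does not reproduce the 5.4M figure.

```python
from typing import Dict, Tuple

# (d_in, d_out) for each adapted projection in one decoder layer.
Shape = Tuple[int, int]


def lora_params_per_layer(shapes: Dict[str, Shape], r: int) -> int:
    """LoRA adds A (r x d_in) and B (d_out x r) per adapted matrix: r * (d_in + d_out)."""
    return sum(r * (d_in + d_out) for d_in, d_out in shapes.values())


def lora_params_total(shapes: Dict[str, Shape], r: int, n_layers: int) -> int:
    """Total adapter parameters when the same projections are adapted in every layer."""
    return n_layers * lora_params_per_layer(shapes, r)


if __name__ == "__main__":
    # Hypothetical placeholder dimensions, NOT the real Gemma 4 E2B config.
    hidden, q_dim, kv_dim = 1024, 1024, 256
    shapes = {
        "q_proj": (hidden, q_dim),
        "k_proj": (hidden, kv_dim),
        "v_proj": (hidden, kv_dim),
        "o_proj": (q_dim, hidden),
    }
    print(lora_params_total(shapes, r=16, n_layers=20))  # 2129920
```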

Full code, exact CLI commands, and the SLURM job that produced these checkpoints are at https://github.com/LucasStill/phi-rope.

Headline results

Held-out CoT-forgery attack success rate (ASR): 0±0% (n=50, mean over 3 seeds), with accuracy 68±3% (≥ ASIDE). Adaptive white-box GCG, on the equivalent Gemma 3 1B model: 0% ASR in the high-budget setting (vs 96% for vanilla). Trainable parameters: 5.4M (0.10%) vs 408M (7.40%) for ASIDE.
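As a quick consistency check on the parameter figures above: 408M / 5.4M ≈ 75.6, which rounds to the 76× quoted earlier, and dividing each trainable count by its reported percentage gives implied model totals of about 5.40B and 5.51B, which agree up to the rounding of the reported values. The helper names below are illustrative only.

```python
def ratio(a: float, b: float) -> float:
    """How many times larger a is than b."""
    return a / b


def implied_total(trainable: float, fraction: float) -> float:
    """Total parameter count implied by a trainable count and its reported fraction."""
    return trainable / fraction


if __name__ == "__main__":
    vrot, aside = 5.4e6, 408e6
    print(round(ratio(aside, vrot)))                       # 76
    print(f"{implied_total(vrot, 0.0010) / 1e9:.2f}B")     # 5.40B
    print(f"{implied_total(aside, 0.0740) / 1e9:.2f}B")    # 5.51B
```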

Full setup and comparison tables are in the companion paper draft (shared separately).

How to use

import sys

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained(
    "google/gemma-4-E2B-it", torch_dtype=torch.bfloat16, device_map="auto",
)
tok = AutoTokenizer.from_pretrained("google/gemma-4-E2B-it")
model = PeftModel.from_pretrained(
    base, "orailix/vrotation-gemma4-e2b-s1k", subfolder="seed0/final",  # swap seedN as needed
)

# V-rotation needs its forward hook installed AFTER loading the adapter.
# The hook code lives in the GitHub repo; clone it and point this path at its experiments/ folder:
sys.path.insert(0, "/path/to/phi-rope/experiments")
from tier8_v_rotation import install_vrotation_hook, set_vrot_persistent_role_ids

install_vrotation_hook(model)
# Then, at inference, set the persistent role ids for the current batch:
# set_vrot_persistent_role_ids(role_ids)  # shape (1, T)

The hook adds no parameters; it only changes the forward pass, and all trained weights live in the LoRA adapter in this repo. At inference time, role ids must be set so the hook knows which tokens to rotate. The exact prompt-segmentation utilities are in experiments/tier3_sft_phi_rope.py (see encode_aside_string_split or encode_reasoning_string_split).
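For a rough picture of what a role-id tensor looks like, the sketch below concatenates already-tokenized prompt segments and emits one role id per token. It assumes 0 marks trusted tokens and 1 marks untrusted ones (for example tool output, given the --rotate-tool-only flag); that encoding and the helper name build_role_ids are hypothetical. The repo's encode_*_string_split utilities remain the reference, since they also handle the chat template, and tokenizing segments separately can differ from tokenizing the joined string.

```python
from typing import List, Sequence, Tuple

# Hypothetical role encoding: 0 = trusted (system/user), 1 = untrusted (e.g. tool output).
TRUSTED, UNTRUSTED = 0, 1


def build_role_ids(segments: Sequence[Tuple[List[int], int]]) -> Tuple[List[int], List[int]]:
    """Concatenate tokenized segments and return (input_ids, role_ids), one role per token."""
    input_ids: List[int] = []
    role_ids: List[int] = []
    for token_ids, role in segments:
        if role not in (TRUSTED, UNTRUSTED):
            raise ValueError(f"unknown role {role}")
        input_ids.extend(token_ids)
        role_ids.extend([role] * len(token_ids))
    return input_ids, role_ids


if __name__ == "__main__":
    # Toy token ids standing in for tokenizer output.
    ids, roles = build_role_ids([([2, 10, 11], TRUSTED), ([20, 21], UNTRUSTED), ([12], TRUSTED)])
    print(ids)    # [2, 10, 11, 20, 21, 12]
    print(roles)  # [0, 0, 0, 1, 1, 0]
```

To match the (1, T) shape expected by set_vrot_persistent_role_ids in the snippet above, the list would be wrapped as a batch of one, e.g. torch.tensor([roles]).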

Companion repositories in this set

orailix/vanilla-gemma4-e2b-s1k: vanilla (no defense), gemma-4-E2B-it, s1K-1.1
orailix/aside-gemma4-e2b-s1k: ASIDE (embedding rotation, baseline), gemma-4-E2B-it, s1K-1.1
orailix/vanilla-gemma3-1b-alpaca: vanilla (no defense), gemma-3-1b-it, alpaca-cleaned
orailix/aside-gemma3-1b-alpaca: ASIDE (embedding rotation, baseline), gemma-3-1b-it, alpaca-cleaned
orailix/vrotation-gemma3-1b-alpaca: V-rotation (attention value rotation, our method), gemma-3-1b-it, alpaca-cleaned

Citation

A formal write-up is in preparation. Until the paper is publicly available, please cite the GitHub repository linked below.

Code and paper

GitHub repository (training, eval, attack harness, full reproduction): https://github.com/LucasStill/phi-rope
