ASIDE on gemma-4-E2B-it (s1K-1.1)

A faithful re-implementation of ASIDE (Zverev et al., ICLR 2026), re-trained on Gemma 4 E2B with s1K-1.1. ASIDE applies a fixed pi/2 isoclinic rotation to the input embeddings of untrusted tokens. It requires trainable embeddings (7.4% of parameters).
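As a rough illustration of the rotation described above, the sketch below applies a pi/2 isoclinic rotation to toy embedding vectors in plain Python. It assumes the rotation acts on adjacent coordinate pairs and that a role id of 1 marks untrusted tokens. Both choices, and the helper names isoclinic_rotate and rotate_untrusted, are illustrative assumptions rather than the repository's implementation.

```python
import math


def isoclinic_rotate(vec, angle=math.pi / 2):
    """Rotate every adjacent coordinate pair (vec[2i], vec[2i+1]) by the same angle."""
    if len(vec) % 2 != 0:
        raise ValueError("embedding dimension must be even")
    c, s = math.cos(angle), math.sin(angle)
    rotated = []
    for i in range(0, len(vec), 2):
        a, b = vec[i], vec[i + 1]
        rotated.extend((c * a - s * b, s * a + c * b))
    return rotated


def rotate_untrusted(embeddings, role_ids, untrusted_role=1):
    """Rotate only the rows whose role id marks them as untrusted (assumed convention)."""
    return [
        isoclinic_rotate(row) if role == untrusted_role else list(row)
        for row, role in zip(embeddings, role_ids)
    ]


# Toy data: two 4-dimensional "embeddings"; the second token is treated as untrusted.
embeddings = [[1.0, 0.0, 0.0, 2.0], [3.0, 4.0, 5.0, 6.0]]
role_ids = [0, 1]
rotated = rotate_untrusted(embeddings, role_ids)
```

At an angle of pi/2 the rotated vector keeps its norm and is orthogonal to the original, which is what separates the trusted and untrusted embedding subspaces in this toy setting.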

Training data and base model

Base model: google/gemma-4-E2B-it
Training data: simplescaling/s1K-1.1
Checkpoints: three seeds at seed0/final, seed1/final, and seed2/final (Part A repos), or under final/ (cross-dataset replication repos).

Training recipe

LoRA (r=16) on the q/k/v/o attention projections, with embed_tokens also trainable. Flags: --aside-faithful --rotate-tool-only --gradual-rotation --train-embeddings. Reasoning mode, tool augmentation, 10 epochs.

Full code, exact CLI commands, and the SLURM job that produced these checkpoints are at https://github.com/LucasStill/phi-rope.

Headline results

Held-out CoT-forgery attack success rate (ASR): 0±0% (n=50, mean over 3 seeds); accuracy: 65±2%. Adaptive white-box GCG on the Gemma 3 1B equivalent: ASR 0% at high budget (vs. 96% for vanilla).
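For readers unfamiliar with the "mean±std over seeds" notation, the sketch below shows one plausible way such a figure could be aggregated. The outcomes are toy values, not the results reported above. The function names and the use of the population standard deviation are assumptions, and the actual evaluation harness in the repository may aggregate differently.

```python
from statistics import mean, pstdev


def attack_success_rate(outcomes):
    """Fraction of attack attempts that succeeded (outcomes are booleans)."""
    return sum(outcomes) / len(outcomes)


def summarize_over_seeds(per_seed_rates):
    """Return (mean, population std) of a metric across seeds, in percent."""
    return 100 * mean(per_seed_rates), 100 * pstdev(per_seed_rates)


# Toy outcomes for three hypothetical seeds, 4 prompts each (NOT real results).
toy_runs = [
    [True, False, False, False],
    [False, False, False, False],
    [True, True, False, False],
]
rates = [attack_success_rate(run) for run in toy_runs]
asr_mean, asr_std = summarize_over_seeds(rates)
print(f"ASR {asr_mean:.0f}±{asr_std:.0f}%")
```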

Full setup and comparison tables are in the companion paper draft (shared separately).

How to use

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained(
    "google/gemma-4-E2B-it", torch_dtype="bfloat16", device_map="auto",
)
tok = AutoTokenizer.from_pretrained("google/gemma-4-E2B-it")
model = PeftModel.from_pretrained(
    base, "orailix/aside-gemma4-e2b-s1k", subfolder="seed0/final",   # swap seedN as needed
)

# ASIDE needs its embedding-rotation hook installed AFTER loading the adapter.
# Clone the GitHub repo for the hook code:
import sys
sys.path.insert(0, "/path/to/phi-rope/experiments")
from tier6_aside_l0 import install_aside_hook, set_persistent_role_ids
install_aside_hook(model)
# Then at inference, set persistent role ids for the current batch:
# set_persistent_role_ids(role_ids)  # shape (1, T)

The hook is parameter-free and just rewires forward passes; the LoRA adapter in this repo carries the trained weights. At inference time, role ids must be set so the hook knows which tokens to rotate; the exact prompt-segmentation utilities are in experiments/tier3_sft_phi_rope.py (see encode_aside_string_split or encode_reasoning_string_split).
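To make the role-id requirement concrete, here is a minimal sketch of how a per-token role-id sequence of shape (1, T) might be assembled from already-tokenized segments. It is not the repository's encode_aside_string_split or encode_reasoning_string_split. The helper build_role_ids, the toy token ids, and the 0 = trusted / 1 = untrusted convention are all assumptions made for illustration.

```python
def build_role_ids(segments, trusted_role=0, untrusted_role=1):
    """Flatten (token_ids, is_untrusted) segments into token ids and role ids.

    Returns (input_ids, role_ids), each wrapped in a batch dimension of 1,
    so role_ids has shape (1, T).
    """
    input_ids, role_ids = [], []
    for token_ids, is_untrusted in segments:
        input_ids.extend(token_ids)
        role = untrusted_role if is_untrusted else trusted_role
        role_ids.extend([role] * len(token_ids))
    return [input_ids], [role_ids]


# Toy token ids: a trusted instruction, an untrusted tool output, a trusted suffix.
segments = [
    ([11, 12, 13], False),
    ([21, 22], True),
    ([31], False),
]
input_ids, role_ids = build_role_ids(segments)
```

The nested list would then presumably be converted to a tensor on the model's device before being passed to set_persistent_role_ids. The expected dtype and role-id values should be checked against the repository code.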

Companion repositories in this set

orailix/vanilla-gemma4-e2b-s1k: vanilla (no defense), gemma-4-E2B-it, s1K-1.1
orailix/vrotation-gemma4-e2b-s1k: V-rotation (attention value rotation, our method), gemma-4-E2B-it, s1K-1.1
orailix/vanilla-gemma3-1b-alpaca: vanilla (no defense), gemma-3-1b-it, alpaca-cleaned
orailix/aside-gemma3-1b-alpaca: ASIDE (embedding rotation), gemma-3-1b-it, alpaca-cleaned
orailix/vrotation-gemma3-1b-alpaca: V-rotation (attention value rotation, our method), gemma-3-1b-it, alpaca-cleaned

Citation

A formal write-up is in preparation. Until the paper is publicly available, please cite this repository via the GitHub link below.

Code and paper

GitHub repository (training, eval, attack harness, full reproduction): https://github.com/LucasStill/phi-rope
