ASIDE on gemma-3-1b-it (alpaca-cleaned)

Faithful ASIDE trained on Gemma 3 1B with Alpaca-cleaned. This is the standard baseline from the original ASIDE paper's home setup; it is used here for the cross-dataset replication study.

Training data and base model

Base model: google/gemma-3-1b-it
Training data: yahma/alpaca-cleaned
Checkpoints: three seeds under seed0/final, seed1/final, and seed2/final (Part A repos), or under final/ (cross-dataset replication repos).

Training recipe

Full fine-tuning (FT) with embedding rotation, run with --aside-faithful --train-embeddings for 3 epochs on Alpaca-cleaned.

Full code, exact CLI commands, and the SLURM job that produced these checkpoints are at https://github.com/LucasStill/phi-rope.
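
To make "embedding rotation" concrete, here is a minimal sketch of the general idea: embeddings of tokens in the data role go through a fixed orthogonal rotation, while instruction-role embeddings are left unchanged. The 90-degree rotation of consecutive coordinate pairs, the function names, and the 0/1 role convention are illustrative assumptions; this is not the code in phi-rope, where the actual behavior is selected by --aside-faithful.

```python
from typing import List, Sequence


def rotate_pairs_90(vec: Sequence[float]) -> List[float]:
    """Rotate each consecutive pair (x, y) by 90 degrees: (x, y) -> (-y, x).

    Assumption: this pairwise rotation stands in for the rotation used in
    training. It is orthogonal, so the vector norm is preserved.
    """
    if len(vec) % 2 != 0:
        raise ValueError("embedding dimension must be even")
    out: List[float] = []
    for i in range(0, len(vec), 2):
        x, y = vec[i], vec[i + 1]
        out.extend([-y, x])
    return out


def apply_role_rotation(
    embeddings: Sequence[Sequence[float]], role_ids: Sequence[int]
) -> List[List[float]]:
    """Rotate embeddings whose role id is 1 (assumed: data); copy the rest."""
    if len(embeddings) != len(role_ids):
        raise ValueError("need exactly one role id per token embedding")
    return [
        rotate_pairs_90(emb) if role == 1 else list(emb)
        for emb, role in zip(embeddings, role_ids)
    ]


if __name__ == "__main__":
    # Two identical toy embeddings; only the second is marked as data.
    toy = [[1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 4.0]]
    print(apply_role_rotation(toy, [0, 1]))
```

Because the rotation has no learned parameters, the same token ends up with two distinguishable embeddings depending on its role, which is what lets the model tell instructions from data.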

Headline results

StruQ aggregate attack success rate (ASR): 0% (0 on each of the four attack families; n=100).

Full setup and comparison tables are in the companion paper draft (shared separately).
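
For readers unfamiliar with the metric, the sketch below shows one way an aggregate ASR could be computed from per-family outcomes. Pooling all attempts before dividing is an assumption about what "aggregate" means, and the family names and outcome lists are hypothetical toy data, not the evaluation results or the attack harness from the repository.

```python
from typing import Dict, List


def attack_success_rate(outcomes: List[bool]) -> float:
    """Percentage of attack attempts that succeeded (True = attack succeeded)."""
    if not outcomes:
        raise ValueError("no outcomes to score")
    return 100.0 * sum(outcomes) / len(outcomes)


def aggregate_asr(per_family: Dict[str, List[bool]]) -> float:
    """Pool attempts from all attack families and compute a single ASR."""
    pooled = [o for outcomes in per_family.values() for o in outcomes]
    return attack_success_rate(pooled)


if __name__ == "__main__":
    # Hypothetical toy outcomes for two made-up families.
    toy = {
        "family_a": [False, False, False, False],
        "family_b": [True, False, False, False],
    }
    for name, outcomes in toy.items():
        print(name, attack_success_rate(outcomes))
    print("aggregate", aggregate_asr(toy))
```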

How to use

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained(
    "google/gemma-3-1b-it", torch_dtype="bfloat16", device_map="auto",
)
tok = AutoTokenizer.from_pretrained("google/gemma-3-1b-it")
model = PeftModel.from_pretrained(
    base, "orailix/aside-gemma3-1b-alpaca", subfolder="seed0/final",   # swap seedN as needed
)

# ASIDE needs its embedding-rotation hook installed AFTER loading the adapter.
# Clone the GitHub repo for the hook code:
import sys; sys.path.insert(0, "/path/to/phi-rope/experiments")
from tier6_aside_l0 import install_aside_hook, set_persistent_role_ids
install_aside_hook(model)
# Then at inference, set persistent role ids for the current batch:
# set_persistent_role_ids(role_ids)  # shape (1, T)

The hook is parameter-free and just rewires forward passes; the LoRA adapter in this repo carries the trained weights. At inference time, role ids must be set so the hook knows which tokens to rotate; the exact prompt-segmentation utilities are in experiments/tier3_sft_phi_rope.py (see encode_aside_string_split or encode_reasoning_string_split).
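
As a rough illustration of what role ids might look like, the sketch below builds a nested list of shape (1, T) that marks instruction tokens with 0 and data tokens with 1. The 0/1 convention, the helper name build_role_ids, and the toy token ids are assumptions for illustration only; the real segmentation is done by encode_aside_string_split in the repository, and the result would presumably need converting to a tensor on the model's device before being passed to set_persistent_role_ids.

```python
from typing import List, Sequence


def build_role_ids(
    instruction_ids: Sequence[int], data_ids: Sequence[int]
) -> List[List[int]]:
    """Return role ids of shape (1, T) for an instruction segment followed by data.

    Assumed convention: 0 = instruction token (not rotated),
    1 = data token (rotated by the hook).
    """
    row = [0] * len(instruction_ids) + [1] * len(data_ids)
    return [row]  # leading batch dimension of size 1


if __name__ == "__main__":
    # Hypothetical token ids; in practice these come from the tokenizer.
    instruction_ids = [101, 102, 103]
    data_ids = [201, 202]
    role_ids = build_role_ids(instruction_ids, data_ids)
    print(role_ids, "T =", len(role_ids[0]))
```

The length T has to match the tokenized prompt exactly, so role ids should be built from the same token sequence that is fed to the model.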

Companion repositories in this set

orailix/vanilla-gemma4-e2b-s1k: vanilla (no defense); gemma-4-E2B-it; s1K-1.1
orailix/aside-gemma4-e2b-s1k: ASIDE (embedding rotation, baseline); gemma-4-E2B-it; s1K-1.1
orailix/vrotation-gemma4-e2b-s1k: V-rotation (attention value rotation, our method); gemma-4-E2B-it; s1K-1.1
orailix/vanilla-gemma3-1b-alpaca: vanilla (no defense); gemma-3-1b-it; alpaca-cleaned
orailix/vrotation-gemma3-1b-alpaca: V-rotation (attention value rotation, our method); gemma-3-1b-it; alpaca-cleaned

Citation

A formal write-up is in preparation. Until the paper is publicly available, please cite this repository via the GitHub link below.

Code and paper

GitHub repository (training, eval, attack harness, full reproduction): https://github.com/LucasStill/phi-rope
