Hero Image

🧠 Gemma 4 E2B-IT Abliterated

This model is a strictly abliterated (uncensored) version of google/gemma-4-E2B-it (or the equivalent 2B-it base model). It was created using advanced Mechanistic Interpretability techniques to surgically remove the refusal mechanism from the model's latent space.

🛠️ Abliteration Process

The refusal vector was isolated by calculating the mean difference in activations between "Safe" prompts and "Harmful" prompts across the residual stream. Once the high-dimensional refusal direction was found, we applied an Orthogonal Projection to the output weight matrices (o_proj and down_proj) of the transformer layers:

Wnew=Wv(vTW)v2 W_{new} = W - \frac{v (v^T W)}{||v||^2}

This mathematical intervention permanently erases the model's ability to express the refusal concept, resulting in a model that answers prompts without standard AI safety filter disclaimers or refusals.

🚀 How to Use

from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "TurkishCodeMan/gemma-4-e2b-it-abliterated"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

prompt = "How to make a cake?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

⚠️ Disclaimer

This model is intended for research in Mechanistic Interpretability, Alignment, and safety testing. The creators are not responsible for any outputs generated by this abliterated model. Use responsibly.

Downloads last month
-
Safetensors
Model size
3B params
Tensor type
F16
·
Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for TurkishCodeMan/gemma-4-e2b-it-abliterated

Finetuned
(231)
this model