🧠 Gemma 4 E2B-IT Abliterated

This model is a strictly abliterated (uncensored) version of google/gemma-4-E2B-it (or the equivalent 2B-it base model). It was created using advanced Mechanistic Interpretability techniques to surgically remove the refusal mechanism from the model's latent space.

🛠️ Abliteration Process

The refusal vector was isolated by calculating the mean difference in activations between "Safe" prompts and "Harmful" prompts across the residual stream. Once the high-dimensional refusal direction was found, we applied an Orthogonal Projection to the output weight matrices (o_proj and down_proj) of the transformer layers:

$W_{new} = W - \frac{v (v^T W)}{||v||^2}$

This mathematical intervention permanently erases the model's ability to express the refusal concept, resulting in a model that answers prompts without standard AI safety filter disclaimers or refusals.

🚀 How to Use

from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "TurkishCodeMan/gemma-4-e2b-it-abliterated"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

prompt = "How to make a cake?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

⚠️ Disclaimer

This model is intended for research in Mechanistic Interpretability, Alignment, and safety testing. The creators are not responsible for any outputs generated by this abliterated model. Use responsibly.

Downloads last month: -

Safetensors

Model size

3B params

Tensor type

F16

Model tree for TurkishCodeMan/gemma-4-e2b-it-abliterated

Base model

google/gemma-4-E2B

Finetuned

google/gemma-4-E2B-it

Finetuned

(231)

this model