gemma-4-E4B-it-heretic

Abliterated (decensored) version of google/gemma-4-E4B-it, produced with Heretic v1.3.0.

This repository hosts both the merged safetensors model (compatible with transformers) and an unquantized f16 GGUF conversion for llama.cpp and Ollama.

Method

Abliteration is a weight-editing technique that identifies the "refusal direction" in the residual stream of an aligned language model and orthogonalizes the projection matrices against it, so the model can no longer write into that direction. It is not fine-tuning: there is no gradient descent and no training run, only linear algebra applied to the existing weights. Prompt sets are used solely to estimate the direction and to evaluate the result.
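
As a rough illustration of the linear algebra involved, the sketch below estimates a direction as the normalized difference of mean activations on two prompt sets, then projects it out of a matrix that writes into the residual stream. Random NumPy arrays stand in for real activations and weights. The function names and shapes are assumptions made for illustration, and Heretic's actual implementation may differ.

```python
import numpy as np

def refusal_direction(harmful_acts, harmless_acts):
    """Unit vector along the difference of mean activations (shape: d_model)."""
    diff = harmful_acts.mean(axis=0) - harmless_acts.mean(axis=0)
    return diff / np.linalg.norm(diff)

def orthogonalize(W, r):
    """Remove the component of W's output that lies along the unit vector r.

    W has shape (d_model, d_in) and writes W @ x into the residual stream,
    so the edited matrix is W - r r^T W.
    """
    return W - np.outer(r, r @ W)

rng = np.random.default_rng(0)
d_model, d_in = 8, 16

# Synthetic activations: the "harmful" set is shifted along axis 0.
harmful = rng.normal(size=(32, d_model)) + 2.0 * np.eye(d_model)[0]
harmless = rng.normal(size=(32, d_model))
r = refusal_direction(harmful, harmless)

W = rng.normal(size=(d_model, d_in))
W_edit = orthogonalize(W, r)

x = rng.normal(size=d_in)
print(abs(r @ (W @ x)))       # generally nonzero before the edit
print(abs(r @ (W_edit @ x)))  # numerically zero after the edit
```

Because r has unit length, r^T (W - r r^T W) = 0 for every input, which is what "can no longer write into that direction" means here.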

The specific edit was chosen from the Pareto frontier of 200 Optuna trials minimizing two objectives jointly:

  • Refusal rate on a harmful-prompts dataset (lower = more decensored)
  • KL divergence from the original model on benign prompts (lower = less capability damage)
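
To make the selection step concrete, here is a minimal sketch of extracting a Pareto frontier when both objectives are minimized. The trial values are made up for illustration and are not results from this model; `pareto_front` is a hypothetical helper, not part of Heretic or Optuna.

```python
def pareto_front(trials):
    """Return trials not dominated by any other trial (both objectives minimized).

    Each trial is a (refusal_rate, kl_divergence) tuple. Trial b dominates a
    if b is no worse on both objectives and strictly better on at least one.
    """
    front = []
    for i, a in enumerate(trials):
        dominated = any(
            b[0] <= a[0] and b[1] <= a[1] and (b[0] < a[0] or b[1] < a[1])
            for j, b in enumerate(trials)
            if j != i
        )
        if not dominated:
            front.append(a)
    return sorted(front)

# Made-up (refusal_rate, kl_divergence) pairs, for illustration only.
trials = [(0.90, 0.01), (0.40, 0.05), (0.10, 0.20),
          (0.45, 0.30), (0.05, 0.90), (0.10, 0.25)]
front = pareto_front(trials)
print(front)
```

Every point on the frontier is a defensible trade-off; picking one still requires deciding how much KL divergence is acceptable for a given refusal rate. In Optuna, a multi-objective study exposes the same set as `study.best_trials`.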

See Arditi et al., 2024 for the underlying theory and the Heretic README for implementation details.

Files

| Path | Format | Size | Use with |
|---|---|---|---|
| model-*.safetensors (4 shards) | HF safetensors fp16 | ~15 GB | transformers, raw PyTorch, further conversion |
| gemma-4-E4B-it-heretic-f16.gguf | GGUF fp16 | ~14 GB | llama.cpp, Ollama, LM Studio, Jan, KoboldCpp |
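
If only the GGUF is needed, it can be fetched on its own rather than cloning the whole repository. The sketch below assumes the `huggingface_hub` package is installed and that the file sits at the repository root under the name listed above; `fetch_gguf` is a hypothetical helper name.

```python
REPO_ID = "lonelynode/gemma-4-E4B-it-heretic"
GGUF_FILE = "gemma-4-E4B-it-heretic-f16.gguf"

def fetch_gguf(local_dir="."):
    """Download only the GGUF file and return its local path."""
    # Imported lazily so this module loads even without huggingface_hub installed.
    from huggingface_hub import hf_hub_download

    return hf_hub_download(repo_id=REPO_ID, filename=GGUF_FILE, local_dir=local_dir)
```

Calling `fetch_gguf()` starts a download of roughly 14 GB, so check free disk space first.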

Usage — transformers

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "lonelynode/gemma-4-E4B-it-heretic"
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, dtype=torch.float16, device_map="auto")

# Tokenize with the chat template; return_dict=True also yields the attention mask.
messages = [{"role": "user", "content": "Explain abliteration in one sentence."}]
inputs = tok.apply_chat_template(
    messages, add_generation_prompt=True, return_dict=True, return_tensors="pt"
).to(model.device)

out = model.generate(**inputs, max_new_tokens=256)

# Decode only the newly generated tokens, skipping the prompt.
prompt_len = inputs["input_ids"].shape[-1]
print(tok.decode(out[0][prompt_len:], skip_special_tokens=True))

Usage — Ollama

Create a Modelfile pointing at the GGUF:

FROM ./gemma-4-E4B-it-heretic-f16.gguf
TEMPLATE """{{- range $i, $_ := .Messages }}{{- $last := eq (len (slice $.Messages $i)) 1 -}}<start_of_turn>{{ if eq .Role "user" }}user{{- else }}model{{- end }}
{{ .Content }}<end_of_turn>
{{ if and $last (ne .Role "model") }}<start_of_turn>model
{{ end }}{{- end }}"""
PARAMETER stop "<start_of_turn>"
PARAMETER stop "<end_of_turn>"
PARAMETER num_ctx 8192

Then build and run the model:

ollama create gemma4-e4b-heretic -f Modelfile
ollama run gemma4-e4b-heretic
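
Once the model has been created, it can also be queried programmatically through Ollama's local HTTP API. The following is a minimal sketch using only the standard library. It assumes an Ollama server is listening on the default port 11434; `build_chat_request` and `chat` are hypothetical helper names.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/chat"  # assumed default local endpoint

def build_chat_request(prompt, model="gemma4-e4b-heretic"):
    """Build a non-streaming POST request for Ollama's /api/chat endpoint."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,  # ask for one JSON object instead of a stream of chunks
    }
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def chat(prompt, model="gemma4-e4b-heretic"):
    """Send the prompt to the local server and return the reply text."""
    with urllib.request.urlopen(build_chat_request(prompt, model)) as resp:
        return json.loads(resp.read())["message"]["content"]
```

With the server running, `chat("Explain abliteration in one sentence.")` returns the model's reply as a string.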

Quantization

The GGUF in this repo is fp16 (~14 GB). For a smaller file and faster inference, quantize it with llama-quantize from llama.cpp:

llama-quantize gemma-4-E4B-it-heretic-f16.gguf gemma-4-E4B-it-heretic-Q4_K_M.gguf Q4_K_M

Typical sizes after quantization:

| Quant | Size | Quality |
|---|---|---|
| Q8_0 | ~7.6 GB | nearly identical to f16 |
| Q5_K_M | ~5.3 GB | very high |
| Q4_K_M | ~4.5 GB | high, recommended balance |
| Q3_K_M | ~3.5 GB | acceptable, smallest viable |
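
As a sanity check on sizes like these, a quantized file can be estimated by scaling the f16 size by the quant's average bits per weight over 16. The bits-per-weight values below are approximate, assumed figures rather than exact numbers for this model, and `estimate_size_gb` is a hypothetical helper.

```python
# Approximate average bits per weight; assumed figures that vary by model.
BITS_PER_WEIGHT = {"Q8_0": 8.5, "Q5_K_M": 5.7, "Q4_K_M": 4.85, "Q3_K_M": 3.9}

def estimate_size_gb(f16_size_gb, quant):
    """Scale the f16 file size by the quant's bits per weight over 16."""
    return f16_size_gb * BITS_PER_WEIGHT[quant] / 16.0

for name in BITS_PER_WEIGHT:
    print(f"{name}: ~{estimate_size_gb(14.0, name):.1f} GB")
```

Real files can come out somewhat larger than this estimate, plausibly because some tensors are stored at higher precision than the nominal quant type.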

Caveats and disclaimers

Removing safety alignment changes the model's behavior in ways that may include:

  • Increased willingness to discuss harmful, illegal, or sensitive topics
  • Reduced refusal of clearly unethical requests
  • Potential sycophancy (uncritical acceptance of user premises)
  • Slightly reduced reasoning ability or factual accuracy on some tasks

You are responsible for how you use this model. Do not deploy it in user-facing applications without your own safety layer. The author of this repo provides it for research, education, and personal use under the Gemma Terms of Use.
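
One way to structure such a safety layer is to wrap generation in checks on both the prompt and the reply. The sketch below is deliberately minimal: `guarded_generate`, `toy_generate`, and `toy_is_allowed` are hypothetical names, and the substring check is only a stand-in for a real moderation classifier, which this model card does not specify.

```python
def guarded_generate(generate, is_allowed, prompt,
                     refusal="Sorry, I can't help with that."):
    """Call generate(prompt) only if the prompt and the reply both pass is_allowed."""
    if not is_allowed(prompt):
        return refusal
    reply = generate(prompt)
    return reply if is_allowed(reply) else refusal

# Toy stand-ins so the sketch runs without a model or a moderation service.
def toy_generate(prompt):
    return f"echo: {prompt}"

def toy_is_allowed(text):
    return "forbidden" not in text.lower()

print(guarded_generate(toy_generate, toy_is_allowed, "hello"))
print(guarded_generate(toy_generate, toy_is_allowed, "something FORBIDDEN"))
```

Checking the output as well as the input matters here, since an abliterated model will not refuse on its own.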

License

This model is a derivative of google/gemma-4-E4B-it and is released under the Gemma Terms of Use. By downloading or using this model, you agree to those terms.
