ultragemma4-e4b-heretic-uncensored

Reasoning-capable language model modified using the Heretic abliteration toolkit

Abliteration E4B Parameters Reasoning Uncensored

ultragemma4-e4b-heretic-uncensored is a reasoning-capable language model built on top of google/gemma-4-E4B-it and modified using the heretic abliteration toolkit. The model applies refusal-direction analysis and targeted weight-space interventions to reduce internal refusal behaviors while preserving instruction-following, reasoning capabilities, and general conversational performance.

Important

This model is intended strictly for research and learning purposes. Due to reduced internal refusal mechanisms, it may generate sensitive or unrestricted content. Users assume full responsibility for how the model is used. The authors and hosting platform disclaim any liability for generated outputs.

Note

This model is experimental and may generate unexpected behaviors or artifacts in certain scenarios.

Use Q4_K_S or higher for standard performance. Q4_K_M is recommended.

Key Highlights

  • Heretic-Based Abliteration: Modified using the Heretic toolkit to identify and alter refusal-related representations within the model.
  • Reduced Refusal Behavior: Optimized to minimize internal refusal tendencies while maintaining instruction-following capabilities.
  • Gemma 4 Backbone: Built directly on top of google/gemma-4-E4B-it.
  • Reasoning-Oriented Performance: Preserves multi-step reasoning and analytical capabilities after abliteration.
  • Research-Focused Release: Designed for alignment research, model behavior analysis, and evaluation of refusal-direction modifications.
  • Efficient E4B Deployment: Suitable for local inference, research environments, and optimized deployment setups.

Model Files

File Name Quant Type File Size File Link
ultragemma4-e4b-heretic-uncensored.BF16.gguf BF16 14.9 GB Download
ultragemma4-e4b-heretic-uncensored.F16.gguf F16 14.9 GB Download
ultragemma4-e4b-heretic-uncensored.Q2_K.gguf Q2_K 4.38 GB Download
ultragemma4-e4b-heretic-uncensored.Q3_K_L.gguf Q3_K_L 4.99 GB Download
ultragemma4-e4b-heretic-uncensored.Q3_K_M.gguf Q3_K_M 4.82 GB Download
ultragemma4-e4b-heretic-uncensored.Q3_K_S.gguf Q3_K_S 4.63 GB Download
ultragemma4-e4b-heretic-uncensored.Q4_0.gguf Q4_0 5.15 GB Download
ultragemma4-e4b-heretic-uncensored.Q4_K_M.gguf Q4_K_M 5.3 GB Download
ultragemma4-e4b-heretic-uncensored.Q4_K_S.gguf Q4_K_S 5.17 GB Download
ultragemma4-e4b-heretic-uncensored.Q5_0.gguf Q5_0 5.65 GB Download
ultragemma4-e4b-heretic-uncensored.Q5_K_M.gguf Q5_K_M 5.72 GB Download
ultragemma4-e4b-heretic-uncensored.Q5_K_S.gguf Q5_K_S 5.65 GB Download
ultragemma4-e4b-heretic-uncensored.Q6_K.gguf Q6_K 6.17 GB Download
ultragemma4-e4b-heretic-uncensored.Q8_0.gguf Q8_0 7.95 GB Download
ultragemma4-e4b-heretic-uncensored.mmproj-bf16.gguf mmproj-bf16 992 MB Download
ultragemma4-e4b-heretic-uncensored.mmproj-f16.gguf mmproj-f16 992 MB Download
ultragemma4-e4b-heretic-uncensored.mmproj-q8_0.gguf mmproj-q8_0 560 MB Download

Quick Start with llama.cpp (Docker)

FROM ghcr.io/ggml-org/llama.cpp:full

WORKDIR /app

RUN apt update && apt install -y python3-pip
RUN pip install -U huggingface_hub --break-system-packages

RUN python3 -c 'from huggingface_hub import hf_hub_download; \
    repo="prithivMLmods/ultragemma4-e4b-heretic-uncensored"; \
    hf_hub_download(repo_id=repo, filename="ultragemma4-e4b-heretic-uncensored.Q4_K_M.gguf", local_dir="/app"); \
    hf_hub_download(repo_id=repo, filename="ultragemma4-e4b-heretic-uncensored.mmproj-bf16.gguf", local_dir="/app")'

CMD ["--server", \
     "-m", "/app/ultragemma4-e4b-heretic-uncensored.Q4_K_M.gguf", \
     "--mmproj", "/app/ultragemma4-e4b-heretic-uncensored.mmproj-bf16.gguf", \
     "--host", "0.0.0.0", \
     "--port", "7860", \
     "-t", "2", \
     "--cache-type-k", "q8_0", \
     "--cache-type-v", "iq4_nl", \
     "-c", "128000", \
     "-n", "38912"]

e.g. Screenshots

Screenshot 2026-06-26 115358 Screenshot 2026-06-26 115413


Intended Use

  • Alignment Research: Studying refusal-direction analysis and behavior modification techniques.
  • Model Evaluation: Benchmarking reasoning, instruction-following, and safety-related behaviors.
  • Red Teaming: Analyzing model responses under reduced-refusal conditions.
  • Local Deployment: Running compact Gemma 4 models in research and experimentation environments.
  • Abliteration Studies: Exploring the effects of targeted weight-space modifications on model behavior.

Limitations & Risks

Important Note: This model intentionally reduces built-in refusal mechanisms.

  • Sensitive Content Risk: May generate unrestricted, controversial, or unsafe outputs.
  • User Responsibility: Requires careful and ethical use.
  • Experimental Modifications: Behavior may differ significantly from the original model.
  • Alignment Trade-offs: Reduced refusal behavior may impact safety filtering and response constraints.
  • Potential Artifacts: Certain prompts may expose unexpected outputs resulting from the abliteration process.

Acknowledgements

  • google/gemma-4-E4B-it: Gemma is a family of open models built by Google DeepMind. Gemma 4 models are multimodal, handling text and image input (with audio supported on small models) and generating text output. This release includes open-weights models in both pre-trained and instruction-tuned variants. Gemma 4 features a context window of up to 256K tokens and maintains multilingual support in over 140 languages.

    Featuring both Dense and Mixture-of-Experts (MoE) architectures, Gemma 4 is well-suited for tasks like text generation, coding, and reasoning. The models are available in four distinct sizes: E2B, E4B, 26B A4B, and 31B. Their diverse sizes make them deployable in environments ranging from high-end phones to laptops and servers, democratizing access to state-of-the-art AI.

  • Heretic: Fully automatic censorship removal framework for language models. This project was used to perform the refusal-direction analysis and ablation procedures that form the foundation of this model.

Abliteration parameters

Parameter Value
direction_index 25.93
attn.o_proj.max_weight 1.27
attn.o_proj.max_weight_position 25.33
attn.o_proj.min_weight 0.31
attn.o_proj.min_weight_distance 13.66
mlp.down_proj.max_weight 1.29
mlp.down_proj.max_weight_position 40.95
mlp.down_proj.min_weight 1.10
mlp.down_proj.min_weight_distance 23.45

Refusal Evaluation

Metric This model Original model (google/gemma-4-E4B-it)
Refusals 8/100 98/100

llama.cpp

LLM inference in C/C++ — https://github.com/ggml-org/llama.cpp

license

Gemma 4 [Apache License 2.0] — https://ai.google.dev/gemma/apache_2

Downloads last month
1,756
GGUF
Model size
7B params
Architecture
gemma4
Hardware compatibility
Log In to add your hardware

2-bit

3-bit

4-bit

5-bit

6-bit

8-bit

16-bit

Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for prithivMLmods/ultragemma4-e4b-heretic-uncensored

Quantized
(251)
this model

Collection including prithivMLmods/ultragemma4-e4b-heretic-uncensored