Instructions to use mirxa2/Gemma-4-31B-X with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- llama-cpp-python
How to use mirxa2/Gemma-4-31B-X with llama-cpp-python:
# !pip install llama-cpp-python from llama_cpp import Llama llm = Llama.from_pretrained( repo_id="mirxa2/Gemma-4-31B-X", filename="gemma-4-31b-abliterated-Q4_K_M.gguf", )
llm.create_chat_completion( messages = "No input example has been defined for this model task." )
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- llama.cpp
How to use mirxa2/Gemma-4-31B-X with llama.cpp:
Install from brew
brew install llama.cpp # Start a local OpenAI-compatible server with a web UI: llama-server -hf mirxa2/Gemma-4-31B-X:Q4_K_M # Run inference directly in the terminal: llama-cli -hf mirxa2/Gemma-4-31B-X:Q4_K_M
Install from WinGet (Windows)
winget install llama.cpp # Start a local OpenAI-compatible server with a web UI: llama-server -hf mirxa2/Gemma-4-31B-X:Q4_K_M # Run inference directly in the terminal: llama-cli -hf mirxa2/Gemma-4-31B-X:Q4_K_M
Use pre-built binary
# Download pre-built binary from: # https://github.com/ggerganov/llama.cpp/releases # Start a local OpenAI-compatible server with a web UI: ./llama-server -hf mirxa2/Gemma-4-31B-X:Q4_K_M # Run inference directly in the terminal: ./llama-cli -hf mirxa2/Gemma-4-31B-X:Q4_K_M
Build from source code
git clone https://github.com/ggerganov/llama.cpp.git cd llama.cpp cmake -B build cmake --build build -j --target llama-server llama-cli # Start a local OpenAI-compatible server with a web UI: ./build/bin/llama-server -hf mirxa2/Gemma-4-31B-X:Q4_K_M # Run inference directly in the terminal: ./build/bin/llama-cli -hf mirxa2/Gemma-4-31B-X:Q4_K_M
Use Docker
docker model run hf.co/mirxa2/Gemma-4-31B-X:Q4_K_M
- LM Studio
- Jan
- Ollama
How to use mirxa2/Gemma-4-31B-X with Ollama:
ollama run hf.co/mirxa2/Gemma-4-31B-X:Q4_K_M
- Unsloth Studio new
How to use mirxa2/Gemma-4-31B-X with Unsloth Studio:
Install Unsloth Studio (macOS, Linux, WSL)
curl -fsSL https://unsloth.ai/install.sh | sh # Run unsloth studio unsloth studio -H 0.0.0.0 -p 8888 # Then open http://localhost:8888 in your browser # Search for mirxa2/Gemma-4-31B-X to start chatting
Install Unsloth Studio (Windows)
irm https://unsloth.ai/install.ps1 | iex # Run unsloth studio unsloth studio -H 0.0.0.0 -p 8888 # Then open http://localhost:8888 in your browser # Search for mirxa2/Gemma-4-31B-X to start chatting
Using HuggingFace Spaces for Unsloth
# No setup required # Open https://huggingface.co/spaces/unsloth/studio in your browser # Search for mirxa2/Gemma-4-31B-X to start chatting
- Pi new
How to use mirxa2/Gemma-4-31B-X with Pi:
Start the llama.cpp server
# Install llama.cpp: brew install llama.cpp # Start a local OpenAI-compatible server: llama-server -hf mirxa2/Gemma-4-31B-X:Q4_K_M
Configure the model in Pi
# Install Pi: npm install -g @mariozechner/pi-coding-agent # Add to ~/.pi/agent/models.json: { "providers": { "llama-cpp": { "baseUrl": "http://localhost:8080/v1", "api": "openai-completions", "apiKey": "none", "models": [ { "id": "mirxa2/Gemma-4-31B-X:Q4_K_M" } ] } } }Run Pi
# Start Pi in your project directory: pi
- Hermes Agent new
How to use mirxa2/Gemma-4-31B-X with Hermes Agent:
Start the llama.cpp server
# Install llama.cpp: brew install llama.cpp # Start a local OpenAI-compatible server: llama-server -hf mirxa2/Gemma-4-31B-X:Q4_K_M
Configure Hermes
# Install Hermes: curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash hermes setup # Point Hermes at the local server: hermes config set model.provider custom hermes config set model.base_url http://127.0.0.1:8080/v1 hermes config set model.default mirxa2/Gemma-4-31B-X:Q4_K_M
Run Hermes
hermes
- Docker Model Runner
How to use mirxa2/Gemma-4-31B-X with Docker Model Runner:
docker model run hf.co/mirxa2/Gemma-4-31B-X:Q4_K_M
- Lemonade
How to use mirxa2/Gemma-4-31B-X with Lemonade:
Pull the model
# Download Lemonade from https://lemonade-server.ai/ lemonade pull mirxa2/Gemma-4-31B-X:Q4_K_M
Run and chat with the model
lemonade run user.Gemma-4-31B-X-Q4_K_M
List all available models
lemonade list
base_model: google/gemma-4-31b-it library_name: transformers tags: - gemma-4 - abliterated - uncensored - orthogonal-projection - 31b license: apache-2.0
Gemma-4-31B-it-Abliterated
This is a fully uncensored, abliterated version of Google's Gemma-4-31B-it.
By applying Orthogonalized Representation Intervention to the model's residual stream, the built-in refusal and safety alignment vectors have been mathematically erased. This model retains the state-of-the-art dense reasoning and context-following capabilities of the native Gemma 4 31B architecture, but will not refuse instructions or break character to deliver safety lectures.
🛠️ Methodology & Architectural Discoveries
Gemma 4 introduces a new multimodal architecture (Text, Vision, Audio) that changes how the transformers library handles layer mapping. Standard abliteration scripts built for Gemma 2/3 will crash due to nested text_config attributes and mismatched sequence lengths.
During the extraction of the hidden states (using mlabonne/harmful_behaviors vs mlabonne/harmless_alpaca), we mapped the refusal direction across the entire 31B layer stack.
Key Discovery: The Gemma 4 31B architecture pushes its safety alignment to the absolute very end of the network. The Peak Refusal Mass was detected at Layer 59 (the final transformer layer before the output projection).
The orthogonal projection was applied to the o_proj and down_proj matrices of this terminal layer, effectively severing the refusal mechanism without degrading the model's foundational logic, grammar, or world-modeling layers.
💻 Usage
This repository contains the full uncompressed .safetensors weights, as well as GGUF quantized versions for local deployment via llama.cpp, LM Studio, or Ollama.
Recommended Quants:
- Q8_0: Best balance of absolute zero reasoning loss and VRAM efficiency (~32.6GB).
- Q4_K_M: Highly efficient for consumer hardware; easily fits on a single 24GB GPU (~18.7GB).
The Bespoke Abliteration Script
Because standard scripts fail on Gemma 4, the custom Python script used to perform this exact abliteration (gemma4_31b_abliterator.py) is included in the files of this repository. It features:
- VRAM-safe batched hidden state extraction (survives 96GB consumer GPUs).
- Native Gemma 4 Chat Template integration (crucial for activating the instruction circuits properly).
- Dynamic multimodal layer hunting.
- Corrected linear algebra for
16384 -> 5376multi-query attention projections.
⚠️ Disclaimer
This model has had its safety guardrails mathematically removed. It is highly compliant and will generate whatever it is instructed to generate, including potentially harmful, sensitive, or explicit content. Users are solely responsible for how they deploy and interact with this model. Ensure your use cases align with local laws and ethical guidelines.
Abliteration script based on mlabonne's tutorial: https://huggingface.co/blog/mlabonne/abliteration Helpful/harmful behaviors are from mlabonne's datasets (harmless_alpaca, harmful_behaviors). Tested and working with the few harsh prompts I had laying around (that are typically 100% refused on other models).
Have fun, be safe.
- Downloads last month
- 98
4-bit
8-bit
16-bit