Instructions to use sahilchachra/MiniCPM5-1B-Uncensored with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use sahilchachra/MiniCPM5-1B-Uncensored with MLX:

# Make sure mlx-lm is installed
# pip install --upgrade mlx-lm

# Generate text with mlx-lm
from mlx_lm import load, generate

model, tokenizer = load("sahilchachra/MiniCPM5-1B-Uncensored")

prompt = "Write a story about Einstein"
messages = [{"role": "user", "content": prompt}]
prompt = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True
)

text = generate(model, tokenizer, prompt=prompt, verbose=True)

Notebooks
Google Colab
Kaggle
Local Apps Settings
LM Studio

How to use sahilchachra/MiniCPM5-1B-Uncensored with Pi:

Start the MLX server

# Install MLX LM:
uv tool install mlx-lm
# Start a local OpenAI-compatible server:
mlx_lm.server --model "sahilchachra/MiniCPM5-1B-Uncensored"

Configure the model in Pi

# Install Pi:
npm install -g @mariozechner/pi-coding-agent
# Add to ~/.pi/agent/models.json:
{
  "providers": {
    "mlx-lm": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        {
          "id": "sahilchachra/MiniCPM5-1B-Uncensored"
        }
      ]
    }
  }
}

Run Pi

# Start Pi in your project directory:
pi

Hermes Agent new

How to use sahilchachra/MiniCPM5-1B-Uncensored with Hermes Agent:

Start the MLX server

# Install MLX LM:
uv tool install mlx-lm
# Start a local OpenAI-compatible server:
mlx_lm.server --model "sahilchachra/MiniCPM5-1B-Uncensored"

Configure Hermes

# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup
# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default sahilchachra/MiniCPM5-1B-Uncensored

Run Hermes

hermes

MLX LM

How to use sahilchachra/MiniCPM5-1B-Uncensored with MLX LM:

Generate or start a chat session

# Install MLX LM
uv tool install mlx-lm
# Interactive chat REPL
mlx_lm.chat --model "sahilchachra/MiniCPM5-1B-Uncensored"

Run an OpenAI-compatible server

# Install MLX LM
uv tool install mlx-lm
# Start the server
mlx_lm.server --model "sahilchachra/MiniCPM5-1B-Uncensored"
# Calling the OpenAI-compatible server with curl
curl -X POST "http://localhost:8000/v1/chat/completions" \
   -H "Content-Type: application/json" \
   --data '{
     "model": "sahilchachra/MiniCPM5-1B-Uncensored",
     "messages": [
       {"role": "user", "content": "Hello"}
     ]
   }'

Browse Quantizations to use this model in llama.cpp, Ollama, LM Studio, or any compatible app.

MiniCPM5-1B — Uncensored

A fully uncensored version of openbmb/MiniCPM5-1B, produced with a single training-free stage: single-direction abliteration (Arditi et al., 2024). Refusals on AdvBench drop from 85% → 2% with zero over-refusal regression on benign prompts — no fine-tuning, no new data, weights edited directly.

Intended for: security research, red-teaming, jailbreak benchmarking, and AI-safety study. Not intended for production deployment or harmful use.

Benchmark Results

Evaluated on AdvBench (100 harmful behaviors) and an over-refusal set (40 benign prompts). MiniCPM5-1B is a reasoning model (emits a <think>…</think> block), so refusal is scored on the final answer after the reasoning block, with greedy decoding and a 1024-token budget.

Harmful prompt refusal rate ↓ lower is more uncensored

Model	Refused / 100	Refusal Rate
MiniCPM5-1B (original)	85 / 100	85.0%
MiniCPM5-1B-Uncensored (this model)	2 / 100	2.0%

Over-refusal rate on benign prompts ↓ lower is better

Model	Refused / 40	Refusal Rate
MiniCPM5-1B (original)	0 / 40	0.0%
MiniCPM5-1B-Uncensored (this model)	0 / 40	0.0%

A 83-point drop in harmful refusals while preserving benign behavior.

Pipeline — Single-Direction Abliteration (training-free)

Based on Arditi et al., "Refusal in LLMs Is Mediated by a Single Direction" (2024). Refusal behavior in aligned LLMs is mediated by a single direction in the residual stream; removing the model's ability to write to that direction collapses refusals while leaving other capabilities intact.

Collect activations. Run 40 harmful and 40 harmless prompts through the model; capture the last-token residual-stream activation at every layer.
Compute candidate directions. Per layer: r = normalize(mean_harmful − mean_harmless).
Select the single best direction. Sweep all candidate layers; for each, apply it model-wide and measure harmful refusal + over-refusal on a held-out subset. Layer 12 scored best (0% harmful / 0% over-refusal on the eval subset).
Orthogonalize that one direction out of every residual-stream write — token embeddings, every attention output projection (self_attn.o_proj), and every MLP down-projection (mlp.down_proj):
```
W_new = W − r · (rᵀ W)        # for residual-stream writers
E_new = E − (E r) · rᵀ        # for token embeddings
```

This is a pure weight edit — the result is a standard model that runs with no special inference code.

Why a single direction? A naive variant that applies a different per-layer direction to each layer made refusals worse (those directions interfere with each other). Selecting one well-separated direction (layer 12) and applying it uniformly is what makes abliteration work cleanly.

Model Details

Property	Value
Base model	openbmb/MiniCPM5-1B
Architecture	Llama-style transformer (GQA)
Parameters	~1.0B
Layers	24
Hidden size	1536
Attention	16 heads / 2 KV heads (GQA), head dim 128
Intermediate size	4608
Vocab	130,560
Context	131K tokens
Reasoning	Emits `<think>…</think>` before the final answer
Format	MLX bfloat16 safetensors

Usage (MLX)

from mlx_lm import load, generate
from mlx_lm.sample_utils import make_sampler, make_logits_processors

model, tokenizer = load("sahilchachra/MiniCPM5-1B-Uncensored")

messages = [{"role": "user", "content": "Your prompt here"}]
prompt = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, tokenize=False
)

response = generate(
    model, tokenizer,
    prompt=prompt,
    max_tokens=1024,
    sampler=make_sampler(temp=0.0),
    logits_processors=make_logits_processors(repetition_penalty=1.05),
)
print(response)

The model reasons inside a <think>…</think> block, then gives the final answer.

Limitations & Warnings

Abliteration is surgical, not lossless — removing the refusal direction can occasionally affect responses that legitimately overlap with it. General reasoning and benign behavior are preserved (0% over-refusal on the benign set).
No new knowledge — abliteration only removes refusal behavior; it adds no information or capability.
Small model — at ~1B parameters, factual accuracy and complex reasoning are limited regardless of alignment.
Responsible use — published for safety research and red-teaming. The authors do not endorse harmful use of this model.

Citation

@article{arditi2024refusal,
  title={Refusal in Language Models Is Mediated by a Single Direction},
  author={Arditi, Andy and Obeso, Oscar and Syed, Aaquib and Steinhardt, Jacob and Nanda, Neel and Heimersheim, Stefan},
  journal={arXiv preprint arXiv:2406.11717},
  year={2024}
}

Created with UncensorLLMs

Downloads last month: 66

Safetensors

Model size

1B params

Tensor type

BF16

MLX

Hardware compatibility

Quantized

Model tree for sahilchachra/MiniCPM5-1B-Uncensored

Base model

openbmb/MiniCPM5-1B

Finetuned

(18)

this model

Quantizations

2 models

Paper for sahilchachra/MiniCPM5-1B-Uncensored

Refusal in Language Models Is Mediated by a Single Direction

Paper • 2406.11717 • Published Jun 17, 2024 • 13