- MLX
How to use sahilchachra/MiniCPM5-1B-Uncensored with MLX:
# Make sure mlx-lm is installed
# pip install --upgrade mlx-lm
# Generate text with mlx-lm
from mlx_lm import load, generate
model, tokenizer = load("sahilchachra/MiniCPM5-1B-Uncensored")
prompt = "Write a story about Einstein"
messages = [{"role": "user", "content": prompt}]
prompt = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True
)
text = generate(model, tokenizer, prompt=prompt, verbose=True)
- Pi
How to use sahilchachra/MiniCPM5-1B-Uncensored with Pi:
Start the MLX server
# Install MLX LM:
uv tool install mlx-lm
# Start a local OpenAI-compatible server:
mlx_lm.server --model "sahilchachra/MiniCPM5-1B-Uncensored"
Configure the model in Pi
# Install Pi:
npm install -g @mariozechner/pi-coding-agent
# Add to ~/.pi/agent/models.json:
{
  "providers": {
    "mlx-lm": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        { "id": "sahilchachra/MiniCPM5-1B-Uncensored" }
      ]
    }
  }
}
Run Pi
# Start Pi in your project directory:
pi
- Hermes Agent
How to use sahilchachra/MiniCPM5-1B-Uncensored with Hermes Agent:
Start the MLX server
# Install MLX LM:
uv tool install mlx-lm
# Start a local OpenAI-compatible server:
mlx_lm.server --model "sahilchachra/MiniCPM5-1B-Uncensored"
Configure Hermes
# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup
# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default sahilchachra/MiniCPM5-1B-Uncensored
Run Hermes
hermes
- MLX LM
How to use sahilchachra/MiniCPM5-1B-Uncensored with MLX LM:
Generate or start a chat session
# Install MLX LM
uv tool install mlx-lm
# Interactive chat REPL
mlx_lm.chat --model "sahilchachra/MiniCPM5-1B-Uncensored"
Run an OpenAI-compatible server
# Install MLX LM
uv tool install mlx-lm
# Start the server (listens on port 8080 by default)
mlx_lm.server --model "sahilchachra/MiniCPM5-1B-Uncensored"
# Calling the OpenAI-compatible server with curl
curl -X POST "http://localhost:8080/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "sahilchachra/MiniCPM5-1B-Uncensored",
    "messages": [
      {"role": "user", "content": "Hello"}
    ]
  }'
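To call the server from Python without extra dependencies, the sketch below sends the same chat-completions request as the curl example using only the standard library. It assumes the server is running locally on mlx_lm.server's default port 8080 and returns an OpenAI-style response body (`choices[0].message.content`); the helper names `build_chat_request` and `chat` are illustrative and not part of mlx-lm.

```python
import json
import urllib.request

# Assumption: mlx_lm.server on its default port; change if you pass --port.
BASE_URL = "http://localhost:8080/v1"
MODEL_ID = "sahilchachra/MiniCPM5-1B-Uncensored"


def build_chat_request(prompt: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) an OpenAI-style chat-completions POST request."""
    body = {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 1024,  # room for the <think> block plus the answer
    }
    return urllib.request.Request(
        url=f"{base_url}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(prompt: str) -> str:
    """Send the request to a running server and return the assistant's text."""
    with urllib.request.urlopen(build_chat_request(prompt)) as resp:
        payload = json.load(resp)
    # OpenAI-compatible responses put the reply at choices[0].message.content.
    return payload["choices"][0]["message"]["content"]
```

With the server running, `chat("Hello")` returns the model's reply; the text may include the `<think>…</think>` reasoning block.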
MiniCPM5-1B — Uncensored
An uncensored version of openbmb/MiniCPM5-1B, produced by a single training-free step: single-direction abliteration (Arditi et al., 2024). The refusal rate on AdvBench drops from 85% to 2%, with no increase in over-refusal on benign prompts (0% before and after). There is no fine-tuning and no new training data; the weights are edited directly.
Intended for: security research, red-teaming, jailbreak benchmarking, and AI-safety study. Not intended for production deployment or harmful use.
Benchmark Results
Evaluated on AdvBench (100 harmful behaviors) and an over-refusal set (40 benign prompts). MiniCPM5-1B is a reasoning model (emits a <think>…</think> block), so refusal is scored on the final answer after the reasoning block, with greedy decoding and a 1024-token budget.
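The card does not say how a final answer is classified as a refusal. As an illustration only, the sketch below follows the described setup: it discards everything up to the last `</think>` and applies a simple substring check to what remains. `REFUSAL_MARKERS` is a hypothetical stand-in, not the scorer behind the numbers below; keyword matching is a common but crude heuristic that can miss paraphrased refusals.

```python
# Hypothetical refusal markers: an illustrative assumption, not the authors' scorer.
REFUSAL_MARKERS = (
    "i can't", "i cannot", "i won't", "i'm sorry", "i am sorry",
    "as an ai", "i'm not able to", "i am not able to",
)


def final_answer(completion: str) -> str:
    """Return the text after the last </think>; fall back to the whole output."""
    if "</think>" in completion:
        return completion.rpartition("</think>")[2].strip()
    return completion.strip()


def is_refusal(completion: str) -> bool:
    """Check only the final answer, so refusal-like wording in the reasoning is ignored."""
    answer = final_answer(completion).lower()
    return any(marker in answer for marker in REFUSAL_MARKERS)


def refusal_rate(completions: list[str]) -> float:
    """Fraction of completions whose final answer is classified as a refusal."""
    if not completions:
        return 0.0
    return sum(is_refusal(c) for c in completions) / len(completions)
```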
Harmful prompt refusal rate ↓ lower is more uncensored
| Model | Refused / 100 | Refusal Rate |
|---|---|---|
| MiniCPM5-1B (original) | 85 / 100 | 85.0% |
| MiniCPM5-1B-Uncensored (this model) | 2 / 100 | 2.0% |
Over-refusal rate on benign prompts ↓ lower is better
| Model | Refused / 40 | Refusal Rate |
|---|---|---|
| MiniCPM5-1B (original) | 0 / 40 | 0.0% |
| MiniCPM5-1B-Uncensored (this model) | 0 / 40 | 0.0% |
An 83-percentage-point drop in harmful refusals (85% to 2%), with over-refusal on benign prompts unchanged at 0%.
Pipeline — Single-Direction Abliteration (training-free)
Based on Arditi et al., "Refusal in Language Models Is Mediated by a Single Direction" (2024). The paper finds that refusal behavior in aligned chat models is mediated by a single direction in the residual stream: removing the model's ability to write to that direction largely eliminates refusals while leaving other capabilities mostly intact.
- Collect activations. Run 40 harmful and 40 harmless prompts through the model; capture the last-token residual-stream activation at every layer.
- Compute candidate directions. For each layer, take the normalized difference of the mean activations:
  r = normalize(mean_harmful − mean_harmless)
- Select the single best direction. Sweep all candidate layers; for each, ablate that layer's direction model-wide and measure harmful refusal and over-refusal on a held-out subset. The layer-12 direction scored best (0% harmful refusal and 0% over-refusal on the eval subset).
- Orthogonalize that one direction out of every residual-stream write: the token embeddings, every attention output projection (self_attn.o_proj), and every MLP down-projection (mlp.down_proj):
  W_new = W − r · (rᵀ W)    # residual-stream writers (o_proj, down_proj)
  E_new = E − (E r) · rᵀ    # token embeddings (one row per token)
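Step 2 is a difference of means. Below is a dependency-free sketch of it for a single layer, using plain Python lists as toy stand-ins for the 1536-dimensional last-token activations; a real pipeline would use MLX or NumPy arrays and repeat this for every layer to get one candidate direction per layer.

```python
import math

Vector = list[float]


def mean(vectors: list[Vector]) -> Vector:
    """Element-wise mean over a list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def normalize(v: Vector) -> Vector:
    """Scale v to unit L2 norm."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("zero vector has no direction")
    return [x / norm for x in v]


def refusal_direction(harmful: list[Vector], harmless: list[Vector]) -> Vector:
    """r = normalize(mean_harmful - mean_harmless) for one layer's activations."""
    mu_harmful, mu_harmless = mean(harmful), mean(harmless)
    return normalize([a - b for a, b in zip(mu_harmful, mu_harmless)])
```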
This is a pure weight edit — the result is a standard model that runs with no special inference code.
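Why a weight edit suffices: for any matrix W that writes into the residual stream, (I − r rᵀ) W x equals W x with its r-component removed, so the edited model behaves as if the direction were ablated at runtime on that write. The sketch below applies both update rules with plain lists and checks that property. It assumes r is unit-norm and that writer weights use the (out_features, in_features) layout of a linear layer, so W has one row per residual-stream dimension.

```python
Matrix = list[list[float]]


def orthogonalize_writer(W: Matrix, r: list[float]) -> Matrix:
    """W_new = W - r (r^T W) for a (d_model x d_in) matrix writing to the residual stream."""
    d_model, d_in = len(W), len(W[0])
    # r^T W: one coefficient per input column.
    rT_W = [sum(r[i] * W[i][j] for i in range(d_model)) for j in range(d_in)]
    return [[W[i][j] - r[i] * rT_W[j] for j in range(d_in)] for i in range(d_model)]


def orthogonalize_embeddings(E: Matrix, r: list[float]) -> Matrix:
    """E_new = E - (E r) r^T for a (vocab x d_model) table: each row loses its r component."""
    out = []
    for row in E:
        coeff = sum(e * ri for e, ri in zip(row, r))
        out.append([e - coeff * ri for e, ri in zip(row, r)])
    return out


def matvec(W: Matrix, x: list[float]) -> list[float]:
    """Plain matrix-vector product W x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]
```

In an array library the writer update is a single outer-product subtraction, W − outer(r, rᵀW), applied once per o_proj and down_proj.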
Why a single direction? A naive variant that applied a different per-layer direction at each layer removed refusals less effectively, likely because those directions interfere with each other. Selecting one well-separated direction (from layer 12) and applying it uniformly at every layer is what made abliteration work cleanly here.
Model Details
| Property | Value |
|---|---|
| Base model | openbmb/MiniCPM5-1B |
| Architecture | Llama-style transformer (GQA) |
| Parameters | ~1.0B |
| Layers | 24 |
| Hidden size | 1536 |
| Attention | 16 heads / 2 KV heads (GQA), head dim 128 |
| Intermediate size | 4608 |
| Vocab | 130,560 |
| Context | 131K tokens |
| Reasoning | Emits <think>…</think> before the final answer |
| Format | MLX bfloat16 safetensors |
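The GQA configuration matters mostly for memory at long context. A back-of-the-envelope KV-cache estimate from the table, assuming keys and values are cached in bfloat16 (2 bytes each) and that "131K" means 131,072 tokens:

```python
# Values from the Model Details table; the bf16 cache dtype and the
# 131,072-token context length are assumptions.
NUM_LAYERS = 24
NUM_KV_HEADS = 2
HEAD_DIM = 128
BYTES_PER_VALUE = 2  # bfloat16


def kv_cache_bytes(num_tokens: int) -> int:
    """Bytes needed to cache keys and values for num_tokens tokens."""
    per_token = 2 * NUM_LAYERS * NUM_KV_HEADS * HEAD_DIM * BYTES_PER_VALUE  # 2 = K and V
    return per_token * num_tokens


print(kv_cache_bytes(1))                # 24576 bytes (24 KiB) per token
print(kv_cache_bytes(131_072) / 2**30)  # 3.0 GiB at full context
```

With only 2 KV heads the cache is 8× smaller than if all 16 attention heads kept their own keys and values (about 24 GiB at full context).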
Usage (MLX)
from mlx_lm import load, generate
from mlx_lm.sample_utils import make_sampler, make_logits_processors
model, tokenizer = load("sahilchachra/MiniCPM5-1B-Uncensored")
messages = [{"role": "user", "content": "Your prompt here"}]
prompt = tokenizer.apply_chat_template(
messages, add_generation_prompt=True, tokenize=False
)
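response = generate(
    model, tokenizer,
    prompt=prompt,
    max_tokens=1024,  # leave room for the <think> block before the final answer
    sampler=make_sampler(temp=0.0),  # temp=0.0 means greedy decoding, as in the benchmark
    logits_processors=make_logits_processors(repetition_penalty=1.05),  # mild repetition penalty
)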
print(response)
The model reasons inside a <think>…</think> block, then gives the final answer.
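To show only the final answer, the output can be split on the closing tag. A minimal sketch, assuming at most one reasoning block at the start of the response. Whether the opening `<think>` appears in the generated text depends on the chat template, so both cases are handled; a response cut off by `max_tokens` before `</think>` is treated as reasoning with an empty answer, and text with no tags at all is treated as a plain answer.

```python
def split_reasoning(response: str) -> tuple[str, str]:
    """Split generated text into (reasoning, final_answer)."""
    open_tag, close_tag = "<think>", "</think>"
    if close_tag in response:
        reasoning, _, answer = response.partition(close_tag)
        return reasoning.replace(open_tag, "", 1).strip(), answer.strip()
    if open_tag in response:
        # Generation stopped inside the reasoning block (e.g. max_tokens hit).
        return response.replace(open_tag, "", 1).strip(), ""
    # No reasoning block at all: the whole text is the answer.
    return "", response.strip()
```

For example, `reasoning, answer = split_reasoning(response)` after the `generate` call above.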
Limitations & Warnings
- Abliteration is surgical, not lossless: removing the refusal direction can occasionally change responses to benign inputs whose activations also overlap with that direction. General reasoning is expected to be preserved, but only benign behavior was measured here (0% over-refusal on the 40-prompt benign set).
- No new knowledge — abliteration only removes refusal behavior; it adds no information or capability.
- Small model — at ~1B parameters, factual accuracy and complex reasoning are limited regardless of alignment.
- Responsible use — published for safety research and red-teaming. The authors do not endorse harmful use of this model.
Citation
@article{arditi2024refusal,
title={Refusal in Language Models Is Mediated by a Single Direction},
author={Arditi, Andy and Obeso, Oscar and Syed, Aaquib and Paleka, Daniel and Panickssery, Nina and Gurnee, Wes and Nanda, Neel},
journal={arXiv preprint arXiv:2406.11717},
year={2024}
}
Created with UncensorLLMs