How to use llmfan46/MiniMax-M3-uncensored-heretic-balanced with libraries and local apps.

Libraries

How to use llmfan46/MiniMax-M3-uncensored-heretic-balanced with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="llmfan46/MiniMax-M3-uncensored-heretic-balanced", trust_remote_code=True)
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
# Print the result so it is visible when run as a script
print(pipe(text=messages))

# Load model directly
from transformers import AutoProcessor, AutoModelForMultimodalLM

processor = AutoProcessor.from_pretrained("llmfan46/MiniMax-M3-uncensored-heretic-balanced", trust_remote_code=True)
model = AutoModelForMultimodalLM.from_pretrained("llmfan46/MiniMax-M3-uncensored-heretic-balanced", trust_remote_code=True)
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(processor.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Local Apps

vLLM

How to use llmfan46/MiniMax-M3-uncensored-heretic-balanced with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "llmfan46/MiniMax-M3-uncensored-heretic-balanced"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "llmfan46/MiniMax-M3-uncensored-heretic-balanced",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'
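
The same request can be sent from Python. The sketch below uses only the standard library and mirrors the curl call above; the helper names `build_chat_payload` and `post_chat` are hypothetical and not part of vLLM or any client library. It assumes the server started by `vllm serve` is listening on `http://localhost:8000`.

```python
import json
import urllib.request


def build_chat_payload(model, prompt, image_url):
    """Build an OpenAI-style chat request with one text part and one image part."""
    return {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
    }


def post_chat(base_url, payload, timeout=120):
    """POST the payload to <base_url>/v1/chat/completions and return the parsed JSON."""
    request = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


payload = build_chat_payload(
    "llmfan46/MiniMax-M3-uncensored-heretic-balanced",
    "Describe this image in one sentence.",
    "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg",
)

# With the server running (assumed address), uncomment to send the request:
# reply = post_chat("http://localhost:8000", payload)
# print(reply["choices"][0]["message"]["content"])
```

Keeping payload construction separate from the network call makes it easy to inspect or log the request before sending it, and the same payload works against any server that exposes the `/v1/chat/completions` route on a different port.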

Use Docker

docker model run hf.co/llmfan46/MiniMax-M3-uncensored-heretic-balanced

SGLang

How to use llmfan46/MiniMax-M3-uncensored-heretic-balanced with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "llmfan46/MiniMax-M3-uncensored-heretic-balanced" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "llmfan46/MiniMax-M3-uncensored-heretic-balanced",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "llmfan46/MiniMax-M3-uncensored-heretic-balanced" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "llmfan46/MiniMax-M3-uncensored-heretic-balanced",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'

Docker Model Runner
How to use llmfan46/MiniMax-M3-uncensored-heretic-balanced with Docker Model Runner:
```
docker model run hf.co/llmfan46/MiniMax-M3-uncensored-heretic-balanced
```

You need to agree to share your contact information to access this model

This repository is publicly accessible, but you have to accept the conditions to access its files and content.

🔒 This is a premium gated paid-access model

Access is granted manually after purchase through Ko-fi.

➡️ Purchase access on Ko-fi

After purchasing, include your Hugging Face username in the Ko-fi purchase message, then click “Agree and send request to access repo” on this Hugging Face page. I will verify the username and manually approve access.

Please allow up to 24 hours for manual approval.

90% fewer refusals (10/100 Uncensored vs 98/100 Original) while preserving model quality (0.0178 KL divergence).

❤️ Support My Work

Creating these models takes significant time, work, and compute. If you find them useful, consider supporting me:

Platform	Link	What you get
🎉 Patreon	Monthly support	Priority model requests
☕ Ko-fi	One-time tip	My eternal gratitude

Your support motivates me and goes toward improving my workflow and covering fees for storage and compute. It may even help with uncensoring bigger models on rented cloud GPUs.

This is a decensored version of MiniMaxAI/MiniMax-M3, made using Heretic v1.2.0 with the Arbitrary-Rank Ablation (ARA) method.

Abliteration parameters

Parameter	Value
start_layer_index	20
end_layer_index	32
preserve_good_behavior_weight	0.6111
steer_bad_behavior_weight	0.0012
overcorrect_relative_weight	1.1028
neighbor_count	11

Targeted components

attn.o_proj
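
For intuition about what editing a projection matrix such as `attn.o_proj` can look like, here is a minimal sketch of generic rank-1 directional ablation: it removes the component of a weight matrix's output that lies along a single direction. This is not Heretic's ARA implementation, and it does not use the parameters listed above. The function name `ablate_direction`, the `strength` argument, and the toy matrix are all assumptions made for illustration.

```python
import math


def ablate_direction(weight, direction, strength=1.0):
    """Return W - strength * r r^T W, where r is `direction` scaled to unit length.

    weight:    d_out x d_in matrix as a list of rows
    direction: vector of length d_out (the output-space direction to suppress)
    """
    norm = math.sqrt(sum(x * x for x in direction))
    r = [x / norm for x in direction]
    d_out, d_in = len(weight), len(weight[0])
    # r^T W: how strongly each input column writes along r
    proj = [sum(r[i] * weight[i][j] for i in range(d_out)) for j in range(d_in)]
    # Subtract the rank-1 component r (r^T W), scaled by strength
    return [
        [weight[i][j] - strength * r[i] * proj[j] for j in range(d_in)]
        for i in range(d_out)
    ]


# Toy example (hypothetical numbers): suppress the first output coordinate
W = [[1.0, 2.0], [3.0, 4.0]]
ablated = ablate_direction(W, [2.0, 0.0])
print(ablated)  # [[0.0, 0.0], [3.0, 4.0]]
```

With `strength=1.0` the edited matrix can no longer write anything along the chosen direction. A smaller value only dampens that component.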

Performance

Metric	This model	Original model (MiniMaxAI/MiniMax-M3)
KL divergence	0.0178	0 (by definition)
Refusals	✅ 10/100	❌ 98/100

Fewer refusals indicate fewer content restrictions, while a lower KL divergence indicates behavior closer to the original model's baseline. A higher refusal count means more rejections, objections, pushback, lecturing, censorship, softening, and deflection.
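
As a worked illustration of the two metrics, the sketch below computes the relative drop in refusals from the counts in the table, and defines KL divergence for two discrete probability distributions. The function names are hypothetical, and the card does not say which distributions or prompts Heretic uses for its KL figure, so the `kl_divergence` helper only shows the general definition, not a way to reproduce 0.0178.

```python
import math


def relative_reduction(before, after):
    """Fraction by which a count dropped, relative to its starting value."""
    return (before - after) / before


def kl_divergence(p, q):
    """KL(P || Q) in nats for two discrete distributions over the same outcomes."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


# Refusal counts from the table: 98/100 (original) vs 10/100 (this model)
reduction = relative_reduction(98, 10)
print(f"{reduction:.1%}")  # prints 89.8%

# Identical distributions have zero divergence; any mismatch gives a positive value
print(kl_divergence([0.5, 0.5], [0.5, 0.5]))  # prints 0.0
print(kl_divergence([0.9, 0.1], [0.5, 0.5]) > 0)  # prints True
```

The roughly 90% figure quoted on this card corresponds to this relative reduction (88 fewer refusals out of the original 98), not to a difference in percentage points.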

MiniMax-M3 is a native multimodal model with 1M context. It has ~428B parameters and ~23B activated parameters.

Highlights:

Native Multimodality: M3 undergoes mixed-modality training from the very first step, enabling deeper semantic fusion across text, image, and video.
Context Scaling via Sparse Attention: M3 introduces MiniMax Sparse Attention (MSA) to improve long context efficiency. M3 delivers 9× prefill and 15× decode speedups compared to M2 at 1M context, reducing per-token compute to 1/20.
Coding & Cowork Capability: M3 achieves frontier-level performance across long-horizon agentic benchmarks, excelling in both coding and cowork.

MiniMax Sparse Attention (MSA)

M3 is powered by MiniMax Sparse Attention (MSA), a high-performance sparse attention operator designed for million-token contexts. Compared with GQA, MSA dramatically reduces the attention compute and memory footprint while preserving model quality.

[Figure: GQA vs MSA efficiency comparison]

📄 Read the technical report: arXiv:2606.13392 · Hugging Face Papers

How to Use

M3 supports three reasoning modes through the thinking parameter:

enabled — Reasoning is always enabled.
adaptive — M3 automatically determines when additional reasoning is beneficial.
disabled — Reasoning is disabled to minimize latency and maximize throughput.

Local Deployment

Download the model:

hf download MiniMaxAI/MiniMax-M3 --local-dir MiniMax-M3

We recommend the following inference frameworks (listed alphabetically) to serve the model:

Inference Parameters

We recommend the following parameters for best performance: temperature=1.0, top_p=0.95, top_k=40.
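
A small sketch of applying these values to a chat request body sent to an OpenAI-compatible server. The names `RECOMMENDED_SAMPLING` and `with_recommended_sampling` are hypothetical. Note that `top_k` is not part of the standard OpenAI request schema; the sketch assumes the server accepts it as an extra sampling field, which is worth verifying for the server version in use.

```python
# Recommended sampling values from this card
RECOMMENDED_SAMPLING = {"temperature": 1.0, "top_p": 0.95, "top_k": 40}


def with_recommended_sampling(payload, **overrides):
    """Return a copy of payload with the recommended sampling values filled in.

    Values already present in payload win over the defaults,
    and keyword overrides win over both.
    """
    return {**RECOMMENDED_SAMPLING, **payload, **overrides}


request_body = with_recommended_sampling(
    {
        "model": "llmfan46/MiniMax-M3-uncensored-heretic-balanced",
        "messages": [{"role": "user", "content": "Hello"}],
    }
)
print(request_body["temperature"], request_body["top_p"], request_body["top_k"])
# prints 1.0 0.95 40
```

Because the helper returns a new dictionary, the original payload is left unchanged and individual values can still be overridden per request, for example `with_recommended_sampling(payload, temperature=0.7)`.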

Contact Us

Model size: 427B params (Safetensors)
Tensor type: BF16
Base model: MiniMaxAI/MiniMax-M3
Paper: MiniMax Sparse Attention (arXiv:2606.13392)