Instructions to use llmfan46/MiniMax-M3-uncensored-heretic-balanced with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use llmfan46/MiniMax-M3-uncensored-heretic-balanced with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline(
    "image-text-to-text",
    model="llmfan46/MiniMax-M3-uncensored-heretic-balanced",
    trust_remote_code=True,
)
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"},
        ],
    },
]
pipe(text=messages)
```

```python
# Load model directly
from transformers import AutoProcessor, AutoModelForMultimodalLM

processor = AutoProcessor.from_pretrained("llmfan46/MiniMax-M3-uncensored-heretic-balanced", trust_remote_code=True)
model = AutoModelForMultimodalLM.from_pretrained("llmfan46/MiniMax-M3-uncensored-heretic-balanced", trust_remote_code=True)
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"},
        ],
    },
]
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
# Decode only the newly generated tokens, skipping the prompt.
print(processor.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
```
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use llmfan46/MiniMax-M3-uncensored-heretic-balanced with vLLM:
Install from pip and serve model
```bash
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "llmfan46/MiniMax-M3-uncensored-heretic-balanced"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "llmfan46/MiniMax-M3-uncensored-heretic-balanced",
    "messages": [
      {
        "role": "user",
        "content": [
          {
            "type": "text",
            "text": "Describe this image in one sentence."
          },
          {
            "type": "image_url",
            "image_url": {
              "url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
            }
          }
        ]
      }
    ]
  }'
```
Use Docker
```bash
docker model run hf.co/llmfan46/MiniMax-M3-uncensored-heretic-balanced
```
- SGLang
How to use llmfan46/MiniMax-M3-uncensored-heretic-balanced with SGLang:
Install from pip and serve model
```bash
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
  --model-path "llmfan46/MiniMax-M3-uncensored-heretic-balanced" \
  --host 0.0.0.0 \
  --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "llmfan46/MiniMax-M3-uncensored-heretic-balanced",
    "messages": [
      {
        "role": "user",
        "content": [
          {
            "type": "text",
            "text": "Describe this image in one sentence."
          },
          {
            "type": "image_url",
            "image_url": {
              "url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
            }
          }
        ]
      }
    ]
  }'
```
Use Docker images
```bash
docker run --gpus all \
  --shm-size 32g \
  -p 30000:30000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_TOKEN=<secret>" \
  --ipc=host \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server \
    --model-path "llmfan46/MiniMax-M3-uncensored-heretic-balanced" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "llmfan46/MiniMax-M3-uncensored-heretic-balanced",
    "messages": [
      {
        "role": "user",
        "content": [
          {
            "type": "text",
            "text": "Describe this image in one sentence."
          },
          {
            "type": "image_url",
            "image_url": {
              "url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
            }
          }
        ]
      }
    ]
  }'
```
- Docker Model Runner
How to use llmfan46/MiniMax-M3-uncensored-heretic-balanced with Docker Model Runner:
```bash
docker model run hf.co/llmfan46/MiniMax-M3-uncensored-heretic-balanced
```
🔒 This is a premium gated paid-access model
Access is granted manually after purchase through Ko-fi.
After purchasing, include your Hugging Face username in the Ko-fi purchase message, then click “Agree and send request to access repo” on this Hugging Face page. I will verify the username and manually approve access.
Please allow up to 24 hours for manual approval.
90% fewer refusals (10/100 Uncensored vs 98/100 Original) while preserving model quality (0.0178 KL divergence).
❤️ Support My Work
Creating these models takes significant time, work, and compute. If you find them useful, consider supporting me:
| Platform | Link | What you get |
|---|---|---|
| 🎉 Patreon | Monthly support | Priority model requests |
| ☕ Ko-fi | One-time tip | My eternal gratitude |
Your support motivates me and goes toward improving my workflow and covering storage and compute fees. It may even help me uncensor bigger models using rented cloud GPUs.
This is a decensored version of MiniMaxAI/MiniMax-M3, made using Heretic v1.2.0 with the Arbitrary-Rank Ablation (ARA) method.
Abliteration parameters
| Parameter | Value |
|---|---|
| start_layer_index | 20 |
| end_layer_index | 32 |
| preserve_good_behavior_weight | 0.6111 |
| steer_bad_behavior_weight | 0.0012 |
| overcorrect_relative_weight | 1.1028 |
| neighbor_count | 11 |
Targeted components
- attn.o_proj
Performance
| Metric | This model | Original model (MiniMaxAI/MiniMax-M3) |
|---|---|---|
| KL divergence | 0.0178 | 0 (by definition) |
| Refusals | ✅ 10/100 | ❌ 98/100 |
A lower refusal count indicates fewer content restrictions, while a lower KL divergence indicates that the model's outputs stay closer to the original model's baseline. A higher refusal count means more rejections, objections, pushback, lecturing, censorship, softening, and deflection.
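The two metrics in the table can be illustrated in miniature. The sketch below assumes KL divergence is measured between next-token probability distributions of the original and modified models; the card does not describe the exact evaluation procedure, and the toy distributions are hypothetical values chosen only for illustration. The refusal counts are the ones reported in the table.

```python
import math

def kl_divergence(p, q):
    """KL(P || Q) in nats for two discrete distributions over the same tokens."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def relative_reduction(before, after):
    """Fractional drop from `before` to `after`."""
    return (before - after) / before

# Hypothetical next-token probabilities (NOT measured from either model).
original = [0.70, 0.20, 0.10]
modified = [0.65, 0.23, 0.12]

print(f"KL(original || modified) = {kl_divergence(original, modified):.4f} nats")
# Identical distributions give exactly zero, which is why the original
# model's own row reads "0 (by definition)".
print(f"KL(original || original) = {kl_divergence(original, original):.1f}")

# Refusal counts from the table: 98/100 before, 10/100 after.
print(f"Refusal reduction: {relative_reduction(98, 10):.1%}")  # 89.8%
```

The 89.8% figure is what the headline rounds to "90% fewer refusals".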
MiniMax-M3 is a native multimodal model with 1M context. It has ~428B parameters and ~23B activated parameters.
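As a rough guide to what these parameter counts imply for deployment, the following sketch estimates weight storage at a few common precisions. The precisions and bytes-per-parameter values are assumptions for illustration (the card does not state the checkpoint's storage format), and the estimate covers weights only, not KV cache or activations.

```python
TOTAL_PARAMS = 428e9   # ~428B total parameters (from the card)
ACTIVE_PARAMS = 23e9   # ~23B activated parameters per token (from the card)

# ASSUMPTION: bytes per parameter for common precisions; not stated in the card.
BYTES_PER_PARAM = {"bf16": 2.0, "fp8": 1.0, "int4": 0.5}

def weight_memory_gb(num_params, bytes_per_param):
    """Approximate weight storage in GB (10^9 bytes), weights only."""
    return num_params * bytes_per_param / 1e9

active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS
print(f"Active fraction per token: {active_fraction:.1%}")

for name, nbytes in BYTES_PER_PARAM.items():
    print(f"{name}: ~{weight_memory_gb(TOTAL_PARAMS, nbytes):.0f} GB of weights")
```

Although only about 5% of the parameters are active for any given token, serving setups typically need the full set of weights available, so the total count is the one that drives memory planning.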
Highlights:
- Native Multimodality: M3 undergoes mixed-modality training from the very first step, enabling deeper semantic fusion across text, image, and video.
- Context Scaling via Sparse Attention: M3 introduces MiniMax Sparse Attention (MSA) to improve long context efficiency. M3 delivers 9× prefill and 15× decode speedups compared to M2 at 1M context, reducing per-token compute to 1/20.
- Coding & Cowork Capability: M3 achieves frontier-level performance across long-horizon agentic benchmarks, excelling in both coding and cowork.
MiniMax Sparse Attention (MSA)
M3 is powered by MiniMax Sparse Attention (MSA), a high-performance sparse attention operator designed for million-token contexts. Compared with GQA, MSA dramatically reduces the attention compute and memory footprint while preserving model quality.
📄 Read the technical report: arXiv:2606.13392 · Hugging Face Papers
How to Use
M3 supports three reasoning modes through the thinking parameter:
- `enabled`: Reasoning is always enabled.
- `adaptive`: M3 automatically determines when additional reasoning is beneficial.
- `disabled`: Reasoning is disabled to minimize latency and maximize throughput.
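The card names the `thinking` parameter but does not show where it goes in a request. The sketch below validates the three modes listed above and builds a chat-completions payload. It assumes the mode is forwarded through a `chat_template_kwargs` field, as OpenAI-compatible servers such as vLLM and SGLang allow for template options; whether M3's chat template reads `thinking` from there is an assumption, and `build_request` is a hypothetical helper, so check the serving framework's documentation before relying on it.

```python
import json

THINKING_MODES = ("enabled", "adaptive", "disabled")  # modes listed in the card

def build_request(prompt, thinking="adaptive"):
    """Build a chat-completions payload that carries a reasoning mode.

    ASSUMPTION: the mode is passed via `chat_template_kwargs`; the card
    does not specify the request field.
    """
    if thinking not in THINKING_MODES:
        raise ValueError(f"thinking must be one of {THINKING_MODES}, got {thinking!r}")
    return {
        "model": "llmfan46/MiniMax-M3-uncensored-heretic-balanced",
        "messages": [{"role": "user", "content": prompt}],
        "chat_template_kwargs": {"thinking": thinking},
    }

payload = build_request("Summarize sparse attention in two sentences.", thinking="disabled")
print(json.dumps(payload, indent=2))
```

Validating the mode on the client side surfaces typos early instead of leaving the server to fall back silently to a default.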
Local Deployment
Download the model:
```bash
hf download MiniMaxAI/MiniMax-M3 --local-dir MiniMax-M3
```
We recommend the following inference frameworks (listed alphabetically) to serve the model:
SGLang - see SGLang cookbook.
vLLM - see vLLM recipes.
Transformers - see Transformers docs.
Inference Parameters
We recommend the following parameters for best performance: temperature=1.0, top_p=0.95, top_k=40.
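One way to apply these settings consistently is to merge them into every request body. The sketch below is a minimal illustration using only the standard library; `with_recommended_sampling` is a hypothetical helper, and sending `top_k` as a top-level field is an assumption that holds only for servers that accept it as an extension, since it is not part of the standard OpenAI chat-completions schema.

```python
import json

# Recommended sampling settings from the card.
RECOMMENDED_SAMPLING = {"temperature": 1.0, "top_p": 0.95, "top_k": 40}

def with_recommended_sampling(payload, **overrides):
    """Return a copy of `payload` with the recommended settings applied.

    Keyword overrides win over the recommended values; the input is not mutated.
    """
    merged = dict(payload)
    merged.update(RECOMMENDED_SAMPLING)
    merged.update(overrides)
    return merged

base = {
    "model": "llmfan46/MiniMax-M3-uncensored-heretic-balanced",
    "messages": [{"role": "user", "content": "Describe sparse attention briefly."}],
}

request_body = with_recommended_sampling(base, max_tokens=512)
print(json.dumps(request_body, indent=2))
```

The resulting dictionary can be sent as the JSON body of a request to the `/v1/chat/completions` endpoint of a locally running server.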
Contact Us
Contact us at model@minimax.io.
Model tree for llmfan46/MiniMax-M3-uncensored-heretic-balanced
Base model
MiniMaxAI/MiniMax-M3