Llama-3.3-70B-Instruct-abliterated 8-bit MLX
An 8-bit MLX quantization of huihui-ai/Llama-3.3-70B-Instruct-abliterated, packaged for fast local inference on Apple Silicon.
8-bit was chosen over the more common 4-bit so that the quant preserves as much of the base model's quality as possible. It is meant for users who have the unified-memory headroom and care more about output fidelity than minimum footprint.
Model details
| Field | Value |
|---|---|
| Base model | huihui-ai/Llama-3.3-70B-Instruct-abliterated |
| Quantization | 8-bit affine, group size 64 |
| Format | MLX (safetensors, 15 shards) |
| Architecture | Llama 3.3 — 80 layers, 8192 hidden, 64 attention heads / 8 KV heads, 128k context |
| Disk size | ~75 GB |
| Converted with | mlx-lm 0.31.2 |
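As a rough sanity check on the disk size, the sketch below estimates the checkpoint size from the quantization settings in the table. It rests on two assumptions that are not stated in this card: a parameter count of about 70.6 billion, and one fp16 scale plus one fp16 bias stored for every group of 64 weights. Unquantized tensors such as norm weights are ignored, so treat the result as an approximation only.

```python
# Back-of-the-envelope size estimate for an affine-quantized checkpoint.
# ASSUMPTIONS (not from the model card): ~70.6e9 parameters, and one fp16
# scale plus one fp16 bias (32 bits total) stored per group of weights.

def bits_per_weight(bits: int, group_size: int, overhead_bits: int = 32) -> float:
    """Effective bits per weight, spreading the per-group scale/bias over the group."""
    return bits + overhead_bits / group_size


def checkpoint_size_gb(n_params: float, bits: int, group_size: int) -> float:
    """Approximate checkpoint size in decimal gigabytes (1 GB = 1e9 bytes)."""
    return n_params * bits_per_weight(bits, group_size) / 8 / 1e9


N_PARAMS = 70.6e9  # assumed parameter count for a 70B Llama model

size_8bit = checkpoint_size_gb(N_PARAMS, bits=8, group_size=64)
size_4bit = checkpoint_size_gb(N_PARAMS, bits=4, group_size=64)

print(f"8-bit, group 64: {bits_per_weight(8, 64)} bits/weight, ~{size_8bit:.0f} GB")
print(f"4-bit, group 64: {bits_per_weight(4, 64)} bits/weight, ~{size_4bit:.0f} GB")
```

Under these assumptions the 8-bit estimate lands close to the ~75 GB listed above, and a 4-bit quant with the same group size would be a little over half that.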
Hardware requirements
You need an Apple Silicon Mac with enough unified memory to hold the full model in RAM, plus headroom for the KV cache and your OS:
- Will not run on 64 GB Macs — use a 4-bit quant instead.
- Minimum: ~80 GB free unified memory (so a 96 GB Ultra or a 128 GB Max).
- Comfortable: 128 GB or more, especially if you want long contexts.
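To see why long contexts need extra headroom, the sketch below estimates the KV cache size from the architecture row in the model details table. It assumes an unquantized fp16 cache (2 bytes per value), a head dimension of 8192 / 64 = 128, and a 128k context of 131,072 tokens. None of these assumptions is confirmed by this card, and real memory use also depends on the runtime.

```python
# Rough KV-cache sizing from the architecture listed in the model details.
# ASSUMPTIONS: fp16 cache entries (2 bytes each), no cache quantization,
# and head_dim = hidden size / attention heads = 8192 / 64 = 128.

N_LAYERS = 80
N_KV_HEADS = 8
HEAD_DIM = 8192 // 64
BYTES_PER_VALUE = 2  # fp16 (assumed)


def kv_cache_bytes(n_tokens: int) -> int:
    """Bytes needed to cache keys and values for n_tokens across all layers."""
    # Factor of 2: one key tensor and one value tensor per layer.
    return 2 * N_LAYERS * N_KV_HEADS * HEAD_DIM * BYTES_PER_VALUE * n_tokens


WEIGHTS_GB = 75  # "~75 GB" from the model details table

for context in (4_096, 32_768, 131_072):
    cache_gb = kv_cache_bytes(context) / 1e9
    print(
        f"{context:>7} tokens: cache ~{cache_gb:.1f} GB, "
        f"weights + cache ~{WEIGHTS_GB + cache_gb:.0f} GB"
    )
```

Under these assumptions the cache costs about 0.33 MB per token, which is small for short chats but grows to tens of gigabytes near the full context length.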
Usage
Install mlx-lm:
pip install mlx-lm
Generate from the command line:
mlx_lm.generate \
--model divinetribe/Llama-3.3-70B-Instruct-abliterated-8bit-mlx \
--prompt "Write a short poem about Apple Silicon." \
--max-tokens 200
Or load it from Python:
from mlx_lm import load, generate
model, tokenizer = load("divinetribe/Llama-3.3-70B-Instruct-abliterated-8bit-mlx")
messages = [{"role": "user", "content": "Hello, who are you?"}]
prompt = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, tokenize=False
)
response = generate(model, tokenizer, prompt=prompt, max_tokens=256)
print(response)
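For a multi-turn conversation, one option is to keep the message history in a list and re-apply the chat template on every turn. The sketch below uses only the `load`, `generate`, and `apply_chat_template` calls shown above; the helper names `append_turn`, `reply`, and `main` are hypothetical and not part of mlx-lm.

```python
from typing import Dict, List

Message = Dict[str, str]

MODEL_ID = "divinetribe/Llama-3.3-70B-Instruct-abliterated-8bit-mlx"


def append_turn(history: List[Message], role: str, content: str) -> List[Message]:
    """Return a new history with one more message; the input list is not mutated."""
    if role not in {"system", "user", "assistant"}:
        raise ValueError(f"unexpected role: {role!r}")
    return history + [{"role": role, "content": content}]


def reply(model, tokenizer, history: List[Message], max_tokens: int = 256) -> str:
    """Generate the assistant's next message for an existing history."""
    # Imported lazily so the helper above can be used without mlx-lm installed.
    from mlx_lm import generate

    prompt = tokenizer.apply_chat_template(
        history, add_generation_prompt=True, tokenize=False
    )
    return generate(model, tokenizer, prompt=prompt, max_tokens=max_tokens)


def main() -> None:
    """Two-turn example; needs an Apple Silicon Mac with enough unified memory."""
    from mlx_lm import load

    model, tokenizer = load(MODEL_ID)
    history = append_turn([], "user", "Hello, who are you?")
    answer = reply(model, tokenizer, history)
    # Feed the model's own answer back so the next turn has full context.
    history = append_turn(history, "assistant", answer)
    history = append_turn(history, "user", "Summarize that in one sentence.")
    print(reply(model, tokenizer, history))
```

Call `main()` to run the two-turn exchange. Each turn re-encodes the whole history, so prompt length grows as the conversation does.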
Credits
- Base abliteration by huihui-ai, built on Meta's meta-llama/Llama-3.3-70B-Instruct.
- MLX 8-bit conversion by divinetribe using mlx-lm.
- For background on the abliteration technique, see Maxime Labonne's write-up on Hugging Face.
License
Inherits the Llama 3.3 Community License from the upstream model. Use of this quant is bound by the same terms and Acceptable Use Policy — accept the gated-access prompt above to access the weights.
Notes
"Abliterated" means the model's built-in refusal direction has been suppressed so it doesn't refuse benign-but-edgy requests. It is not a general capability upgrade — please use it responsibly and within the bounds of the upstream license.
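The idea of suppressing a direction can be illustrated with a toy projection: subtract from a vector its component along a chosen unit direction. This is only a sketch of the general idea with made-up three-dimensional numbers. It is not the procedure used to produce the upstream weights, and the vectors below are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def normalize(v: Vector) -> Vector:
    """Scale v to unit length."""
    norm = math.sqrt(dot(v, v))
    if norm == 0:
        raise ValueError("cannot normalize the zero vector")
    return [x / norm for x in v]


def ablate_direction(activation: Vector, direction: Vector) -> Vector:
    """Remove the component of `activation` that lies along `direction`."""
    r = normalize(direction)
    scale = dot(activation, r)  # how much of the activation points along r
    return [a - scale * ri for a, ri in zip(activation, r)]


# Hypothetical values, chosen only to make the arithmetic easy to follow.
refusal_direction = [1.0, 1.0, 0.0]
activation = [3.0, 1.0, 2.0]

ablated = ablate_direction(activation, refusal_direction)
print("ablated activation:", ablated)
print("remaining component along direction:", dot(ablated, normalize(refusal_direction)))
```

After the projection, the vector has essentially no component left along the chosen direction, while the parts orthogonal to it are unchanged. That matches the note above: the change targets one behavior and does not add capability.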
About the author
This model was built by Matt Macosko (@nicedreamzapp) for the claude-code-local stack — run Claude Code 100% on-device with local AI on Apple Silicon (⭐ 2,664 on GitHub).
- 🤗 All my models: nicedreamzwholesale.com/software/huggingface/
- 💻 Software portfolio: nicedreamzwholesale.com/software/
- 🔒 AirGap AI (legal / healthcare / NDA workflows): nicedreamzwholesale.com/airgap/