- MLX
How to use jedisct1/Mellum2-12B-A2.5B-Instruct-mlx with MLX:
# Make sure mlx-lm is installed
# pip install --upgrade mlx-lm

# Generate text with mlx-lm
from mlx_lm import load, generate

model, tokenizer = load("jedisct1/Mellum2-12B-A2.5B-Instruct-mlx")

prompt = "Write a story about Einstein"
messages = [{"role": "user", "content": prompt}]
prompt = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True
)

text = generate(model, tokenizer, prompt=prompt, verbose=True)
- Pi
How to use jedisct1/Mellum2-12B-A2.5B-Instruct-mlx with Pi:
Start the MLX server
# Install MLX LM:
uv tool install mlx-lm

# Start a local OpenAI-compatible server:
mlx_lm.server --model "jedisct1/Mellum2-12B-A2.5B-Instruct-mlx"
Configure the model in Pi
# Install Pi:
npm install -g @mariozechner/pi-coding-agent

# Add to ~/.pi/agent/models.json:
{
  "providers": {
    "mlx-lm": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        { "id": "jedisct1/Mellum2-12B-A2.5B-Instruct-mlx" }
      ]
    }
  }
}
Run Pi
# Start Pi in your project directory:
pi
- Hermes Agent
How to use jedisct1/Mellum2-12B-A2.5B-Instruct-mlx with Hermes Agent:
Start the MLX server
# Install MLX LM:
uv tool install mlx-lm

# Start a local OpenAI-compatible server:
mlx_lm.server --model "jedisct1/Mellum2-12B-A2.5B-Instruct-mlx"
Configure Hermes
# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup

# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default jedisct1/Mellum2-12B-A2.5B-Instruct-mlx
Run Hermes
hermes
- MLX LM
How to use jedisct1/Mellum2-12B-A2.5B-Instruct-mlx with MLX LM:
Generate or start a chat session
# Install MLX LM
uv tool install mlx-lm

# Interactive chat REPL
mlx_lm.chat --model "jedisct1/Mellum2-12B-A2.5B-Instruct-mlx"
Run an OpenAI-compatible server
# Install MLX LM
uv tool install mlx-lm

# Start the server (listens on port 8080 by default)
mlx_lm.server --model "jedisct1/Mellum2-12B-A2.5B-Instruct-mlx"

# Call the OpenAI-compatible server with curl
curl -X POST "http://localhost:8080/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "jedisct1/Mellum2-12B-A2.5B-Instruct-mlx",
    "messages": [
      {"role": "user", "content": "Hello"}
    ]
  }'
Mellum2-12B-A2.5B-Instruct-mlx
This is an MLX version of
JetBrains/Mellum2-12B-A2.5B-Instruct,
the instruction-tuned Mixture-of-Experts coding assistant from JetBrains. The weights are kept in
their native bfloat16 precision, so no quantization error is introduced relative to the original checkpoint.
Unlike its sibling Mellum2-12B-A2.5B-Thinking,
the Instruct model answers directly without emitting a <think> reasoning block, which makes it
faster and lighter on tokens for straightforward coding and tool-use tasks.
Mellum 2 uses 64 experts with 8 active per token (about 2.5B active parameters out of 12B), a mix of sliding-window and full-attention layers, and a 131,072-token context window.
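As a rough sanity check of these figures, the sketch below computes what share of the experts and of the parameters is active per token. It uses the rounded values quoted above (12B total, 2.5B active), not exact parameter counts, and the variable names are illustrative only.

```python
# Rounded figures from the description above; exact counts may differ.
TOTAL_EXPERTS = 64
ACTIVE_EXPERTS = 8
TOTAL_PARAMS_B = 12.0   # billions, nominal
ACTIVE_PARAMS_B = 2.5   # billions, nominal

expert_fraction = ACTIVE_EXPERTS / TOTAL_EXPERTS
param_fraction = ACTIVE_PARAMS_B / TOTAL_PARAMS_B

print(f"experts active per token:    {expert_fraction:.1%}")   # 12.5%
print(f"parameters active per token: {param_fraction:.1%}")    # 20.8%
```

The parameter share is larger than the expert share, presumably because components outside the expert layers (attention, embeddings) run for every token.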
Tool calling was verified end to end against a live mlx_lm.server driven by the swival agent
harness: across repeated runs the model issued well-formed read_file, edit_file,
write_file, list_files, and shell-command calls and never produced a malformed tool call.
Generation stops cleanly on <|im_end|> because eos_token_id is set to [0, 28]. This is what
lets agent harnesses see a proper tool_calls finish reason: the upstream checkpoint ships
eos_token_id: 0, which never fires on a chat turn and leaves tool calls running until they hit
the token limit.
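To illustrate why the EOS list matters, here is a toy stop-check loop. It is not mlx-lm's actual implementation: the function name and the token ids in the stream are made up, and treating id 28 as <|im_end|> is inferred from the paragraph above.

```python
from typing import Iterable, List, Set, Tuple


def generate_until_eos(
    token_stream: Iterable[int], eos_ids: Set[int], max_tokens: int
) -> Tuple[List[int], str]:
    """Collect tokens until an EOS id appears or max_tokens is reached.

    Returns (tokens, finish_reason), where finish_reason is "stop" when an
    EOS id ended the turn and "length" otherwise.
    """
    out: List[int] = []
    for tok in token_stream:
        if tok in eos_ids:
            return out, "stop"
        out.append(tok)
        if len(out) >= max_tokens:
            return out, "length"
    return out, "length"


# Hypothetical token stream: the model ends its turn with id 28,
# then would keep going if nothing stopped it.
stream = [101, 102, 103, 28, 104, 105]

# Upstream-style config: only id 0 counts as EOS, so the turn never ends.
upstream = generate_until_eos(stream, eos_ids={0}, max_tokens=5)

# This repository's config: id 28 also counts as EOS.
patched = generate_until_eos(stream, eos_ids={0, 28}, max_tokens=5)

print(upstream)  # ([101, 102, 103, 28, 104], 'length')
print(patched)   # ([101, 102, 103], 'stop')
```

With only id 0 as EOS the loop runs into the token budget, which is the failure mode described above; adding 28 ends the turn where the model intended.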
Quantizations
If you want the same model with a smaller footprint:
- Mellum2-12B-A2.5B-Instruct-mlx-8bit: 8-bit, effectively indistinguishable from this model
- Mellum2-12B-A2.5B-Instruct-mlx-4bit: 4-bit, tuned to keep tool calling reliable
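For a back-of-the-envelope sense of the footprint difference, the sketch below estimates weight storage at each precision. It assumes a nominal 12e9 parameters (from the "12B" in the model name) and counts weights only: it ignores quantization scales, any layers kept at higher precision, the KV cache, and runtime overhead, so real memory use will be higher. The helper name is illustrative.

```python
NOMINAL_PARAMS = 12e9  # "12B" from the model name; the exact count may differ


def weight_gb(bits_per_weight: float, params: float = NOMINAL_PARAMS) -> float:
    """Approximate weight storage in decimal gigabytes (weights only)."""
    return params * bits_per_weight / 8 / 1e9


for label, bits in [("bfloat16", 16), ("8-bit", 8), ("4-bit", 4)]:
    print(f"{label:>8}: ~{weight_gb(bits):.0f} GB")
```

Under these assumptions the estimates come out to roughly 24 GB, 12 GB, and 6 GB respectively.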
Requirements
The mellum architecture is not supported by the stock mlx-lm code yet.
Until it is supported upstream, install this fork of mlx-lm from source:
pip install git+https://github.com/jedisct1/mlx-lm
Or run it directly with uv:
uvx --from git+https://github.com/jedisct1/mlx-lm mlx_lm.server
Use with mlx-lm
Quick test:
uvx --from git+https://github.com/jedisct1/mlx-lm \
mlx_lm.generate --model jedisct1/Mellum2-12B-A2.5B-Instruct-mlx \
--prompt "Write a Python function that reverses a linked list." \
--max-tokens 16384 \
--temp 0.6 --top-p 0.95 --top-k 20
Starting the server:
uvx --from git+https://github.com/jedisct1/mlx-lm \
mlx_lm.server --model jedisct1/Mellum2-12B-A2.5B-Instruct-mlx \
--max-tokens 16384 \
--temp 0.6 --top-p 0.95 --top-k 20
The recommended sampling settings from JetBrains are temperature=0.6, top_p=0.95, top_k=20.
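The same settings can also be sent per request from a client. The sketch below builds a chat completion request with the recommended values using only the Python standard library. It assumes the server is reachable at http://localhost:8080/v1 (the address used in the Pi configuration above) and that the server honors a top_k field in the request body; top_k is not part of the standard OpenAI schema, so treat it as an assumption. The function names are illustrative.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8080/v1"  # assumed server address
MODEL = "jedisct1/Mellum2-12B-A2.5B-Instruct-mlx"


def build_payload(prompt: str, max_tokens: int = 1024) -> dict:
    """Build a chat completion request body with the recommended sampling."""
    return {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
        "temperature": 0.6,
        "top_p": 0.95,
        "top_k": 20,  # assumption: non-standard field, may be ignored
    }


def chat(prompt: str) -> str:
    """Send one user message to the running server and return the reply text."""
    request = urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(build_payload(prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]
```

With the server from the previous step running, calling chat("Write a Python function that reverses a linked list.") should return the model's answer as a string.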
Using this setup with the Swival.dev harness
Install swival.dev:
uv tool install swival
Then point it at the running server:
swival --provider llamacpp --model jedisct1/Mellum2-12B-A2.5B-Instruct-mlx
License
Apache 2.0, inherited from the original model.
Model tree for jedisct1/Mellum2-12B-A2.5B-Instruct-mlx
Base model
JetBrains/Mellum2-12B-A2.5B-Instruct