Instructions for using nom666/Qwopus3.6-27B-v2-MTPLX-8bit-Quality with libraries and local apps.

Libraries

How to use nom666/Qwopus3.6-27B-v2-MTPLX-8bit-Quality with MLX:

# Make sure mlx-lm is installed
# pip install --upgrade mlx-lm

# Generate text with mlx-lm
from mlx_lm import load, generate

model, tokenizer = load("nom666/Qwopus3.6-27B-v2-MTPLX-8bit-Quality")

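# Wrap the user prompt in the model's chat template
prompt = "Write a story about Einstein"
messages = [{"role": "user", "content": prompt}]
prompt = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True
)

# verbose=True prints the generated text and throughput statistics
text = generate(model, tokenizer, prompt=prompt, verbose=True)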

Local Apps

Pi

How to use nom666/Qwopus3.6-27B-v2-MTPLX-8bit-Quality with Pi:

Start the MLX server

# Install MLX LM:
uv tool install mlx-lm
# Start a local OpenAI-compatible server:
mlx_lm.server --model "nom666/Qwopus3.6-27B-v2-MTPLX-8bit-Quality"

Configure the model in Pi

# Install Pi:
npm install -g @mariozechner/pi-coding-agent
# Add to ~/.pi/agent/models.json:
{
  "providers": {
    "mlx-lm": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        {
          "id": "nom666/Qwopus3.6-27B-v2-MTPLX-8bit-Quality"
        }
      ]
    }
  }
}
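
If ~/.pi/agent/models.json already contains other providers, pasting the fragment above by hand risks overwriting them. The following is a minimal sketch of merging the entry programmatically. The helper names `merge_mlx_provider` and `update_models_file` are hypothetical, and the sketch assumes the file uses exactly the layout shown above; it is not part of Pi itself.

```python
import json
from pathlib import Path

MODEL_ID = "nom666/Qwopus3.6-27B-v2-MTPLX-8bit-Quality"


def merge_mlx_provider(config: dict, model_id: str = MODEL_ID) -> dict:
    """Return a copy of `config` with the mlx-lm provider added or updated."""
    merged = json.loads(json.dumps(config))  # deep copy via a JSON round trip
    providers = merged.setdefault("providers", {})
    provider = providers.setdefault("mlx-lm", {})
    # Same values as the fragment above; other providers are left untouched.
    provider.update({
        "baseUrl": "http://localhost:8080/v1",
        "api": "openai-completions",
        "apiKey": "none",
    })
    models = provider.setdefault("models", [])
    if not any(entry.get("id") == model_id for entry in models):
        models.append({"id": model_id})
    return merged


def update_models_file(path: Path) -> None:
    """Read the existing file (if any), merge the provider, and write it back."""
    existing = json.loads(path.read_text()) if path.exists() else {}
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(merge_mlx_provider(existing), indent=2) + "\n")
```

Calling `update_models_file(Path.home() / ".pi" / "agent" / "models.json")` would apply the change in place. Running it twice leaves the file unchanged, because the model entry is only appended when it is missing.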

Run Pi

# Start Pi in your project directory:
pi

Hermes Agent

How to use nom666/Qwopus3.6-27B-v2-MTPLX-8bit-Quality with Hermes Agent:

Start the MLX server

# Install MLX LM:
uv tool install mlx-lm
# Start a local OpenAI-compatible server:
mlx_lm.server --model "nom666/Qwopus3.6-27B-v2-MTPLX-8bit-Quality"

Configure Hermes

# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup
# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default nom666/Qwopus3.6-27B-v2-MTPLX-8bit-Quality

Run Hermes

hermes

MLX LM

How to use nom666/Qwopus3.6-27B-v2-MTPLX-8bit-Quality with MLX LM:

Generate or start a chat session

# Install MLX LM
uv tool install mlx-lm
# Interactive chat REPL
mlx_lm.chat --model "nom666/Qwopus3.6-27B-v2-MTPLX-8bit-Quality"

Run an OpenAI-compatible server

# Install MLX LM
uv tool install mlx-lm
# Start the server
mlx_lm.server --model "nom666/Qwopus3.6-27B-v2-MTPLX-8bit-Quality"
# Call the OpenAI-compatible server with curl (mlx_lm.server listens on port 8080 by default)
curl -X POST "http://localhost:8080/v1/chat/completions" \
   -H "Content-Type: application/json" \
   --data '{
     "model": "nom666/Qwopus3.6-27B-v2-MTPLX-8bit-Quality",
     "messages": [
       {"role": "user", "content": "Hello"}
     ]
   }'
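
The same request can be sent from Python. The sketch below uses only the standard library and assumes the server is reachable at http://localhost:8080/v1, the address used in the Pi and Hermes configurations above. The function names `build_chat_request` and `chat` are illustrative, not part of MLX LM.

```python
import json
import urllib.request

MODEL_ID = "nom666/Qwopus3.6-27B-v2-MTPLX-8bit-Quality"
BASE_URL = "http://localhost:8080/v1"  # assumption: default host and port


def build_chat_request(user_message, base_url=BASE_URL, model=MODEL_ID):
    """Build a POST request for the /chat/completions endpoint."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
    }
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(user_message, **kwargs):
    """Send the request and return the assistant's reply text."""
    with urllib.request.urlopen(build_chat_request(user_message, **kwargs)) as resp:
        body = json.load(resp)
    # OpenAI-style responses carry the reply in choices[0].message.content
    return body["choices"][0]["message"]["content"]
```

With the server running, `print(chat("Hello"))` should print the model's reply. Pass `base_url=...` if the server was started on a different host or port.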

Qwopus3.6-27B-v2 · MTPLX 8-bit Quality

The first MTPLX build of Qwopus3.6-27B-v2: native multi-token-prediction (MTP) speculative decoding on Apple Silicon with no external drafter. Drafted tokens are verified with exact rejection sampling (Leviathan–Chen with residual correction), so the output distribution matches normal decoding while generation runs faster.
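
To illustrate why rejection sampling with residual correction leaves the output distribution unchanged, here is a minimal sketch of the accept/resample rule for a single drafted token. It follows the general published scheme, not MTPLX's actual code, which is not shown here. `p` is the target model's distribution, `q` is the draft distribution, and all function names are hypothetical.

```python
import random


def residual_distribution(p, q):
    """Normalized max(0, p - q): the distribution used after a rejection."""
    raw = [max(0.0, pi - qi) for pi, qi in zip(p, q)]
    total = sum(raw)
    # If p == q nothing is ever rejected; fall back to p for completeness.
    return [r / total for r in raw] if total > 0 else list(p)


def accept_or_resample(draft_token, p, q, rng=random):
    """Return (token, accepted) for one token that was drafted from q."""
    # draft_token was sampled from q, so q[draft_token] > 0.
    accept_prob = min(1.0, p[draft_token] / q[draft_token])
    if rng.random() < accept_prob:
        return draft_token, True
    residual = residual_distribution(p, q)
    token = rng.choices(range(len(p)), weights=residual, k=1)[0]
    return token, False


def output_distribution(p, q):
    """Exact distribution of the emitted token when the draft comes from q."""
    # q(x) * min(1, p(x) / q(x)) simplifies to min(p(x), q(x)).
    accept_mass = [min(pi, qi) for pi, qi in zip(p, q)]
    reject_prob = 1.0 - sum(accept_mass)
    residual = residual_distribution(p, q)
    return [a + reject_prob * r for a, r in zip(accept_mass, residual)]
```

For any valid pair of distributions, `output_distribution(p, q)` equals `p` up to floating-point error. That is the sense in which sampling behaves like normal decoding: the draft only affects how many tokens are accepted per step, not which tokens are likely.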

Forged with mtplx forge build from the original BF16 Jackrong/Qwopus3.6-27B-v2:

Body: flat 8-bit MLX affine quantization, group size 64
MTP head: preserved in BF16 (mtp_policy: keep_bf16), packed as mtp.safetensors sidecar
Size: 29.4 GB
Calibrated MTP contract included (mtplx_runtime.json)
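
As a rough illustration of what 8-bit affine quantization with group size 64 means, the sketch below quantizes one group with a textbook min/max scheme and estimates the storage cost per weight. It is an approximation for intuition only: MLX's kernels may differ in detail, and the 16-bit scale and bias per group are an assumption.

```python
def quantize_group(weights, bits=8):
    """Affine-quantize one group so that w is approximated by scale * code + bias."""
    levels = (1 << bits) - 1
    w_min, w_max = min(weights), max(weights)
    scale = (w_max - w_min) / levels or 1.0  # avoid a zero scale for constant groups
    codes = [min(levels, round((w - w_min) / scale)) for w in weights]
    return codes, scale, w_min


def dequantize_group(codes, scale, bias):
    """Reconstruct approximate weights from integer codes."""
    return [scale * c + bias for c in codes]


def bits_per_weight(bits=8, group_size=64, scale_bits=16, bias_bits=16):
    """Storage per weight, counting one scale and one bias per group (assumed 16-bit)."""
    return bits + (scale_bits + bias_bits) / group_size


# A deterministic example group of 64 values in [-1, 1)
group = [((i * 37) % 64) / 32.0 - 1.0 for i in range(64)]
codes, scale, bias = quantize_group(group)
restored = dequantize_group(codes, scale, bias)
max_error = max(abs(a - b) for a, b in zip(group, restored))
```

Under these assumptions each weight costs 8.5 bits, so a nominal 27 billion parameters would take about 28.7 GB. That is in the same range as the 29.4 GB listed above; the remainder is plausibly the BF16 MTP head and any tensors left unquantized, though no breakdown is given here.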

Measured performance (Apple M5 Max, 128 GB, MTPLX 1.0.3)

Mode	Decode	Acceptance by depth
Plain autoregressive	~17.5 tok/s	—
MTP depth 3	39.0 tok/s (2.2×)	94% / 86% / 74%

Verification suite: long-code-uncapped, 2048-token budget. For reference, the same machine runs Qwopus-v2 Q6_K on llama.cpp with MTP (n=2) at ~24–26 tok/s, which makes this build ~55% faster at higher body precision.
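
The figures above can be cross-checked with a little arithmetic. The sketch below recomputes the speedups and estimates how many tokens each verification pass yields. Whether the per-depth acceptance rates are cumulative or conditional on the previous depth is not stated, so both readings are shown as assumptions.

```python
PLAIN_TOK_S = 17.5               # plain autoregressive decode
MTP_TOK_S = 39.0                 # MTP depth 3 decode
ACCEPTANCE = [0.94, 0.86, 0.74]  # acceptance at depths 1, 2, 3
LLAMA_CPP_TOK_S = (24.0, 26.0)   # llama.cpp Q6_K with MTP (n=2)


def speedup(fast, slow):
    """Ratio of two throughputs."""
    return fast / slow


def expected_tokens_cumulative(rates):
    """Tokens per pass if each rate is the chance that depth is reached and accepted."""
    # The leading 1.0 is the token the verifier always emits.
    return 1.0 + sum(rates)


def expected_tokens_conditional(rates):
    """Tokens per pass if each rate is conditional on the previous depth being accepted."""
    total, survive = 1.0, 1.0
    for rate in rates:
        survive *= rate
        total += survive
    return total


mtp_speedup = speedup(MTP_TOK_S, PLAIN_TOK_S)
vs_llama_cpp = speedup(MTP_TOK_S, sum(LLAMA_CPP_TOK_S) / 2)
tokens_cumulative = expected_tokens_cumulative(ACCEPTANCE)
tokens_conditional = expected_tokens_conditional(ACCEPTANCE)
```

This gives a speedup of about 2.23× over plain decoding and about 1.56× over the midpoint of the llama.cpp range, consistent with the 2.2× and ~55% figures. The expected yield is 3.54 tokens per pass under the cumulative reading and about 3.35 under the conditional one. Both exceed the measured 2.2× speedup, which is expected because drafting and verification add cost to each pass.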

Usage

brew install youssofal/mtplx/mtplx   # or pipx install mtplx
mtplx pull nom666/Qwopus3.6-27B-v2-MTPLX-8bit-Quality
mtplx quickstart --model nom666/Qwopus3.6-27B-v2-MTPLX-8bit-Quality \
  --depth 3 --paged-kv-quantization q8 --batching-preset agent --reasoning off

The server exposes OpenAI-compatible (/v1/chat/completions) and Anthropic-compatible (/v1/messages) endpoints with warm-prefix KV reuse, SSD session cache, continuous batching, and vision support. The full 262144-token context is supported. Only 16 of the 64 layers carry a KV cache (hybrid Gated DeltaNet architecture), so the cache at 256K tokens is ~16 GiB in BF16 or ~8 GiB with q8 quantization.
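
The KV cache figures can be reproduced with a short calculation. The number of KV heads and the head dimension are not stated here, so the values below (4 heads of dimension 256) are assumptions chosen because their product of 1024 matches the quoted sizes. The q8 estimate also ignores any per-group scale overhead.

```python
GIB = 1024 ** 3


def kv_cache_bytes(tokens, kv_layers, kv_heads, head_dim, bytes_per_element):
    """Size of the key and value caches across all layers that carry KV."""
    # The factor of 2 covers the separate key and value tensors.
    return 2 * tokens * kv_layers * kv_heads * head_dim * bytes_per_element


TOKENS = 262_144   # full context length
KV_LAYERS = 16     # only 16 of 64 layers carry KV
KV_HEADS = 4       # assumption, not stated in this model card
HEAD_DIM = 256     # assumption, not stated in this model card

bf16_gib = kv_cache_bytes(TOKENS, KV_LAYERS, KV_HEADS, HEAD_DIM, 2) / GIB
q8_gib = kv_cache_bytes(TOKENS, KV_LAYERS, KV_HEADS, HEAD_DIM, 1) / GIB
print(f"BF16: {bf16_gib:.1f} GiB, q8: {q8_gib:.1f} GiB")
```

With these assumptions the result is 16 GiB in BF16 and 8 GiB in q8. If all 64 layers carried KV, the same formula would give four times as much, which is why the hybrid architecture makes the full context practical on a single machine.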

Notes

Runtime contract tier is forge-local: it was verified only on the forging machine (M5 Max), and MTPLX reports this provenance when loading the model.
Thinking/reasoning can be left on; --reasoning off is recommended for terse agentic/coding use.
Quantized with MTPLX 1.0.3 Forge. All credit for the fine-tune to Jackrong; base model Qwen3.6-27B (Apache-2.0).

Model details

Format: Safetensors (MLX)
Model size: 27B params
Tensor types: BF16, U32
Quantization: 8-bit
Base model: Jackrong/Qwopus3.6-27B-v2