Qwopus3.6-27B-v2 · MTPLX 8-bit Quality
The first MTPLX build of Qwopus3.6-27B-v2: native multi-token-prediction (MTP) speculative decoding on Apple Silicon, with no external drafter. Draft tokens are verified by exact rejection sampling (Leviathan–Chen with residual correction), so the output distribution matches normal decoding; generation is just faster.
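To illustrate why exact rejection sampling leaves the sampling distribution unchanged, here is a minimal sketch of the standard accept/reject rule with residual correction for a single draft token. It is a generic illustration, not MTPLX's implementation; the function name `verify_draft_token` and the toy list-based distributions are hypothetical.

```python
import random

def verify_draft_token(p, q, draft_token, rng):
    """Verify one draft token (hypothetical helper, not an MTPLX API).

    p: target-model probabilities over the vocabulary
    q: draft-head probabilities over the same vocabulary
    draft_token: index that was sampled from q
    Returns (token, accepted).
    """
    # Accept the draft with probability min(1, p(x) / q(x)).
    accept_prob = min(1.0, p[draft_token] / q[draft_token])
    if rng.random() < accept_prob:
        return draft_token, True
    # Residual correction: on rejection, resample from max(p - q, 0),
    # which random.choices normalizes for us. The weights cannot all be
    # zero here, because a rejection implies p(x) < q(x) for the draft
    # token and therefore p(y) > q(y) for some other token y.
    residual = [max(pi - qi, 0.0) for pi, qi in zip(p, q)]
    token = rng.choices(range(len(p)), weights=residual, k=1)[0]
    return token, False
```

Drawing a draft from `q` and passing it through this check yields tokens distributed according to `p`, which is the sense in which the speed-up is lossless.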
Forged with mtplx forge build from the original BF16 Jackrong/Qwopus3.6-27B-v2:
- Body: flat 8-bit MLX affine quantization, group size 64
- MTP head: preserved in BF16 (`mtp_policy: keep_bf16`), packed as an `mtp.safetensors` sidecar
- Size: 29.4 GB
- Calibrated MTP contract included (`mtplx_runtime.json`)
Measured performance (Apple M5 Max, 128 GB, MTPLX 1.0.3)
| Mode | Decode | Acceptance by depth |
|---|---|---|
| Plain autoregressive | ~17.5 tok/s | — |
| MTP depth 3 | 39.0 tok/s (2.2×) | 94% / 86% / 74% |
Verification suite: long-code-uncapped, 2048-token budget. For reference, the same machine runs Qwopus-v2 Q6_K on llama.cpp with MTP (n=2) at ~24–26 tok/s, so this build is ~55% faster while keeping higher body precision (8-bit vs. Q6_K).
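The ratios quoted above can be reproduced from the table. The sketch below also estimates how many tokens each verification step emits, based on the acceptance figures. The card does not say whether those percentages are unconditional per-depth rates or conditional on the previous depth having been accepted, so both readings are computed, and both are assumptions. The helper name `expected_tokens_per_step` is hypothetical.

```python
PLAIN_TPS = 17.5                  # plain autoregressive decode, from the table
MTP_TPS = 39.0                    # MTP depth 3 decode, from the table
LLAMA_CPP_TPS = (24.0, 26.0)      # llama.cpp Q6_K range quoted in the text
ACCEPTANCE = (0.94, 0.86, 0.74)   # acceptance by depth, from the table

def expected_tokens_per_step(acceptance, conditional):
    """Expected tokens per verify step: one guaranteed token plus drafts.

    conditional=False: each rate is read as the fraction of steps in which
    the draft at that depth was accepted (assumption).
    conditional=True: each rate is read as conditional on all shallower
    drafts having been accepted (assumption).
    """
    total, reach = 1.0, 1.0
    for rate in acceptance:
        reach = reach * rate if conditional else rate
        total += reach
    return total

speedup_vs_plain = MTP_TPS / PLAIN_TPS
speedup_vs_llama_cpp = [MTP_TPS / tps for tps in LLAMA_CPP_TPS]

print(f"vs plain decoding: {speedup_vs_plain:.2f}x")
print(f"vs llama.cpp: {speedup_vs_llama_cpp[1]:.2f}x to {speedup_vs_llama_cpp[0]:.2f}x")
print(f"tokens/step (unconditional reading): {expected_tokens_per_step(ACCEPTANCE, False):.2f}")
print(f"tokens/step (conditional reading): {expected_tokens_per_step(ACCEPTANCE, True):.2f}")
```

Under either reading the estimated tokens per step exceeds the measured 2.2× speed-up, which is expected: a step that drafts and verifies several tokens costs more than a single plain decode step.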
Usage
brew install youssofal/mtplx/mtplx # or pipx install mtplx
mtplx pull nom666/Qwopus3.6-27B-v2-MTPLX-8bit-Quality
mtplx quickstart --model nom666/Qwopus3.6-27B-v2-MTPLX-8bit-Quality \
--depth 3 --paged-kv-quantization q8 --batching-preset agent --reasoning off
Serves OpenAI-compatible (/v1/chat/completions) and Anthropic-compatible (/v1/messages) endpoints with warm-prefix KV reuse, SSD session cache, continuous batching, and vision support. Full 262144-token context; only 16 of 64 layers carry KV (hybrid Gated DeltaNet architecture), so KV at 256K is ~16 GiB BF16 / ~8 GiB q8.
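The KV figures above follow from simple arithmetic. This is a minimal sketch, assuming 4 KV heads of dimension 256 per attention layer. Those two values are not stated in this card and are assumptions to check against the model's `config.json`. The q8 estimate ignores the overhead of quantization scales.

```python
GIB = 2**30

def kv_cache_bytes(tokens, kv_layers, kv_heads, head_dim, bytes_per_elem):
    """Bytes needed to hold keys and values for `tokens` positions."""
    # The factor 2 covers the separate key and value tensors.
    return 2 * tokens * kv_layers * kv_heads * head_dim * bytes_per_elem

CONTEXT = 262144   # full context length, from the card
KV_LAYERS = 16     # layers that carry KV, from the card
KV_HEADS = 4       # assumption, not stated in the card
HEAD_DIM = 256     # assumption, not stated in the card

bf16 = kv_cache_bytes(CONTEXT, KV_LAYERS, KV_HEADS, HEAD_DIM, 2)  # 2 bytes per BF16 value
q8 = kv_cache_bytes(CONTEXT, KV_LAYERS, KV_HEADS, HEAD_DIM, 1)    # 1 byte per q8 value

print(f"BF16: {bf16 / GIB:.1f} GiB, q8: {q8 / GIB:.1f} GiB")
# prints "BF16: 16.0 GiB, q8: 8.0 GiB"
```

Under the same assumptions, a model in which all 64 layers carried KV would need 64 GiB in BF16 at full context, which shows how much the hybrid layout saves.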
Notes
- Runtime contract tier is `forge-local`: verified on the forging machine (M5 Max). MTPLX loads it with an honest provenance note.
- Thinking/reasoning can be left on; `--reasoning off` is recommended for terse agentic/coding use.
- Quantized with MTPLX 1.0.3 Forge. All credit for the fine-tune to Jackrong; base model Qwen3.6-27B (Apache-2.0).