Instructions for using Shiftedx/qwopus3.6-35b-a3b-coder-mxfp4-mlx with libraries and local apps.

Libraries

How to use Shiftedx/qwopus3.6-35b-a3b-coder-mxfp4-mlx with MLX:

# Make sure mlx-lm is installed
# pip install --upgrade mlx-lm

# Generate text with mlx-lm
from mlx_lm import load, generate

model, tokenizer = load("Shiftedx/qwopus3.6-35b-a3b-coder-mxfp4-mlx")

prompt = "Write a story about Einstein"
messages = [{"role": "user", "content": prompt}]
prompt = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True
)

text = generate(model, tokenizer, prompt=prompt, verbose=True)

Local Apps

Pi

How to use Shiftedx/qwopus3.6-35b-a3b-coder-mxfp4-mlx with Pi:

Start the MLX server

# Install MLX LM:
uv tool install mlx-lm
# Start a local OpenAI-compatible server:
mlx_lm.server --model "Shiftedx/qwopus3.6-35b-a3b-coder-mxfp4-mlx"
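
Before pointing a client at the server, it can help to confirm that it is reachable. The sketch below is one way to do that, assuming the server listens on port 8080 and exposes an OpenAI-style GET /v1/models listing; the helper names are hypothetical and not part of mlx-lm.

```python
import json
import urllib.request

# Assumption: mlx_lm.server is listening on port 8080 on this machine.
BASE_URL = "http://localhost:8080/v1"
MODEL_ID = "Shiftedx/qwopus3.6-35b-a3b-coder-mxfp4-mlx"


def listed_model_ids(models_response: dict) -> list[str]:
    """Extract ids from an OpenAI-style {"data": [{"id": ...}]} listing."""
    return [
        entry["id"]
        for entry in models_response.get("data", [])
        if "id" in entry
    ]


def fetch_models(base_url: str = BASE_URL, timeout: float = 5.0) -> dict:
    """GET <base_url>/models and decode the JSON body.

    Raises urllib.error.URLError if the server is not running.
    """
    with urllib.request.urlopen(f"{base_url}/models", timeout=timeout) as response:
        return json.load(response)
```

With the server running, `MODEL_ID in listed_model_ids(fetch_models())` indicates whether the model id appears in the listing. A connection error usually means the server has not finished starting or is bound to a different port.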

Configure the model in Pi

# Install Pi:
npm install -g @mariozechner/pi-coding-agent
# Add to ~/.pi/agent/models.json:
{
  "providers": {
    "mlx-lm": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        {
          "id": "Shiftedx/qwopus3.6-35b-a3b-coder-mxfp4-mlx"
        }
      ]
    }
  }
}
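
If ~/.pi/agent/models.json already contains other providers, pasting the block above by hand risks overwriting them. The following is a minimal sketch of merging the entry programmatically; the function names are hypothetical, and it assumes the file is plain JSON with a top-level "providers" object as shown above.

```python
import json
from pathlib import Path

MODEL_ID = "Shiftedx/qwopus3.6-35b-a3b-coder-mxfp4-mlx"


def with_mlx_provider(config: dict, model_id: str = MODEL_ID) -> dict:
    """Return a copy of config with the "mlx-lm" provider block set."""
    providers = dict(config.get("providers", {}))
    providers["mlx-lm"] = {
        "baseUrl": "http://localhost:8080/v1",
        "api": "openai-completions",
        "apiKey": "none",
        "models": [{"id": model_id}],
    }
    return {**config, "providers": providers}


def update_models_file(path: Path) -> None:
    """Read models.json if it exists, merge the provider, and write it back."""
    existing = json.loads(path.read_text()) if path.exists() else {}
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(with_mlx_provider(existing), indent=2) + "\n")
```

Calling `update_models_file(Path.home() / ".pi" / "agent" / "models.json")` would apply the change in place; other provider entries are left untouched.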

Run Pi

# Start Pi in your project directory:
pi

Hermes Agent

How to use Shiftedx/qwopus3.6-35b-a3b-coder-mxfp4-mlx with Hermes Agent:

Start the MLX server

# Install MLX LM:
uv tool install mlx-lm
# Start a local OpenAI-compatible server:
mlx_lm.server --model "Shiftedx/qwopus3.6-35b-a3b-coder-mxfp4-mlx"

Configure Hermes

# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup
# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default Shiftedx/qwopus3.6-35b-a3b-coder-mxfp4-mlx

Run Hermes

hermes

MLX LM

How to use Shiftedx/qwopus3.6-35b-a3b-coder-mxfp4-mlx with MLX LM:

Generate or start a chat session

# Install MLX LM
uv tool install mlx-lm
# Interactive chat REPL
mlx_lm.chat --model "Shiftedx/qwopus3.6-35b-a3b-coder-mxfp4-mlx"

Run an OpenAI-compatible server

# Install MLX LM
uv tool install mlx-lm
# Start the server
mlx_lm.server --model "Shiftedx/qwopus3.6-35b-a3b-coder-mxfp4-mlx"
# Calling the OpenAI-compatible server with curl
curl -X POST "http://localhost:8080/v1/chat/completions" \
   -H "Content-Type: application/json" \
   --data '{
     "model": "Shiftedx/qwopus3.6-35b-a3b-coder-mxfp4-mlx",
     "messages": [
       {"role": "user", "content": "Hello"}
     ]
   }'
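
The same request can be sent from Python using only the standard library. This is a sketch rather than an official client: it assumes the server is on mlx_lm.server's default port 8080 and that the response follows the OpenAI chat-completions shape; the helper names are hypothetical.

```python
import json
import urllib.request

# Assumption: default mlx_lm.server address; adjust if --port was passed.
BASE_URL = "http://localhost:8080/v1"
MODEL_ID = "Shiftedx/qwopus3.6-35b-a3b-coder-mxfp4-mlx"


def build_chat_request(prompt: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build a POST request carrying a single user message."""
    payload = {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def first_reply(completion: dict) -> str:
    """Return the assistant text of the first choice in a completion."""
    return completion["choices"][0]["message"]["content"]


def chat(prompt: str, timeout: float = 300.0) -> str:
    """Send the prompt to the local server and return the reply text."""
    with urllib.request.urlopen(build_chat_request(prompt), timeout=timeout) as response:
        return first_reply(json.load(response))
```

With the server running, `print(chat("Hello"))` should print the model's reply. The generous timeout is a guess to allow for the first request, which may include model loading.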

qwopus3.6-35b-a3b-coder-mxfp4-mlx

MXFP4 MLX conversion of Jackrong/Qwopus3.6-35B-A3B-Coder, prepared by Shiftedx for Apple Silicon / MLX / LM Studio.

What Changed

Converted from the upstream safetensors checkpoint with the local streaming MLX pipeline.
Quantized primary linear weights with mxfp4 at group size 32.
Kept MoE router/gate modules in affine 8-bit group size 64 for compatibility.
Removed source MTP tensors and set MTP/next-token prediction layer counts to 0 for LM Studio compatibility.
Set tool_parser_type to qwen3_coder.
Patched the chat template so enable_thinking defaults to false when a runtime honors the template variable.
Removed vision tensors/config for a smaller language-only build.
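
To get a feel for what the two quantization settings above cost in storage, the sketch below estimates effective bits per weight. It rests on two assumptions that are not stated in this card: that MXFP4 stores one 8-bit shared scale per 32-element group, and that the affine 8-bit format stores a 16-bit scale and a 16-bit bias per 64-element group. The function names are hypothetical.

```python
def bits_per_weight(element_bits: int, group_size: int, overhead_bits_per_group: int) -> float:
    """Element bits plus per-group metadata amortized over the group."""
    return element_bits + overhead_bits_per_group / group_size


# Assumption: one 8-bit shared scale per 32-element MXFP4 group.
MXFP4_BPW = bits_per_weight(element_bits=4, group_size=32, overhead_bits_per_group=8)

# Assumption: 16-bit scale + 16-bit bias per 64-element affine 8-bit group.
AFFINE8_BPW = bits_per_weight(element_bits=8, group_size=64, overhead_bits_per_group=32)


def estimated_gib(parameter_count: float, bpw: float) -> float:
    """Rough storage estimate in GiB for parameter_count weights at bpw bits each."""
    return parameter_count * bpw / 8 / 2**30
```

Under these assumptions the primary linear weights cost 4.25 bits each and the router/gate weights 8.5 bits each. Passing a parameter count to `estimated_gib` gives only a rough figure: the exact tensor counts of this build are not listed here, and runtime memory also includes the KV cache and activations.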

Local Validation

Validated locally on June 29, 2026 with LM Studio server on port 8080, 32k context, parallel 1, GPU max.

Check	Result
LM Studio load	Passed; 17.18 GiB in LM Studio at 32k context.
Basic text completion	Passed; returned `2+2=4` and stopped.
Vision image smoke	Not applicable; this is the language-only build.

Note: LM Studio may still report hidden reasoning_tokens for this checkpoint even though the upstream model is intended for thinking-off use. Set max_tokens high enough in smoke tests that those hidden tokens do not consume the budget before the visible answer is complete.
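
One way to tell whether a smoke test was cut short by hidden reasoning is to inspect the completion's usage block. The sketch below assumes the response follows the OpenAI layout, with the count under usage.completion_tokens_details.reasoning_tokens and a finish_reason of "length" on truncation; both field paths are assumptions about the runtime, and the helper names are hypothetical.

```python
def reasoning_token_count(completion: dict) -> int:
    """Return the reported reasoning token count, or 0 if the field is absent."""
    usage = completion.get("usage") or {}
    details = usage.get("completion_tokens_details") or {}
    return int(details.get("reasoning_tokens") or 0)


def truncated_by_max_tokens(completion: dict) -> bool:
    """True when the first choice stopped because the token limit was hit."""
    choices = completion.get("choices") or []
    return bool(choices) and choices[0].get("finish_reason") == "length"


def smoke_test_verdict(completion: dict) -> str:
    """Summarize whether max_tokens was adequate for this response."""
    if truncated_by_max_tokens(completion):
        spent = reasoning_token_count(completion)
        return f"truncated: raise max_tokens ({spent} reasoning tokens used)"
    return "ok"
```

A "truncated" verdict with a non-zero reasoning count suggests the budget went to hidden tokens rather than the visible answer.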

LM Studio

After downloading the model in LM Studio, load it by its model key:

lms load qwopus3.6-35b-a3b-coder-mlx --context-length 32768 --parallel 1 --gpu max

Recommended profile defaults, matching the local Shiftedx 35B AgentWorld/Ornith profiles:

Preset/template: Qwen3 thinking-compatible Jinja template with <|im_end|> stop.
Context length: 200000 when memory allows; 32768 was used for local smoke validation.
Sampling: temperature 0.6, top-k 20, top-p 0.95, min-p enabled at 0.
Repeat penalty: off (unchecked); use 1.1 if you enable it manually.
Load: parallel 1, GPU max.
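
When calling the model through an OpenAI-compatible endpoint instead of the LM Studio UI, the sampling defaults above can be sent per request. This is a sketch only: top_k and min_p are not part of the core OpenAI schema, so whether a given server honors them is an assumption, and the max_tokens default of 1024 is a hypothetical placeholder.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class SamplingProfile:
    """Recommended sampling defaults from the profile list above."""
    temperature: float = 0.6
    top_k: int = 20
    top_p: float = 0.95
    min_p: float = 0.0


def chat_payload(
    model: str,
    prompt: str,
    profile: SamplingProfile = SamplingProfile(),
    max_tokens: int = 1024,  # hypothetical value; size it for your task
) -> dict:
    """Build a chat-completions request body with the profile applied."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
        "stop": ["<|im_end|>"],
        **asdict(profile),
    }
```

The resulting dictionary can be serialized with `json.dumps` and posted to the server's /v1/chat/completions route. Repeat penalty is left out on purpose, matching the recommendation to keep it off.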