qwopus3.6-35b-a3b-coder-mxfp4-vision-mlx

MXFP4 MLX conversion of Jackrong/Qwopus3.6-35B-A3B-Coder, prepared by Shiftedx for Apple Silicon / MLX / LM Studio.

What Changed

Converted from the upstream safetensors checkpoint with the local streaming MLX pipeline.
Quantized primary linear weights with mxfp4 at group size 32.
Kept MoE router/gate modules in affine 8-bit quantization at group size 64 for compatibility.
Removed source MTP tensors and set MTP/next-token prediction layer counts to 0 for LM Studio compatibility.
Set tool_parser_type to qwen3_coder.
Patched the chat template so enable_thinking defaults to false when a runtime honors the template variable.
Grafted the vision tensors from model.visual.* into the converted checkpoint as vision_tower.*.
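
As a rough illustration of what "mxfp4 at group size 32" means for the primary linear weights, here is a minimal pure-Python sketch of microscaling FP4: each group of 32 values shares one power-of-two scale, and each value is stored as a signed E2M1 number. It is a sketch only, not the conversion pipeline used for this checkpoint; the function names are hypothetical, ties are broken toward the smaller magnitude for simplicity, and the actual MLX kernels may round and pack differently.

```python
import math

# Magnitudes representable by an FP4 E2M1 element (sign is stored separately).
E2M1_VALUES = (0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0)
GROUP_SIZE = 32  # group size stated in this README


def quantize_group(values):
    """Return (shared_exponent, signed E2M1 values) for one group of weights."""
    amax = max(abs(v) for v in values)
    if amax == 0.0:
        return 0, [0.0] * len(values)
    # frexp gives amax = m * 2**e with 0.5 <= m < 1, so floor(log2(amax)) = e - 1.
    _, e = math.frexp(amax)
    # Align the largest magnitude with the top E2M1 binade (exponent 2: values 4 and 6).
    shared_exp = (e - 1) - 2
    scale = 2.0 ** shared_exp
    quantized = []
    for v in values:
        target = min(abs(v) / scale, 6.0)  # saturate at the largest E2M1 magnitude
        # Simplification: ties go to the smaller magnitude, not round-to-nearest-even.
        nearest = min(E2M1_VALUES, key=lambda q: abs(q - target))
        quantized.append(math.copysign(nearest, v))
    return shared_exp, quantized


def dequantize_group(shared_exp, quantized):
    """Reconstruct approximate weights from the shared exponent and E2M1 values."""
    scale = 2.0 ** shared_exp
    return [q * scale for q in quantized]


# Example group: 32 evenly spaced weights in [-1, 1).
weights = [i / 16 - 1 for i in range(GROUP_SIZE)]
shared_exp, codes = quantize_group(weights)
restored = dequantize_group(shared_exp, codes)
max_error = max(abs(a - b) for a, b in zip(weights, restored))
```

The sketch makes the trade-off visible: one scale per 32 weights keeps overhead small, while the coarse E2M1 grid bounds per-element precision. The router/gate modules listed above use a different scheme (affine 8-bit), which this sketch does not cover.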

Local Validation

Validated locally on June 29, 2026 with the LM Studio server on port 8080, 32k context, parallel 1, GPU max.

Check	Result
LM Studio load	Passed; 17.18 GiB in LM Studio at 32k context.
Basic text completion	Passed; returned `2+2=4` and stopped.
Vision image smoke	Experimental only: the model loaded and stopped, but results on a simple shapes image were not fully reliable. MXFP4 misidentified the shapes and colors; MXFP8 identified the shapes and colors but not their left-to-right order.

Note: LM Studio may still report hidden reasoning_tokens for this checkpoint even though the upstream model is intended for thinking-off use. Set max_tokens high enough in smoke tests that hidden reasoning does not exhaust the budget before the visible answer.
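
A basic text smoke test along these lines can be sketched with the Python standard library. This assumes the LM Studio server exposes its OpenAI-compatible chat completions endpoint on port 8080 and that the model key matches the lms load command in this README; the prompt, the max_tokens value of 256, the timeout, and the helper names are illustrative assumptions rather than the exact test that was run.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8080/v1"  # port taken from the validation notes
MODEL_KEY = "qwopus3.6-35b-a3b-coder-vision-mlx"  # key from the lms load command


def build_smoke_request(prompt, max_tokens=256):
    """Build (but do not send) a chat completion request for a smoke test."""
    payload = {
        "model": MODEL_KEY,
        "messages": [{"role": "user", "content": prompt}],
        # Leave headroom in case hidden reasoning tokens are counted (assumed value).
        "max_tokens": max_tokens,
        "temperature": 0.6,
        "top_p": 0.95,
    }
    return urllib.request.Request(
        BASE_URL + "/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_smoke_test(prompt="What is 2+2? Answer briefly."):
    """Send the request and return (answer text, usage dict) from the response."""
    request = build_smoke_request(prompt)
    with urllib.request.urlopen(request, timeout=120) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"], body.get("usage", {})
```

Calling run_smoke_test() with the model loaded should return the answer text together with the usage block, which is where any reported reasoning token counts would appear if the server includes them.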

Vision Status

This variant includes a grafted Qwen vision tower from the source checkpoint. The tensor/key layout matches the working MLX Qwen3.5-MoE vision format, and vision_tower.patch_embed.proj.weight was transposed to MLX layout (1152, 2, 16, 16, 3).

Local LM Studio image smoke testing did not fully pass, so treat the vision path as experimental. The language path loads and answers normally.
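
The key rename and patch-embedding transpose described in this section can be sketched as a shape-level plan. The source shape (1152, 3, 2, 16, 16) is an assumption based on the usual channels-first 3D convolution layout (out, in, depth, height, width); the helper names are hypothetical, and the sketch works on shapes only, not on real tensors.

```python
SOURCE_PREFIX = "model.visual."
TARGET_PREFIX = "vision_tower."
PATCH_EMBED_KEY = "vision_tower.patch_embed.proj.weight"

# Assumed source axes (out, in, depth, height, width) reordered to the
# channels-last layout (out, depth, height, width, in) reported in this README.
TO_CHANNELS_LAST = (0, 2, 3, 4, 1)


def remap_key(key):
    """Rename a source vision key to the vision_tower.* namespace."""
    if key.startswith(SOURCE_PREFIX):
        return TARGET_PREFIX + key[len(SOURCE_PREFIX):]
    return key


def permute_shape(shape, axes):
    """Return the shape a tensor would have after transposing with `axes`."""
    return tuple(shape[axis] for axis in axes)


def plan_vision_graft(source_shapes):
    """Map {source key: shape} to {target key: shape after any transpose}."""
    plan = {}
    for key, shape in source_shapes.items():
        if not key.startswith(SOURCE_PREFIX):
            continue  # language-model tensors are handled elsewhere
        new_key = remap_key(key)
        if new_key == PATCH_EMBED_KEY:
            shape = permute_shape(shape, TO_CHANNELS_LAST)
        plan[new_key] = tuple(shape)
    return plan


example_plan = plan_vision_graft({
    "model.visual.patch_embed.proj.weight": (1152, 3, 2, 16, 16),  # assumed source shape
    "model.layers.0.mlp.gate.weight": (1, 1),  # placeholder non-vision tensor
})
```

With real arrays, the same axis order would be passed to the array library's transpose call. Comparing a plan like this against the key and shape listing of a known-good vision checkpoint is one way to check the layout before debugging model quality.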

LM Studio

After downloading in LM Studio, load the model key:

lms load qwopus3.6-35b-a3b-coder-vision-mlx --context-length 32768 --parallel 1 --gpu max

Recommended profile defaults, matching the local Shiftedx 35B AgentWorld/Ornith profiles:

Preset/template: Qwen3 thinking-compatible Jinja template with <|im_end|> stop.
Context length: 200000 when memory allows; 32768 was used for local smoke validation.
Sampling: temperature 0.6, top-k 20, top-p 0.95, min-p enabled at 0.
Repeat penalty: off (unchecked); use 1.1 if you enable it manually.
Load: parallel 1, GPU max.