Qwopus3.5-4B-Coder-MTP-oQ6

An oMLX oQ-quantized version of Qwopus3.5-4B-Coder-MTP optimized for efficient local inference on Apple Silicon devices.

About Qwopus3.5-4B-Coder

Qwopus3.5-4B-Coder is a compact coding and agent-oriented model built on the Qwen3.5 4B family.

The model is designed for:

  • Coding assistance
  • Agent workflows
  • Tool use
  • Debugging
  • Structured reasoning
  • Software engineering tasks
  • Local development environments

The training recipe combines reasoning-oriented techniques, agent trajectories, and coding-focused instruction tuning to improve stability and practical coding performance.

About This Quantization

This repository contains an oMLX oQ6 mixed-precision quantization of the original model.

Unlike traditional uniform quantization methods, oQ allocates precision dynamically according to layer sensitivity. Critical model components retain higher precision while less sensitive components are compressed more aggressively.

Benefits include:

  • Reduced memory consumption
  • Reduced storage requirements
  • Better quality retention than uniform low-bit quantization
  • Faster local inference
  • Improved efficiency on Apple Silicon hardware

Multi-Token Prediction (MTP)

This release preserves the model's Multi-Token Prediction (MTP) components.

MTP allows the model architecture to predict multiple future tokens internally, improving generation efficiency and helping maintain compatibility with runtimes and workflows that support MTP-enabled Qwen-family models.

Recommended Settings

For best results:

temp: 1.0
top_p: 0.95
top_k: 20
min_p: 0
rep_penalty: 1.05
presence_penalty: 1.5
enable_thinking: true

These settings provide a good balance between exploration, acceptance rate, and generation quality when paired with a Qwen3.5 target model. Consider using DFlash model for more accurate and faster response. https://huggingface.co/z-lab/Qwen3.5-4B-DFlash or https://huggingface.co/yugeshkarunamurthy/Qwen3.5-4b-Dflash-6bit-MLX

Intended Use

This model is suitable for:

  • Code generation
  • Code review
  • Debugging assistance
  • Agentic coding workflows
  • Terminal assistants
  • IDE integrations
  • Research and experimentation
  • Local AI development

Usage

MLX-LM

from mlx_lm import load, generate

model, tokenizer = load("path/to/model")

response = generate(
    model,
    tokenizer,
    prompt="Write a Python function that implements binary search.",
    max_tokens=512,
)

print(response)

Claude Code

This model works well as a local coding model for Claude Code workflows where fast iteration, code generation, debugging, and repository assistance are required.

Quantization Details

Item Value
Base Model Qwopus3.5-4B-Coder-MTP
Quantization Method oMLX oQ
Format MLX
MTP Preserved Yes
Architecture Qwen3.5 Family

Performance Notes

Performance depends on:

  • Context length
  • Runtime implementation
  • Hardware configuration
  • Quantization parameters
  • Prompt style

Users are encouraged to benchmark the model on their own workloads.

Limitations

This model inherits the strengths and limitations of the original Qwopus3.5-4B-Coder model.

Quantization may introduce:

  • Minor reductions in reasoning quality
  • Small changes in generation behavior
  • Reduced performance on certain edge-case tasks

Results will vary depending on hardware and inference settings.

Credits

Original Model

  • Jackrong — Qwopus3.5-4B-Coder-MTP

Quantization

  • oMLX
  • MLX Ecosystem

Citation

If you use the original model in research, please cite the original Qwopus model authors and repository.

Disclaimer

This repository contains a community-generated quantized checkpoint and is not an official release from the original model authors.

Please evaluate the model carefully before deploying it in production environments.

Downloads last month
151
Safetensors
Model size
1B params
Tensor type
BF16
·
U32
·
MLX
Hardware compatibility
Log In to add your hardware

6-bit

Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for yugeshkarunamurthy/Qwopus3.5-4B-Coder-oQ6

Finetuned
Qwen/Qwen3.5-4B
Quantized
(3)
this model