Instructions for using bkideas/Qwen2.5-Coder-3B-MLX-nvfp4 with libraries, notebooks, and local apps.
- Libraries
- MLX
How to use bkideas/Qwen2.5-Coder-3B-MLX-nvfp4 with MLX:
```python
# Make sure mlx-lm is installed:
#   pip install --upgrade mlx-lm

# Generate text with mlx-lm
from mlx_lm import load, generate

model, tokenizer = load("bkideas/Qwen2.5-Coder-3B-MLX-nvfp4")

prompt = "Write a story about Einstein"
messages = [{"role": "user", "content": prompt}]
prompt = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True
)

text = generate(model, tokenizer, prompt=prompt, verbose=True)
```
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- LM Studio
- Pi
How to use bkideas/Qwen2.5-Coder-3B-MLX-nvfp4 with Pi:
Start the MLX server
```shell
# Install MLX LM:
uv tool install mlx-lm

# Start a local OpenAI-compatible server:
mlx_lm.server --model "bkideas/Qwen2.5-Coder-3B-MLX-nvfp4"
```
Configure the model in Pi
```shell
# Install Pi:
npm install -g @mariozechner/pi-coding-agent
```
Add to ~/.pi/agent/models.json:
```json
{
  "providers": {
    "mlx-lm": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        { "id": "bkideas/Qwen2.5-Coder-3B-MLX-nvfp4" }
      ]
    }
  }
}
```
Run Pi
```shell
# Start Pi in your project directory:
pi
```
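If ~/.pi/agent/models.json already lists other providers, pasting the snippet over it would drop them. The following is a minimal sketch of merging the entry instead; the key names are copied from the configuration above, while the helper names `add_mlx_provider` and `update_models_file` are hypothetical and not part of Pi.

```python
import json
from pathlib import Path

MODEL_ID = "bkideas/Qwen2.5-Coder-3B-MLX-nvfp4"


def add_mlx_provider(config, model_id=MODEL_ID, base_url="http://localhost:8080/v1"):
    """Return a copy of `config` with the mlx-lm provider added or updated."""
    merged = json.loads(json.dumps(config))  # deep copy of JSON-like data
    provider = merged.setdefault("providers", {}).setdefault("mlx-lm", {})
    provider.update(
        {"baseUrl": base_url, "api": "openai-completions", "apiKey": "none"}
    )
    models = provider.setdefault("models", [])
    # Avoid duplicate entries when the helper is run more than once.
    if not any(m.get("id") == model_id for m in models):
        models.append({"id": model_id})
    return merged


def update_models_file(path="~/.pi/agent/models.json"):
    """Read the file if it exists, merge the provider, and write it back."""
    path = Path(path).expanduser()
    existing = json.loads(path.read_text()) if path.exists() else {}
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(add_mlx_provider(existing), indent=2) + "\n")
```

Calling `update_models_file()` once before starting Pi should leave any other providers in the file untouched.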
- Hermes Agent
How to use bkideas/Qwen2.5-Coder-3B-MLX-nvfp4 with Hermes Agent:
Start the MLX server
```shell
# Install MLX LM:
uv tool install mlx-lm

# Start a local OpenAI-compatible server:
mlx_lm.server --model "bkideas/Qwen2.5-Coder-3B-MLX-nvfp4"
```
Configure Hermes
```shell
# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup

# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default bkideas/Qwen2.5-Coder-3B-MLX-nvfp4
```
Run Hermes
```shell
hermes
```
- MLX LM
How to use bkideas/Qwen2.5-Coder-3B-MLX-nvfp4 with MLX LM:
Generate or start a chat session
```shell
# Install MLX LM
uv tool install mlx-lm

# Interactive chat REPL
mlx_lm.chat --model "bkideas/Qwen2.5-Coder-3B-MLX-nvfp4"
```
Run an OpenAI-compatible server
```shell
# Install MLX LM
uv tool install mlx-lm

# Start the server (listens on port 8080 by default)
mlx_lm.server --model "bkideas/Qwen2.5-Coder-3B-MLX-nvfp4"

# Call the OpenAI-compatible server with curl
curl -X POST "http://localhost:8080/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "bkideas/Qwen2.5-Coder-3B-MLX-nvfp4",
    "messages": [
      {"role": "user", "content": "Hello"}
    ]
  }'
```
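The same request can be made from Python with only the standard library. This is a sketch under two assumptions: the server listens on port 8080 (the port used in the Pi and Hermes configurations above), and the response follows the usual OpenAI chat-completions shape. The helper names and the `max_tokens` value are illustrative.

```python
import json
import urllib.request

# Assumption: mlx_lm.server is running locally on port 8080.
BASE_URL = "http://localhost:8080/v1"
MODEL_ID = "bkideas/Qwen2.5-Coder-3B-MLX-nvfp4"


def build_chat_request(prompt, base_url=BASE_URL, model=MODEL_ID, max_tokens=256):
    """Build (but do not send) a POST request for /chat/completions."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_reply(response_body):
    """Pull the assistant text out of an OpenAI-style chat completion body."""
    return response_body["choices"][0]["message"]["content"]


def chat(prompt):
    """Send the request and return the reply (requires a running server)."""
    with urllib.request.urlopen(build_chat_request(prompt), timeout=120) as resp:
        return extract_reply(json.load(resp))
```

With the server running, `print(chat("Write a Python function to clear a list."))` should print the model's reply.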
Qwen2.5-Coder-3B-MLX-nvfp4
This repository contains the 4-bit NVFP4 quantized weights for Qwen/Qwen2.5-Coder-3B, optimized for low-latency inference on Apple Silicon using the oMLX framework.
Qwen2.5-Coder-3B is one of the smaller models in the Qwen2.5 coding specialist series. Despite its compact 3-billion-parameter footprint, it inherits the architectural and training enhancements of the broader Qwen2.5-Coder family, making it well suited to fast, edge-based autocomplete, inline code generation, and low-resource deployments.
🚀 Efficiency & Performance Advantages
Combining the 3B-parameter base model with 4-bit NVFP4 quantization gives this variant:
- ⚡ Fast Generation (TPS): High token-generation and prefill throughput, which keeps IDE code completions responsive.
- 📉 Small Memory Footprint: Low memory use, leaving system resources free for heavy local developer environments running alongside it.
- ⚙️ Mac Optimization: Native acceleration on Apple Silicon when paired with an MLX-based runtime such as oMLX.
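As a rough illustration of the memory point, the sketch below estimates weight storage only. It assumes a nominal 3 billion parameters and an NVFP4-style layout in which every group of 16 weights shares one 8-bit scale; it ignores any tensors left unquantized, the KV cache, and runtime overhead, so real usage will be higher.

```python
def quantized_weight_bytes(n_params, bits_per_weight=4, group_size=16, scale_bits=8):
    """Estimate storage for block-quantized weights.

    Each group of `group_size` weights shares one scale of `scale_bits` bits,
    so the effective cost per weight is bits_per_weight + scale_bits / group_size.
    """
    effective_bits = bits_per_weight + scale_bits / group_size
    return n_params * effective_bits / 8


N_PARAMS = 3_000_000_000  # nominal "3B"; the exact count is an assumption

fp16_gb = N_PARAMS * 16 / 8 / 1e9
nvfp4_gb = quantized_weight_bytes(N_PARAMS) / 1e9

print(f"16-bit weights: ~{fp16_gb:.1f} GB")                        # ~6.0 GB
print(f"4-bit weights with per-group scales: ~{nvfp4_gb:.2f} GB")  # ~1.69 GB
```

Under these assumptions the quantized weights need a little over a quarter of the space of 16-bit weights.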
🛠️ Deployment & Execution Quickstart
To use this model on macOS, run it with an inference runtime that can load NVFP4-quantized weights and their metadata.
Running with oMLX
Run a local benchmark from the terminal:
```shell
omlx bench --model bkideas/Qwen2.5-Coder-3B-MLX-nvfp4 --prompt "Write a Python function to clear a list."
```