How to use samuelfaj/Qwen3.6-35B-A3B-4bit-MTPLX-Optimized-Speed with MLX:
# Make sure mlx-vlm is installed
# pip install --upgrade mlx-vlm
from mlx_vlm import load, generate
from mlx_vlm.prompt_utils import apply_chat_template
from mlx_vlm.utils import load_config

# Load the model
model, processor = load("samuelfaj/Qwen3.6-35B-A3B-4bit-MTPLX-Optimized-Speed")
config = load_config("samuelfaj/Qwen3.6-35B-A3B-4bit-MTPLX-Optimized-Speed")

# Prepare input
image = ["http://images.cocodataset.org/val2017/000000039769.jpg"]
prompt = "Describe this image."

# Apply chat template
formatted_prompt = apply_chat_template(
    processor, config, prompt, num_images=1
)

# Generate output
output = generate(model, processor, formatted_prompt, image)
print(output)
How to use samuelfaj/Qwen3.6-35B-A3B-4bit-MTPLX-Optimized-Speed with Pi:
Start the MLX server
# Install MLX LM:
uv tool install mlx-lm
# Start a local OpenAI-compatible server:
mlx_lm.server --model "samuelfaj/Qwen3.6-35B-A3B-4bit-MTPLX-Optimized-Speed"
Configure the model in Pi
# Install Pi:
npm install -g @mariozechner/pi-coding-agent
# Add to ~/.pi/agent/models.json:
{
  "providers": {
    "mlx-lm": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        { "id": "samuelfaj/Qwen3.6-35B-A3B-4bit-MTPLX-Optimized-Speed" }
      ]
    }
  }
}
Run Pi
# Start Pi in your project directory:
pi
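If ~/.pi/agent/models.json already exists, the provider entry has to be merged into it rather than pasted over the whole file. A minimal Python sketch, assuming the file is plain JSON with a top-level providers object shaped like the snippet above; add_mlx_provider is a hypothetical helper, not part of Pi, and the merge behaviour is an assumption rather than something Pi documents:

```python
import json
from pathlib import Path

MODEL_ID = "samuelfaj/Qwen3.6-35B-A3B-4bit-MTPLX-Optimized-Speed"

# Provider entry copied from the models.json snippet above.
MLX_PROVIDER = {
    "baseUrl": "http://localhost:8080/v1",
    "api": "openai-completions",
    "apiKey": "none",
    "models": [{"id": MODEL_ID}],
}


def add_mlx_provider(config_path: Path, name: str = "mlx-lm") -> dict:
    """Merge the provider into models.json (hypothetical helper, not part of Pi)."""
    config = json.loads(config_path.read_text()) if config_path.exists() else {}
    # Keep any providers already configured; only the `name` entry is (re)written.
    config.setdefault("providers", {})[name] = MLX_PROVIDER
    config_path.parent.mkdir(parents=True, exist_ok=True)
    config_path.write_text(json.dumps(config, indent=2) + "\n")
    return config


if __name__ == "__main__":
    add_mlx_provider(Path.home() / ".pi" / "agent" / "models.json")
```

Other providers and top-level keys in the file are left untouched; an existing mlx-lm entry is replaced.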
How to use samuelfaj/Qwen3.6-35B-A3B-4bit-MTPLX-Optimized-Speed with Hermes Agent:
Start the MLX server
# Install MLX LM:
uv tool install mlx-lm
# Start a local OpenAI-compatible server:
mlx_lm.server --model "samuelfaj/Qwen3.6-35B-A3B-4bit-MTPLX-Optimized-Speed"
Configure Hermes
# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup
# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default samuelfaj/Qwen3.6-35B-A3B-4bit-MTPLX-Optimized-Speed
Run Hermes
hermes
Qwen3.6-35B-A3B-4bit-MTPLX-Optimized-Speed
This is an MLX 4-bit build of Qwen/Qwen3.6-35B-A3B packaged for fast local serving with lightning-mlx.
The model includes an MTPLX sidecar (mtp.safetensors) and runtime metadata (mtplx_runtime.json) so lightning-mlx can use its Qwen3.6 MTPLX serving path on Apple Silicon. The included runtime metadata was verified on Darwin arm64 with mtplx_version: 0.1.0rc3, mtp_depth_max: 1, and the performance-cold recommended profile.
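To check that a local copy actually carries both MTPLX files before serving it, a short Python sketch like the following can help. The two file names come from this card; reading mtplx_version and mtp_depth_max as top-level keys of mtplx_runtime.json is an assumption about the file's layout, and inspect_mtplx_dir is a hypothetical helper.

```python
import json
import sys
from pathlib import Path

# File names taken from this model card.
SIDECAR = "mtp.safetensors"
RUNTIME = "mtplx_runtime.json"


def inspect_mtplx_dir(model_dir: str) -> dict:
    """Fail if the MTPLX files are missing, else return selected metadata fields."""
    root = Path(model_dir)
    missing = [n for n in (SIDECAR, RUNTIME) if not (root / n).is_file()]
    if missing:
        raise FileNotFoundError(f"{root} is missing: {', '.join(missing)}")
    meta = json.loads((root / RUNTIME).read_text())
    # Assumed key names; .get() returns None if the real file uses other names.
    return {
        "mtplx_version": meta.get("mtplx_version"),
        "mtp_depth_max": meta.get("mtp_depth_max"),
    }


if __name__ == "__main__":
    # Usage: pass the local model directory as the only argument.
    print(inspect_mtplx_dir(sys.argv[1]))
```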
Refer to the original Qwen3.6-35B-A3B model card for base-model capabilities, license, and upstream details.
Install lightning-mlx
Install directly from GitHub:
python3 -m pip install git+https://github.com/samuelfaj/lightning-mlx.git
Or use the self-contained installer:
curl -fsSL https://raw.githubusercontent.com/samuelfaj/lightning-mlx/main/install.sh | bash
Verify the CLI:
lightning-mlx --help
Serve this model
Serve directly from Hugging Face:
lightning-mlx serve samuelfaj/Qwen3.6-35B-A3B-4bit-MTPLX-Optimized-Speed
Or serve from a local checkout:
lightning-mlx serve /path/to/Qwen3.6-35B-A3B-4bit-MTPLX-Optimized-Speed
For long-running local use, start it as a daemon:
lightning-mlx serve samuelfaj/Qwen3.6-35B-A3B-4bit-MTPLX-Optimized-Speed --daemon
Daemon mode starts a detached supervisor, writes logs under ~/.lightning-mlx/logs/, and can restart the server if the model process exits unexpectedly.
Useful daemon commands:
lightning-mlx status
lightning-mlx tui <PID-or-model-name>
lightning-mlx kill <PID-or-model-name>
Use status to list running daemons, tui to attach the live monitor to one of them, and kill to stop one by supervisor PID, server PID, alias, or model name.
Use the OpenAI-compatible API
Once the server is running, send chat requests to the local OpenAI-compatible endpoint:
curl http://localhost:8010/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "local",
"messages": [
{"role": "user", "content": "Write a tiny Python HTTP server."}
],
"stream": true
}'
By default, lightning-mlx serves the model under the name local, so OpenAI-compatible clients can point at the local base URL (http://localhost:8010/v1 in the example above) and keep "model": "local" unless you override the served model name.
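Any HTTP client can talk to this endpoint. As a dependency-free alternative to curl, the Python sketch below sends the same streaming request with urllib and prints the reply as it arrives. It assumes the server emits standard OpenAI-style server-sent events (data: {...} lines terminated by data: [DONE]) and prints only choices[].delta.content, ignoring any other delta fields; build_request and iter_deltas are illustrative names, not part of lightning-mlx.

```python
import json
import urllib.request
from typing import Iterable, Iterator

BASE_URL = "http://localhost:8010/v1"  # lightning-mlx endpoint from the curl example


def build_request(prompt: str, model: str = "local") -> urllib.request.Request:
    """Build the same POST body as the curl example."""
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": True,
    }
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def iter_deltas(lines: Iterable[bytes]) -> Iterator[str]:
    """Yield content fragments from OpenAI-style server-sent event lines."""
    for raw in lines:
        line = raw.decode("utf-8").strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and SSE comments
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            text = choice.get("delta", {}).get("content")
            if text:
                yield text


if __name__ == "__main__":
    req = build_request("Write a tiny Python HTTP server.")
    with urllib.request.urlopen(req) as resp:  # the response iterates line by line
        for piece in iter_deltas(resp):
            print(piece, end="", flush=True)
    print()
```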
Why use lightning-mlx
lightning-mlx is built for local agent workloads on Apple Silicon: short streamed turns, tool calls, growing context, and repeated low-latency interactions. With this model it can use the packaged MTPLX metadata and Qwen3.6 serving preset instead of treating the checkpoint as a generic MLX model.
The runtime focuses on:
- OpenAI-compatible local serving
- Fast streamed chat completions
- Qwen3.6 reasoning and tool-use paths
- MTPLX-style speculative decoding support
- Daemon, status, TUI, and kill controls for local model servers
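For readers new to speculative decoding, the sketch below shows the generic greedy draft-and-verify loop in plain Python: a cheap draft proposes depth tokens, the target keeps the matching prefix plus one token of its own, and the result is identical to plain greedy decoding with the target. It is a toy illustration of the technique, not the lightning-mlx or MTPLX implementation; the "models" are ordinary functions, and depth=1 simply mirrors the mtp_depth_max: 1 recorded in this card's runtime metadata.

```python
from typing import Callable, List, Tuple

# A toy "model": maps a token sequence to its next greedy token.
NextToken = Callable[[List[int]], int]


def greedy_decode(target: NextToken, prompt: List[int], n: int) -> List[int]:
    """Reference: n tokens from the target alone, one call per token."""
    seq = list(prompt)
    for _ in range(n):
        seq.append(target(seq))
    return seq[len(prompt):]


def speculative_decode(
    target: NextToken, draft: NextToken, prompt: List[int], n: int, depth: int = 1
) -> Tuple[List[int], int]:
    """Greedy draft-and-verify; returns (tokens, number of verification rounds)."""
    seq = list(prompt)
    rounds = 0
    while len(seq) - len(prompt) < n:
        rounds += 1
        # 1. The draft proposes `depth` tokens autoregressively.
        ctx = list(seq)
        proposal = []
        for _ in range(depth):
            tok = draft(ctx)
            proposal.append(tok)
            ctx.append(tok)
        # 2. The target checks each proposed position. A real runtime scores all
        #    positions in one batched forward pass; this loop only simulates it.
        for tok in proposal:
            expected = target(seq)
            seq.append(expected)  # accepted draft token or the target's correction
            if tok != expected:
                break
        else:
            # 3. Every draft token matched, so the same pass yields a bonus token.
            seq.append(target(seq))
    return seq[len(prompt):len(prompt) + n], rounds
```

In this scheme a fully accepted round yields depth + 1 tokens per verification pass, so the practical gain depends on how often the draft agrees with the target.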
Convert similar local MTPLX models
If you have a local quantized Qwen3.6 model and the original full model to take the MTP tensors from, lightning-mlx can package a similar MTPLX model:
lightning-mlx convert-mtplx \
/path/to/Qwen3.6-35B-A3B-4bit \
--mtp-source /path/to/Qwen3.6-35B-A3B
By default, the output is written next to the source model as:
/path/to/Qwen3.6-35B-A3B-4bit-MTPLX-Optimized-Speed
Then serve it normally:
lightning-mlx serve /path/to/Qwen3.6-35B-A3B-4bit-MTPLX-Optimized-Speed
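In the example, the default output is the source directory name with -MTPLX-Optimized-Speed appended, placed in the same parent directory. Assuming that rule holds in general (it is inferred from the single example above, not documented), a small Python wrapper can compute the output path and run the convert and serve steps in order. default_mtplx_output and convert_and_serve_cmds are hypothetical helpers that only shell out to the two commands shown on this card.

```python
import subprocess
import sys
from pathlib import Path
from typing import List

# Suffix inferred from the example output name above (an assumption).
SUFFIX = "-MTPLX-Optimized-Speed"


def default_mtplx_output(source: str) -> Path:
    """Sibling directory of `source` whose name gets SUFFIX appended."""
    src = Path(source).expanduser()
    return src.with_name(src.name + SUFFIX)


def convert_and_serve_cmds(source: str, mtp_source: str) -> List[List[str]]:
    """The two CLI calls from this card, with the serve path filled in."""
    return [
        ["lightning-mlx", "convert-mtplx", source, "--mtp-source", mtp_source],
        ["lightning-mlx", "serve", str(default_mtplx_output(source))],
    ]


if __name__ == "__main__":
    # Usage: python convert_and_serve.py <quantized-model-dir> <full-model-dir>
    for cmd in convert_and_serve_cmds(sys.argv[1], sys.argv[2]):
        subprocess.run(cmd, check=True)  # serve keeps running in the foreground
```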
Use with mlx-vlm
This checkpoint is still a regular MLX model, so you can also run direct generation with mlx-vlm:
pip install -U mlx-vlm
python -m mlx_vlm.generate \
--model samuelfaj/Qwen3.6-35B-A3B-4bit-MTPLX-Optimized-Speed \
--max-tokens 100 \
--temperature 0.0 \
--prompt "Describe this image." \
--image <path_to_image>
Model tree for samuelfaj/Qwen3.6-35B-A3B-4bit-MTPLX-Optimized-Speed
Base model: Qwen/Qwen3.6-35B-A3B