Instructions to use zaydiscold/Qwen2.5-7B-Instruct-MLX-2bit with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- MLX
How to use zaydiscold/Qwen2.5-7B-Instruct-MLX-2bit with MLX:
# Download the model from the Hub pip install huggingface_hub[hf_xet] huggingface-cli download --local-dir Qwen2.5-7B-Instruct-MLX-2bit zaydiscold/Qwen2.5-7B-Instruct-MLX-2bit
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- LM Studio
Qwen2.5-7B-Instruct-MLX-2bit
MLX 2-bit conversion of Qwen/Qwen2.5-7B-Instruct. Converted directly from the original HF bf16 safetensors. Not from GGUF. Not chained from another quant. No double-quant hop.
Group size: 64. Smaller groups store more scales = better quality, slightly larger file. Most published MLX repos use group-size 64 silently — this repo discloses it.
Apple Silicon only. GGUF Q4_K_M is a llama.cpp quant — MLX has no literal Q4_K_M mode. Don't conflate them.
2-bit warning
This build is experimental. It loads and runs on an M1 16GB host, but the first smoke sweep produced incoherent text on simple prompts. Use 3-bit or 4-bit-gs64 for actual local use until stronger evals say otherwise.
Use
pip install mlx-lm
mlx_lm.generate --model zaydiscold/Qwen2.5-7B-Instruct-MLX-2bit \\
--prompt "Explain quantum entanglement in one paragraph" --max-tokens 200
Conversion
python -m mlx_lm convert \
--hf-path Qwen/Qwen2.5-7B-Instruct \
--mlx-path ./Qwen2.5-7B-Instruct-MLX-2bit \
-q --q-bits 2 --q-group-size 64
Credits
- Source: Qwen/Qwen2.5-7B-Instruct
- MLX conversion: zaydiscold
Part of a Qwen2.5-7B-Instruct MLX quant ladder + group-size perplexity sweep. See the sibling repos under zaydiscold for other bit levels and group sizes — perplexity numbers are coming as a separate dataset repo.
- Downloads last month
- 89
2-bit