Introducing the 5-bit MLX version of Qwen 2.5 7B Instruct, a 7-billion-parameter dense large language model. 5-bit MLX quantization strikes a balance between precision on more advanced questions and speed for quick, on-demand answers. It averages ~9 tok/s and uses about 5.5 GB of RAM on an Apple MacBook Pro (M1, 8 GB of unified memory, 256 GB of internal storage).
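
As a rough sanity check on the figures above, the sketch below estimates how much memory the quantized weights alone might occupy and how long a reply might take at ~9 tok/s. It assumes a nominal 7 billion parameters, and it assumes group-wise quantization with one 16-bit scale and one 16-bit bias per group of 64 weights. That group layout is an assumption for illustration, not a setting confirmed for this repository. The helper names are hypothetical. Layers left unquantized, the KV cache, and runtime overhead are not modeled, so the measured RAM usage should be expected to differ.

```python
# Back-of-the-envelope estimates. All defaults below are assumptions.

def quantized_weight_gb(n_params, bits, group_size=64, scale_bits=16, bias_bits=16):
    """Approximate weight storage in GB (10^9 bytes) for group-wise quantization.

    Each group of `group_size` weights is assumed to carry one scale and one
    bias, which adds (scale_bits + bias_bits) / group_size bits per weight.
    """
    overhead_bits = (scale_bits + bias_bits) / group_size
    effective_bits = bits + overhead_bits
    return n_params * effective_bits / 8 / 1e9


def generation_seconds(n_tokens, tokens_per_second=9.0):
    """Time to generate n_tokens at a steady decoding rate."""
    return n_tokens / tokens_per_second


N_PARAMS = 7e9  # nominal "7B" figure, not an exact parameter count

five_bit_gb = quantized_weight_gb(N_PARAMS, bits=5)
# Unquantized 16-bit weights for comparison: no per-group scale or bias.
bf16_gb = quantized_weight_gb(N_PARAMS, bits=16, scale_bits=0, bias_bits=0)
reply_seconds = generation_seconds(256)

print(f"5-bit weights (estimated): {five_bit_gb:.2f} GB")
print(f"16-bit weights (estimated): {bf16_gb:.2f} GB")
print(f"256-token reply at 9 tok/s: {reply_seconds:.1f} s")
```

Under these assumptions the 5-bit weights come in below the reported 5.5 GB, leaving the remainder for everything the estimate ignores, while 16-bit weights would not fit in 8 GB of unified memory at all.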

Safetensors
Model size: 1B params
Tensor type: BF16 · U32 · MLX

5-bit


Model tree for piskle/Qwen2.5-7B-Instruct-MLX-5bit

Base model: Qwen/Qwen2.5-7B
Quantized (345): this model