Qwen2.5-7B-Instruct MLX 4-bit (group size 32)

MLX 4-bit conversion of Qwen/Qwen2.5-7B-Instruct with --q-group-size 32.

Group size = 32. Smaller groups store more scales = better quality + slightly larger file. Most published MLX repos default to gs=64 silently — this one discloses the value.

The full ladder + group-size sweep

Variant Repo Disk ~Min unified RAM Role
MLX bf16 Qwen2.5-7B-Instruct-MLX-bf16 15.24 GB ~18 GB Reference
MLX 8bit Qwen2.5-7B-Instruct-MLX-8bit 8.1 GB ~10 GB Near-lossless
MLX 6bit Qwen2.5-7B-Instruct-MLX-6bit 6.2 GB ~8 GB Quality / size middle
MLX 4bit-gs32 (this repo) this 4.77 GB ~7 GB 4-bit, group size 32
MLX 4bit-gs64 Qwen2.5-7B-Instruct-MLX-4bit-gs64 4.3 GB ~6 GB 4-bit, group size 64 (mlx-lm default)
MLX 4bit-gs128 Qwen2.5-7B-Instruct-MLX-4bit-gs128 4.06 GB ~6 GB 4-bit, group size 128
MLX 3bit Qwen2.5-7B-Instruct-MLX-3bit 3.34 GB ~5 GB Smaller, expect quality drop
MLX 2bit Qwen2.5-7B-Instruct-MLX-2bit 2.39 GB ~4 GB Aggressive — verify on workload

Collection: Qwen2.5-7B-Instruct MLX ladder + group-size sweep

Use

pip install mlx-lm
mlx_lm.generate --model zaydiscold/Qwen2.5-7B-Instruct-MLX-4bit-gs32 \
  --prompt "Explain quantum entanglement in one paragraph" --max-tokens 200

Conversion

python -m mlx_lm convert \
  --hf-path Qwen/Qwen2.5-7B-Instruct \
  --mlx-path ./Qwen2.5-7B-Instruct-MLX-4bit-gs32 \
  -q --q-bits 4 --q-group-size 32

Notes

  • Part of a complete mlx-lm group-size sweep at 4-bit (gs=32, 64, 128) on this base model — every gs value mlx-lm supports.
  • See the sibling repos for other bit budgets / group sizes.

Credits

Downloads last month
103
Safetensors
Model size
1B params
Tensor type
BF16
·
U32
·
MLX
Hardware compatibility
Log In to add your hardware

4-bit

Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for zaydiscold/Qwen2.5-7B-Instruct-MLX-4bit-gs32

Base model

Qwen/Qwen2.5-7B
Quantized
(313)
this model

Collection including zaydiscold/Qwen2.5-7B-Instruct-MLX-4bit-gs32