Qwen2.5-7B-Instruct MLX 4-bit (group size 32)

MLX 4-bit conversion of Qwen/Qwen2.5-7B-Instruct with --q-group-size 32.

Group size = 32. Smaller groups store more scales = better quality + slightly larger file. Most published MLX repos default to gs=64 silently — this one discloses the value.

The full ladder + group-size sweep

Variant	Repo	Disk	~Min unified RAM	Role
MLX bf16	`Qwen2.5-7B-Instruct-MLX-bf16`	15.24 GB	~18 GB	Reference
MLX 8bit	`Qwen2.5-7B-Instruct-MLX-8bit`	8.1 GB	~10 GB	Near-lossless
MLX 6bit	`Qwen2.5-7B-Instruct-MLX-6bit`	6.2 GB	~8 GB	Quality / size middle
MLX 4bit-gs32 (this repo)	this	4.77 GB	~7 GB	4-bit, group size 32
MLX 4bit-gs64	`Qwen2.5-7B-Instruct-MLX-4bit-gs64`	4.3 GB	~6 GB	4-bit, group size 64 (mlx-lm default)
MLX 4bit-gs128	`Qwen2.5-7B-Instruct-MLX-4bit-gs128`	4.06 GB	~6 GB	4-bit, group size 128
MLX 3bit	`Qwen2.5-7B-Instruct-MLX-3bit`	3.34 GB	~5 GB	Smaller, expect quality drop
MLX 2bit	`Qwen2.5-7B-Instruct-MLX-2bit`	2.39 GB	~4 GB	Aggressive — verify on workload

Collection: Qwen2.5-7B-Instruct MLX ladder + group-size sweep

Use

pip install mlx-lm
mlx_lm.generate --model zaydiscold/Qwen2.5-7B-Instruct-MLX-4bit-gs32 \
  --prompt "Explain quantum entanglement in one paragraph" --max-tokens 200

Conversion

python -m mlx_lm convert \
  --hf-path Qwen/Qwen2.5-7B-Instruct \
  --mlx-path ./Qwen2.5-7B-Instruct-MLX-4bit-gs32 \
  -q --q-bits 4 --q-group-size 32

Notes

Part of a complete mlx-lm group-size sweep at 4-bit (gs=32, 64, 128) on this base model — every gs value mlx-lm supports.
See the sibling repos for other bit budgets / group sizes.