Qwen2.5-7B-Instruct-MLX-bf16

Full non-quantized MLX bfloat16 conversion of Qwen/Qwen2.5-7B-Instruct. The clean reference build: HF bf16 in, MLX bf16 out — nothing chained, nothing converted from GGUF.

Apple Silicon only. GGUF Q4_K_M is a llama.cpp quant — MLX has no literal Q4_K_M mode. Don't conflate them.

Use

pip install mlx-lm
mlx_lm.generate --model zaydiscold/Qwen2.5-7B-Instruct-MLX-bf16 \\
  --prompt "Explain quantum entanglement in one paragraph" --max-tokens 200

Conversion

python -m mlx_lm convert \
  --hf-path Qwen/Qwen2.5-7B-Instruct \
  --mlx-path ./Qwen2.5-7B-Instruct-MLX-bf16 \
  --dtype bfloat16

Credits

Source: Qwen/Qwen2.5-7B-Instruct
MLX conversion: zaydiscold

Part of a Qwen2.5-7B-Instruct MLX quant ladder + group-size perplexity sweep. See the sibling repos under zaydiscold for other bit levels and group sizes — perplexity numbers are coming as a separate dataset repo.

M1 16GB smoke note

This bf16 reference artifact uploaded cleanly, but generation smoke failed on the Flow.swiss M1 16GB host with process exit -6. Use a larger Apple Silicon machine for bf16 runtime.