This model is a quantized NVFP4 MLX variant of LiquidAI/LFM2.5‑8B‑A1B‑MLX‑bf16, created by LiquidAI. Original model licensed under the LiquidAI Model License.

NVFP4 MLX Quantization — Performance & Quality

This model is a 4‑bit NVFP4 MLX‑quantized variant of the original BF16 LFM2.5‑8B‑A1B model. NVFP4 is MLX’s optimized 4‑bit format designed for efficient inference on Apple Silicon GPUs.

Why NVFP4?

NVFP4 reduces memory usage by ~65% and increases generation speed by ~1.6–1.8× on M‑series chips, while preserving most of the model’s quality.

Performance Comparison (Representative MLX Benchmarks)

Metric BF16 NVFP4 Notes
Memory usage ~15 GB ~5 GB Fits on 16 GB Macs
Token speed (M5 Max) ~41 tok/s ~72 tok/s ~1.75× faster
Perplexity 1.00× 1.02–1.03× ~2–3% degradation
Output quality Baseline ~95–98% identical Minor reasoning loss

Pros

  • Much lower memory footprint
  • Faster inference on macOS
  • Lower power usage
  • Ideal for laptops and smaller RAM configs

Cons

  • Slight quality degradation (1–3%)
  • Not suitable for fine‑tuning
  • Slightly more drift in very long generations

Practical Impact

For chat, summarization, and coding, NVFP4 behaves almost identically to the BF16 model.
For math/logic‑heavy tasks, BF16 remains slightly more accurate.

Benchmark table
Downloads last month
124
Safetensors
Model size
1B params
Tensor type
BF16
·
U32
·
F32
·
MLX
Hardware compatibility
Log In to add your hardware

4-bit

Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for bkideas/LFM2.5-8B-A1B-MLX-nvfp4

Quantized
(1)
this model