No longer available on HF due to storage restrictions: archived here

See MiniMax-M2.7 in action: demonstration videos

Tested on an M3 Ultra with 512 GiB using the Inferencer app

  • Text inference: ~37.45 tokens/s at 1000 tokens, ~161 GiB memory (debug build)

Q6-INF uses the data-agnostic INF method tuned to yield maximum general accuracy within a 192 GiB memory budget

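| Quantization (bpw) | Perplexity | Token Accuracy | Missed Divergence |
|--------------------|------------|----------------|-------------------|
| Q4.5               | 1.27343    | 92.40%         | 24.73%            |
| Q6-INF             | 1.20312    | 97.40%         | 13.92%            |
| Q6.5               | 1.21093    | 96.85%         | 11.74%            |
| Q9                 | 1.20312    | 97.50%         | 9.95%             |
| Base               | 1.20312    | 100.0%         | 0.000%            |
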
  • Perplexity: Measures the model's confidence in predicting the base model's tokens (lower is better)
  • Token Accuracy: The percentage of base tokens that were generated correctly (higher is better)
  • Missed Divergence: Measures the severity of misses, i.e. by how much the base token was missed (lower is better)
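
The three metrics above can be illustrated with a minimal Python sketch. It is not the evaluation code used for this model: the function names and toy data are hypothetical, perplexity is assumed to be the standard exponentiated mean negative log-probability of the base tokens, and the Missed Divergence formula (the average probability gap between the wrongly chosen token and the base token) is only one plausible reading of the description, since the exact definition is not given here.

```python
import math

def perplexity(base_token_probs):
    # Assumed standard definition: exp of the mean negative log-probability
    # that the evaluated model assigns to each base token.
    nll = [-math.log(p) for p in base_token_probs]
    return math.exp(sum(nll) / len(nll))

def token_accuracy(dists, base_tokens):
    # Fraction of positions where the most likely token equals the base token.
    hits = sum(max(d, key=d.get) == b for d, b in zip(dists, base_tokens))
    return hits / len(base_tokens)

def missed_divergence(dists, base_tokens):
    # Hypothetical formula: over missed positions only, the mean gap between
    # the probability of the chosen token and that of the base token.
    gaps = []
    for d, b in zip(dists, base_tokens):
        top = max(d, key=d.get)
        if top != b:
            gaps.append(d[top] - d.get(b, 0.0))
    return sum(gaps) / len(gaps) if gaps else 0.0

# Toy data (made up): per-position token distributions from a quantized
# model and the tokens the base model produced at the same positions.
base_tokens = ["a", "b", "c"]
dists = [
    {"a": 0.9, "b": 0.1},
    {"a": 0.6, "b": 0.4},  # miss: the base token "b" is ranked second
    {"c": 0.8, "a": 0.2},
]
base_probs = [d[b] for d, b in zip(dists, base_tokens)]

ppl = perplexity(base_probs)
acc = token_accuracy(dists, base_tokens)
div = missed_divergence(dists, base_tokens)
print(f"perplexity={ppl:.4f} accuracy={acc:.2%} missed_divergence={div:.2%}")
```

Under these assumptions a model that reproduces every base token scores 100% accuracy and 0% divergence, which matches the Base row of the table.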

Quantized with a modified version of MLX.
For more details, see our demonstration videos or visit MiniMax-M2.7.
