minicpm5-1b-8bit-mlx

MLX quantization of openbmb/MiniCPM5-1B for Apple Silicon.

Variant: Affine int8
Disk size: 1105 MB
Quantized by: sahilchachra

Benchmark results

Evaluated on Apple M5 Pro with MLX. Model loaded once; performance and quality measured in a single pass.

Performance

This model FP16 baseline
Decode tok/s (steady-state) 243.83 144.04
Prefill tok/s (steady-state) 1297.85 1005.67
Decode tok/s (avg, long traces) 87.97 143.39
Prefill tok/s (avg, long traces) 1587.38 3026.98
Peak memory (GB) 1.528 2.537
Disk size (MB) 1105 2071

Warmed, short-prompt, chat-templated, thinking disabled. Represents steady-state decode for typical chat use; long thinking traces will be slower due to KV-cache growth.

Quality

Benchmark This model FP16 baseline n
MATH-500 (math reasoning) 60.0% (answered 22/30) 70.0% (answered 24/30) 30
IFEval (instruction following) 70.5% 72.7% 44
HumanEval (code, pass@1) 83.3% 83.3% 30

MATH-500 per-level accuracy

Level This model FP16 baseline
level 1 83.3% 83.3%
level 2 83.3% 83.3%
level 3 33.3% 50.0%
level 4 66.7% 66.7%
level 5 33.3% 66.7%

Context scaling (decode tok/s)

Context length Decode tok/s
~128 tokens 62.6
~256 tokens 59.6
~512 tokens 61.1
~1024 tokens 60.9

Usage

pip install mlx-lm
from mlx_lm import load, generate

model, tokenizer = load("sahilchachra/minicpm5-1b-8bit-mlx")
response = generate(model, tokenizer, prompt="Your prompt here", max_tokens=256, verbose=True)

All variants in this collection

Model Variant
sahilchachra/minicpm5-1b-8bit-mlx Affine int8 ← this model

Notes

  • Requires Apple Silicon (M1 or later) with MLX
  • Benchmarks run on Apple M5 Pro, 24 GB unified memory
  • License: see openbmb/MiniCPM5-1B for the original model's license

Original model

See openbmb/MiniCPM5-1B for full model details and intended use.

Downloads last month
278
Safetensors
Model size
0.3B params
Tensor type
BF16
·
U32
·
MLX
Hardware compatibility
Log In to add your hardware

8-bit

Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for sahilchachra/minicpm5-1b-8bit-mlx

Quantized
(32)
this model