minicpm5-1b-8bit-mlx

MLX quantization of openbmb/MiniCPM5-1B for Apple Silicon.

Variant: Affine int8
Disk size: 1105 MB
Quantized by: sahilchachra

Benchmark results

Evaluated on Apple M5 Pro with MLX. Model loaded once; performance and quality measured in a single pass.

Performance

	This model	FP16 baseline
Decode tok/s (steady-state)	243.83	144.04
Prefill tok/s (steady-state)	1297.85	1005.67
Decode tok/s (avg, long traces)	87.97	143.39
Prefill tok/s (avg, long traces)	1587.38	3026.98
Peak memory (GB)	1.528	2.537
Disk size (MB)	1105	2071

Warmed, short-prompt, chat-templated, thinking disabled. Represents steady-state decode for typical chat use; long thinking traces will be slower due to KV-cache growth.

Quality

Benchmark	This model	FP16 baseline	n
MATH-500 (math reasoning)	60.0% (answered 22/30)	70.0% (answered 24/30)	30
IFEval (instruction following)	70.5%	72.7%	44
HumanEval (code, pass@1)	83.3%	83.3%	30

MATH-500 per-level accuracy

Level	This model	FP16 baseline
level 1	83.3%	83.3%
level 2	83.3%	83.3%
level 3	33.3%	50.0%
level 4	66.7%	66.7%
level 5	33.3%	66.7%

Context scaling (decode tok/s)

Context length	Decode tok/s
~128 tokens	62.6
~256 tokens	59.6
~512 tokens	61.1
~1024 tokens	60.9

Usage

pip install mlx-lm

from mlx_lm import load, generate

model, tokenizer = load("sahilchachra/minicpm5-1b-8bit-mlx")
response = generate(model, tokenizer, prompt="Your prompt here", max_tokens=256, verbose=True)

All variants in this collection

Model	Variant
sahilchachra/minicpm5-1b-8bit-mlx	Affine int8 ← this model

Notes

Requires Apple Silicon (M1 or later) with MLX
Benchmarks run on Apple M5 Pro, 24 GB unified memory
License: see openbmb/MiniCPM5-1B for the original model's license

Original model

See openbmb/MiniCPM5-1B for full model details and intended use.

Downloads last month: 278

Safetensors

Model size

0.3B params

Tensor type

BF16

U32

MLX

Hardware compatibility

8-bit

Inference Providers NEW

This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for sahilchachra/minicpm5-1b-8bit-mlx

Base model

openbmb/MiniCPM5-1B

Quantized

(32)

this model