Instructions to use sahilchachra/minicpm5-1b-8bit-mlx with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- MLX
How to use sahilchachra/minicpm5-1b-8bit-mlx with MLX:
# Download the model from the Hub pip install huggingface_hub[hf_xet] huggingface-cli download --local-dir minicpm5-1b-8bit-mlx sahilchachra/minicpm5-1b-8bit-mlx
- Notebooks
- Google Colab
- Kaggle
- Local Apps Settings
- LM Studio
minicpm5-1b-8bit-mlx
MLX quantization of openbmb/MiniCPM5-1B for Apple Silicon.
Variant: Affine int8
Disk size: 1105 MB
Quantized by: sahilchachra
Benchmark results
Evaluated on Apple M5 Pro with MLX. Model loaded once; performance and quality measured in a single pass.
Performance
| This model | FP16 baseline | |
|---|---|---|
| Decode tok/s (steady-state) | 243.83 | 144.04 |
| Prefill tok/s (steady-state) | 1297.85 | 1005.67 |
| Decode tok/s (avg, long traces) | 87.97 | 143.39 |
| Prefill tok/s (avg, long traces) | 1587.38 | 3026.98 |
| Peak memory (GB) | 1.528 | 2.537 |
| Disk size (MB) | 1105 | 2071 |
Warmed, short-prompt, chat-templated, thinking disabled. Represents steady-state decode for typical chat use; long thinking traces will be slower due to KV-cache growth.
Quality
| Benchmark | This model | FP16 baseline | n |
|---|---|---|---|
| MATH-500 (math reasoning) | 60.0% (answered 22/30) | 70.0% (answered 24/30) | 30 |
| IFEval (instruction following) | 70.5% | 72.7% | 44 |
| HumanEval (code, pass@1) | 83.3% | 83.3% | 30 |
MATH-500 per-level accuracy
| Level | This model | FP16 baseline |
|---|---|---|
| level 1 | 83.3% | 83.3% |
| level 2 | 83.3% | 83.3% |
| level 3 | 33.3% | 50.0% |
| level 4 | 66.7% | 66.7% |
| level 5 | 33.3% | 66.7% |
Context scaling (decode tok/s)
| Context length | Decode tok/s |
|---|---|
| ~128 tokens | 62.6 |
| ~256 tokens | 59.6 |
| ~512 tokens | 61.1 |
| ~1024 tokens | 60.9 |
Usage
pip install mlx-lm
from mlx_lm import load, generate
model, tokenizer = load("sahilchachra/minicpm5-1b-8bit-mlx")
response = generate(model, tokenizer, prompt="Your prompt here", max_tokens=256, verbose=True)
All variants in this collection
| Model | Variant |
|---|---|
| sahilchachra/minicpm5-1b-8bit-mlx | Affine int8 ← this model |
Notes
- Requires Apple Silicon (M1 or later) with MLX
- Benchmarks run on Apple M5 Pro, 24 GB unified memory
- License: see openbmb/MiniCPM5-1B for the original model's license
Original model
See openbmb/MiniCPM5-1B for full model details and intended use.
- Downloads last month
- 278
Model size
0.3B params
Tensor type
BF16
·
U32 ·
Hardware compatibility
Log In to add your hardware
8-bit
Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support
Model tree for sahilchachra/minicpm5-1b-8bit-mlx
Base model
openbmb/MiniCPM5-1B