---
license: other
base_model: MiniMaxAI/MiniMax-M3
tags: [mlx, jang, reap, awq, moe, code, multimodal, minimax-m3, osaurus, apple-silicon]
pipeline_tag: text-generation
---
<p align="center"><img src="./osaurus-banner.png" alt="Osaurus" width="680"></p>
<h1 align="center">MiniMax-M3-Coder-Small</h1>
<p align="center"><b>🦖 Osaurus Exclusive — a compact JANG-quantized MiniMax-M3 coder (coding · agentic · multimodal) for Apple Silicon.</b></p>
> ⚠️ **JANG-format model — runs on Osaurus.**
> This model uses the **JANG** quantization format (mixed-precision affine + **AWQ** + **REAP** expert pruning) and loads only through **Osaurus's native Swift runtime**. It will **NOT** load with `transformers`, `vLLM`, or generic MLX loaders.
## What is a JANG model?
**JANG** is a mixed-precision quantization + packing format — per-projection affine bit widths + **AWQ** activation-aware scaling + **REAP** expert pruning — described by a `jang_config.json`. Weights stay quantized in GPU memory. **Osaurus loads it through its native Swift JANG runtime** on Apple Silicon.
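
To make "affine bit widths" concrete, here is a minimal sketch of plain affine (scale plus offset) quantization of one weight group at different bit widths. It is an illustration of the general technique only: the function names, the group values, and the min/max calibration are assumptions, and this is not the JANG packing code, which also applies AWQ scaling.

```python
def affine_quantize(values, bits):
    """Map floats to unsigned `bits`-bit integer codes using a scale and an offset."""
    levels = (1 << bits) - 1                      # 3 for 2-bit, 63 for 6-bit, 255 for 8-bit
    lo, hi = min(values), max(values)
    scale = (hi - lo) / levels if hi > lo else 1.0
    codes = [round((v - lo) / scale) for v in values]
    return codes, scale, lo


def affine_dequantize(codes, scale, lo):
    """Recover approximate floats from integer codes."""
    return [c * scale + lo for c in codes]


# Hypothetical weight group, chosen only for illustration.
group = [-0.9, -0.3, 0.0, 0.4, 0.9]

for bits in (2, 6, 8):
    codes, scale, lo = affine_quantize(group, bits)
    restored = affine_dequantize(codes, scale, lo)
    max_err = max(abs(a - b) for a, b in zip(group, restored))
    print(f"{bits}-bit: codes={codes} max_err={max_err:.4f}")
```

The worst-case rounding error is about half of `scale`, so it shrinks as the bit width grows. That is the usual motivation for a mixed-precision layout that spends more bits on some projections than on others.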
## Highlights
- **Smallest M3 coder — ~84 GB** (the compact Osaurus build).
- **REAP45:** keeps **70/128** routed experts (58 pruned, about 45%).
- **All-2-bit routed experts + AWQ** (gate/up 2-bit AWQ-scaled, down 2-bit); attention 8-bit, shared experts 6-bit, embeddings 6-bit, lm_head 8-bit, Lightning Indexer FP16.
- **Multimodal (vision) kept.**
- Calibration: Vera (agentic-coder) + GSM8K; "floor" recipe keeps the most-salient coding experts.
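
The REAP45 figure can be checked with a few lines of arithmetic. The expert counts come from the list above; the variable names are only illustrative.

```python
TOTAL_ROUTED_EXPERTS = 128  # routed experts before pruning, per the Highlights list
KEPT_ROUTED_EXPERTS = 70    # routed experts kept by REAP45, per the Highlights list

pruned = TOTAL_ROUTED_EXPERTS - KEPT_ROUTED_EXPERTS
pruned_fraction = pruned / TOTAL_ROUTED_EXPERTS

print(f"pruned {pruned} of {TOTAL_ROUTED_EXPERTS} routed experts ({pruned_fraction:.1%})")
# prints "pruned 58 of 128 routed experts (45.3%)"
```

This covers routed experts only. Shared experts, attention, and embeddings are not pruned in this count, so the overall size reduction is not simply 45%.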
## Benchmarks
- **HumanEval: pass@1 = 100%** (82/82 problems, half of HumanEval's 164; scrambled-half adaptive eval, seed 42; 0 failures, 0 escalations).
- Despite 45% expert pruning and all-2-bit routed experts, coding accuracy on this subset holds at **100%**. The REAP45 keep-set is a subset of the larger M3-Coder builds' proven coding experts, so coding capability is preserved while the model shrinks to ~84 GB.
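
For readers unfamiliar with the metric, the sketch below shows how a pass@1 score follows from per-problem results, using the standard unbiased pass@k estimator from the HumanEval paper. The card does not state how many samples were drawn per problem, so one sample per problem is an assumption here. The "scrambled-half adaptive" protocol itself is not reproduced.

```python
from math import comb


def pass_at_k(n, c, k):
    """Unbiased pass@k estimate for one problem: n samples drawn, c of them correct."""
    if n - c < k:
        return 1.0  # every size-k subset contains at least one correct sample
    return 1.0 - comb(n - c, k) / comb(n, k)


# Assumption: one sample per problem. 82 problems, all passing, as reported above.
per_problem = [pass_at_k(n=1, c=1, k=1) for _ in range(82)]
score = sum(per_problem) / len(per_problem)

print(f"pass@1 = {score:.0%}")
# prints "pass@1 = 100%"
```

With a single sample per problem, pass@1 reduces to the fraction of problems solved, which is 82/82 here.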
## Run it
Load it in **Osaurus** (Apple Silicon) — it runs on Osaurus's native Swift JANG runtime.
## Attribution
- Base model: **MiniMaxAI/MiniMax-M3** · Pruning: **REAP** (Cerebras, arXiv:2510.13999)
- **Vera calibration + testing: [@hornsman1](https://huggingface.co/hornsman1) (hornsan1 on GitHub)** · math calibration: GSM8K
- Quantization: **JANG** · Runtime & distribution: **Osaurus**