MiMo-V2.5-oQ4-MLX

This repository contains an oMLX oQ4 mixed-precision MLX quantization of XiaomiMiMo/MiMo-V2.5.

MiMo-V2.5 is an omnimodal sparse Mixture-of-Experts model from Xiaomi MiMo. The upstream model card describes it as a 310B total / 15B activated parameter model with a 1M context window and support for text, image, video, and audio inputs.

Quantization

Field	Value
Method	oMLX oQ mixed-precision MLX
Quantization	oQ4
Base model revision	`2fd4f899a491de2fb0beeafe32b5d700b251f593`
oMLX version	`0.4.1`
Model type	`mimo_v2_flash`
Group size	64
Quantization mode	`affine`
Base bits	4
Sensitivity map	position heuristic fallback
Output shards	30 safetensors
Output size	167.1 GiB
Non-quantized/scales dtype	bfloat16
Copied extra assets	audio_tokenizer present
MTP weights preserved	72 tensors
MTP layers	3
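
To make the Group size, Quantization mode, and Base bits rows concrete, here is a minimal pure-Python sketch of affine group quantization. It is an illustration, not the oMLX or MLX implementation: the function names are hypothetical, the real kernels pack codes into 32-bit words and may choose scale and bias differently, and the sketch assumes the bias is stored at the same 16-bit width as the scale.

```python
def quantize_group(weights, bits=4):
    """Affine-quantize one group so that w ~= scale * q + bias, q in [0, 2**bits - 1]."""
    levels = 2 ** bits - 1
    w_min, w_max = min(weights), max(weights)
    scale = (w_max - w_min) / levels
    if scale == 0.0:
        scale = 1.0  # constant group: any scale works, every code becomes 0
    codes = [min(levels, max(0, round((w - w_min) / scale))) for w in weights]
    return codes, scale, w_min


def dequantize_group(codes, scale, bias):
    """Reconstruct approximate weights from integer codes."""
    return [scale * q + bias for q in codes]


def bits_per_weight(bits=4, group_size=64, scale_bits=16):
    """Storage cost per weight: packed codes plus one scale and one bias per group."""
    return bits + 2 * scale_bits / group_size


# One group of 64 evenly spaced values in [-0.5, 0.5].
group = [i / 63 - 0.5 for i in range(64)]
codes, scale, bias = quantize_group(group)
restored = dequantize_group(codes, scale, bias)
```

Under these assumptions a 4-bit tensor with group size 64 costs 4 + 2 * 16 / 64 = 4.5 bits per weight. Dividing the reported 167.1 GiB by roughly 310B parameters gives about 4.6 bits per weight, slightly above that floor, which is what one would expect when some tensors are kept at higher precision or left unquantized.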

Notes

This artifact is prepared for MLX/oMLX runtimes. The upstream checkpoint uses FP8 storage; during conversion oMLX dequantizes FP8 tensors on the fly and writes MLX quantized safetensors.

The local MLX model type is normalized to `mimo_v2_flash` so the bundled oMLX runtime can resolve the MiMo-V2 family model implementation.

The installed oMLX automatic proxy sensitivity path could not strict-load the MiMo-V2.5 multimodal checkpoint, so this conversion uses the same layer-position heuristic sensitivity map that oMLX uses for size estimation.

MiMo's `model.mtp.*` tensors are preserved in this artifact. As of the bundled oMLX 0.4.1 runtime used for this conversion, native MTP dispatch is not wired for `mimo_v2_flash`, so the tensors are kept only for future runtime support.

This is an unofficial quantized derivative. It is not affiliated with, sponsored by, or endorsed by Xiaomi.

Validation

Artifact validation completed locally with the bundled oMLX runtime on macOS:

source model: XiaomiMiMo/MiMo-V2.5
source revision: 2fd4f899a491de2fb0beeafe32b5d700b251f593
quantization: oQ4
config.json: present
model.safetensors.index.json: present
safetensor shards: 30
output size: 167.1 GiB
audio_tokenizer assets: present
mtp tensors: 72 preserved
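
The file-level checks listed above can be reproduced on a downloaded copy with a short standard-library script. This is a sketch under stated assumptions, not the script used for the original validation: `summarize_artifact` is a hypothetical helper, it assumes the audio tokenizer lives in a directory named `audio_tokenizer`, and it assumes MTP tensors keep the `model.mtp.` key prefix in the index's `weight_map`. Because a quantized tensor can appear as several index entries (weight, scales, biases), the entry count it reports need not equal the 72 tensors listed above.

```python
import json
from pathlib import Path


def summarize_artifact(model_dir, mtp_prefix="model.mtp."):
    """Collect the file-level facts reported in the validation list."""
    root = Path(model_dir)
    index_path = root / "model.safetensors.index.json"

    # The safetensors index maps tensor names to shard files under "weight_map".
    weight_map = {}
    if index_path.is_file():
        weight_map = json.loads(index_path.read_text()).get("weight_map", {})

    total_bytes = sum(p.stat().st_size for p in root.rglob("*") if p.is_file())

    return {
        "config.json": (root / "config.json").is_file(),
        "model.safetensors.index.json": index_path.is_file(),
        "safetensor shards": len(list(root.glob("*.safetensors"))),
        "output size GiB": round(total_bytes / 2**30, 1),
        "audio_tokenizer assets": (root / "audio_tokenizer").is_dir(),
        "mtp index entries": sum(k.startswith(mtp_prefix) for k in weight_map),
    }
```

Calling `summarize_artifact("MiMo-V2.5-oQ4-MLX")` on the download directory returns a dictionary that can be compared line by line with the list above.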

No generation smoke test is claimed here: MiMo-V2.5 is a very large omnimodal MoE checkpoint, and whether it runs depends on the local MLX/oMLX build and the available unified memory.
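
Since the 167.1 GiB of weights must fit in unified memory, a rough pre-flight check can save a failed load. The sketch below is only a heuristic: the function names are hypothetical, the 16 GiB headroom for the KV cache, activations, and the operating system is an arbitrary assumption rather than a measured requirement, and the physical memory lookup relies on POSIX `sysconf`.

```python
import os

GIB = 2 ** 30


def physical_memory_bytes():
    """Total physical RAM reported by POSIX sysconf."""
    return os.sysconf("SC_PHYS_PAGES") * os.sysconf("SC_PAGE_SIZE")


def fits_in_memory(model_gib, total_bytes, headroom_gib=16.0):
    """True if the weights plus an assumed headroom fit in total_bytes."""
    return (model_gib + headroom_gib) * GIB <= total_bytes


# Example: fits_in_memory(167.1, physical_memory_bytes())
```

Passing this check does not guarantee that generation works; it only rules out machines that clearly cannot hold the weights.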

Usage

Use an MLX/oMLX build that supports MiMo-V2.5 omnimodal inputs and the packaged MiMo-V2 model implementation.

huggingface-cli download \
  --local-dir MiMo-V2.5-oQ4-MLX \
  dawncr0w/MiMo-V2.5-oQ4-MLX

For a text-only smoke test, adapt the command to your local MLX/oMLX runtime:

python -m mlx_lm generate \
  --model /path/to/MiMo-V2.5-oQ4-MLX \
  --prompt "Hello" \
  --max-tokens 32 \
  --temp 0

For multimodal inference, use an oMLX/MLX runtime that supports MiMo-V2.5 omnimodal inputs and pass this directory as the local checkpoint.

License And Notice

The base model is distributed under the MIT License. This quantized artifact follows the same license. Please also review the upstream model card for usage notes and limitations.
