MiMo-V2.5-oQ4-MLX

This repository contains an oMLX oQ4 mixed-precision MLX quantization of XiaomiMiMo/MiMo-V2.5.

MiMo-V2.5 is an omnimodal sparse Mixture-of-Experts model from Xiaomi MiMo. The upstream model card describes it as a 310B total / 15B activated parameter model with a 1M context window and support for text, image, video, and audio inputs.

Quantization

| Field | Value |
| --- | --- |
| Method | oMLX oQ mixed-precision MLX |
| Quantization | oQ4 |
| Base model revision | 2fd4f899a491de2fb0beeafe32b5d700b251f593 |
| oMLX version | 0.4.1 |
| Model type | mimo_v2_flash |
| Group size | 64 |
| Quantization mode | affine |
| Base bits | 4 |
| Sensitivity map | position heuristic fallback |
| Output shards | 30 safetensors |
| Output size | 167.1 GiB |
| Non-quantized/scales dtype | bfloat16 |
| Copied extra assets | audio_tokenizer present |
| MTP weights preserved | 72 tensors |
| MTP layers | 3 |
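
As a rough sanity check on the figures above, the following sketch works out the storage cost implied by 4-bit affine quantization with group size 64, and compares it with the average bits per parameter implied by the output size. It assumes each group stores one bfloat16 scale and one bfloat16 bias, and it uses the upstream 310B total parameter count as the denominator; the artifact also holds non-quantized tensors and may include extra assets, so the second number is only an estimate.

```python
# Back-of-the-envelope storage arithmetic for this quantization.
# Assumption: affine mode stores one scale and one bias per group,
# both in bfloat16 (16 bits each).
GROUP_SIZE = 64
BASE_BITS = 4


def effective_bits_per_weight(bits, group_size, scale_bits=16, bias_bits=16):
    """Bits per weight once per-group scale and bias overhead is included."""
    return bits + (scale_bits + bias_bits) / group_size


def average_bits_per_parameter(size_gib, total_params):
    """Average bits per parameter implied by an on-disk size in GiB."""
    return size_gib * 2**30 * 8 / total_params


base = effective_bits_per_weight(BASE_BITS, GROUP_SIZE)
avg = average_bits_per_parameter(167.1, 310e9)

print(f"4-bit layers incl. scales/biases: {base:.2f} bits/weight")
print(f"whole artifact average:           {avg:.2f} bits/parameter")
```

Under these assumptions, a purely 4-bit layer costs 4.5 bits per weight, while the artifact as a whole averages about 4.63 bits per parameter. The gap is consistent with a mixed-precision scheme in which some tensors are stored above the 4-bit base or left in bfloat16.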

Notes

This artifact is prepared for MLX/oMLX runtimes. The upstream checkpoint uses FP8 storage; during conversion oMLX dequantizes FP8 tensors on the fly and writes MLX quantized safetensors.

The local MLX model type is normalized to mimo_v2_flash so the bundled oMLX runtime can resolve the MiMo-V2 family model implementation.

The automatic proxy sensitivity path in the installed oMLX could not load the MiMo-V2.5 multimodal checkpoint in strict mode. This conversion therefore falls back to the layer-position heuristic sensitivity map that oMLX uses for size estimation.

MiMo's model.mtp.* tensors are preserved in this artifact. In the bundled oMLX 0.4.1 runtime used for this conversion, native MTP dispatch is not wired for mimo_v2_flash, so the MTP tensors are kept for future runtime support.

This is an unofficial quantized derivative. It is not affiliated with, sponsored by, or endorsed by Xiaomi.

Validation

Artifact validation completed locally with the bundled oMLX runtime on macOS:

source model: XiaomiMiMo/MiMo-V2.5
source revision: 2fd4f899a491de2fb0beeafe32b5d700b251f593
quantization: oQ4
config.json: present
model.safetensors.index.json: present
safetensor shards: 30
output size: 167.1 GiB
audio_tokenizer assets: present
mtp tensors: 72 preserved
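
The file-level checks listed above can be reproduced with a short script along these lines. This is a minimal sketch, not the tooling used for this conversion: the function name `summarize_artifact` is hypothetical, the `audio_tokenizer` path is assumed to sit directly inside the model directory, and the index file is assumed to follow the usual safetensors layout with a top-level `weight_map` from tensor name to shard file.

```python
import json
from pathlib import Path


def summarize_artifact(model_dir, mtp_prefix="model.mtp."):
    """Collect basic file-level facts about a downloaded checkpoint directory."""
    root = Path(model_dir)
    index_path = root / "model.safetensors.index.json"
    summary = {
        "config_present": (root / "config.json").is_file(),
        "index_present": index_path.is_file(),
        # Assumption: extra audio assets live under "audio_tokenizer".
        "audio_tokenizer_present": (root / "audio_tokenizer").exists(),
        "shards_on_disk": len(list(root.glob("*.safetensors"))),
        "shards_in_index": 0,
        "missing_shards": [],
        "mtp_entries": 0,
    }
    if summary["index_present"]:
        weight_map = json.loads(index_path.read_text())["weight_map"]
        shards = sorted(set(weight_map.values()))
        summary["shards_in_index"] = len(shards)
        # Shards referenced by the index but absent from the directory.
        summary["missing_shards"] = [
            name for name in shards if not (root / name).is_file()
        ]
        # Index entries under the MTP prefix. Quantized layers may add
        # separate scale/bias entries, so this can differ from a count
        # of logical tensors.
        summary["mtp_entries"] = sum(
            1 for name in weight_map if name.startswith(mtp_prefix)
        )
    return summary
```

Calling `summarize_artifact("MiMo-V2.5-oQ4-MLX")` on a complete download should report 30 shards and an empty `missing_shards` list. These checks cover file presence only and say nothing about whether a given runtime can load or run the model.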

No generation smoke test is claimed here. MiMo-V2.5 is a very large omnimodal MoE checkpoint, and whether it runs depends on the local MLX/oMLX build and the available unified memory.

Usage

Use an MLX/oMLX build that supports MiMo-V2.5 omnimodal inputs and the packaged MiMo-V2 model implementation.

huggingface-cli download \
  --local-dir MiMo-V2.5-oQ4-MLX \
  dawncr0w/MiMo-V2.5-oQ4-MLX

For a text-only smoke test, adapt the command to your local MLX/oMLX runtime:

python -m mlx_lm generate \
  --model /path/to/MiMo-V2.5-oQ4-MLX \
  --prompt "Hello" \
  --max-tokens 32 \
  --temp 0
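
The same text-only check can be written against the `mlx_lm` Python API. This is a sketch under the assumption that the installed `mlx_lm` (or an oMLX build exposing the same API) can resolve the mimo_v2_flash model type, which is not verified here; the wrapper name `text_smoke_test` is hypothetical.

```python
def text_smoke_test(model_dir, prompt="Hello", max_tokens=32):
    """Load a local MLX checkpoint and generate a short completion."""
    # Imported lazily so this module can be imported without MLX installed.
    from mlx_lm import load, generate

    # load() returns the model and its tokenizer for a local directory.
    model, tokenizer = load(model_dir)

    # With no sampler supplied, mlx_lm falls back to greedy decoding,
    # which plays the role of --temp 0 in the command above.
    return generate(model, tokenizer, prompt=prompt, max_tokens=max_tokens)
```

Running `print(text_smoke_test("/path/to/MiMo-V2.5-oQ4-MLX"))` loads the full checkpoint, so it needs enough unified memory to hold the roughly 167 GiB of weights.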

For multimodal inference, use an oMLX/MLX runtime that supports MiMo-V2.5 omnimodal inputs and pass this directory as the local checkpoint.

License And Notice

The base model is distributed under the MIT License. This quantized artifact follows the same license. Please also review the upstream model card for usage notes and limitations.
