MiMo-V2.5-oQ4-MLX
This repository contains an oMLX oQ4 mixed-precision MLX quantization of
XiaomiMiMo/MiMo-V2.5.
MiMo-V2.5 is an omnimodal sparse Mixture-of-Experts model from Xiaomi MiMo. The upstream model card describes it as a 310B total / 15B activated parameter model with a 1M context window and support for text, image, video, and audio inputs.
Quantization
| Field | Value |
|---|---|
| Method | oMLX oQ mixed-precision MLX |
| Quantization | oQ4 |
| Base model revision | 2fd4f899a491de2fb0beeafe32b5d700b251f593 |
| oMLX version | 0.4.1 |
| Model type | mimo_v2_flash |
| Group size | 64 |
| Quantization mode | affine |
| Base bits | 4 |
| Sensitivity map | position heuristic fallback |
| Output shards | 30 safetensors |
| Output size | 167.1 GiB |
| Non-quantized/scales dtype | bfloat16 |
| Copied extra assets | audio_tokenizer present |
| MTP weights preserved | 72 tensors |
| MTP layers | 3 |
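As a rough sanity check on the table, the average storage cost per parameter can be estimated from the output size and the upstream parameter count. The sketch below assumes the 310B total-parameter figure quoted from the upstream model card and treats all 167.1 GiB as weight storage, which slightly overstates the result because the directory also holds non-quantized tensors and the audio tokenizer assets. The second helper assumes one 16-bit scale and one 16-bit bias per quantization group; that layout is an assumption about the storage format, not something read from this checkpoint.

```python
GIB = 2**30  # bytes per GiB


def bits_per_parameter(size_gib: float, n_params: float) -> float:
    """Average stored bits per parameter for an artifact of the given size."""
    return size_gib * GIB * 8 / n_params


def affine_bits_per_weight(bits: int, group_size: int,
                           scale_bits: int = 16, bias_bits: int = 16) -> float:
    """Bits per weight for uniform affine quantization.

    Assumes one scale and one bias per group (hypothetical layout).
    """
    return bits + (scale_bits + bias_bits) / group_size


# Figures taken from the table and the upstream model card description.
average = bits_per_parameter(167.1, 310e9)
uniform = affine_bits_per_weight(bits=4, group_size=64)
print(f"average: {average:.2f} bits/param, uniform 4-bit: {uniform:.2f} bits/weight")
```

An average somewhat above the uniform 4-bit figure is what one would expect from a mixed-precision scheme that keeps some tensors at higher precision, though this estimate cannot say which tensors those are.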
Notes
This artifact is prepared for MLX/oMLX runtimes. The upstream checkpoint uses FP8 storage; during conversion oMLX dequantizes FP8 tensors on the fly and writes MLX quantized safetensors.
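To illustrate what group-wise affine quantization means in general, here is a minimal NumPy sketch using the group size and base bit width from the table. It is a generic min/max scheme written for illustration only; it is not the oMLX or MLX implementation, it does not model the FP8 dequantization step, and the function names are hypothetical.

```python
import numpy as np


def affine_quantize(w: np.ndarray, group_size: int = 64, bits: int = 4):
    """Quantize consecutive groups of weights to unsigned integers.

    Each group gets its own scale and bias so that w ~= q * scale + bias.
    """
    levels = 2**bits - 1
    groups = w.astype(np.float32).reshape(-1, group_size)
    w_min = groups.min(axis=1, keepdims=True)
    w_max = groups.max(axis=1, keepdims=True)
    scale = (w_max - w_min) / levels
    scale = np.where(scale == 0, 1.0, scale)  # constant groups: avoid divide-by-zero
    q = np.clip(np.round((groups - w_min) / scale), 0, levels).astype(np.uint8)
    return q, scale, w_min


def affine_dequantize(q: np.ndarray, scale: np.ndarray, bias: np.ndarray, shape):
    """Reconstruct an approximation of the original weights."""
    return (q.astype(np.float32) * scale + bias).reshape(shape)


# Small demonstration on random data (not real model weights).
rng = np.random.default_rng(0)
w = rng.standard_normal((4, 128)).astype(np.float32)
q, scale, bias = affine_quantize(w)
w_hat = affine_dequantize(q, scale, bias, w.shape)
print("max abs error:", float(np.abs(w - w_hat).max()))
```

With this scheme the reconstruction error of each weight is bounded by half of its group's scale, which is why smaller groups or more bits reduce error at the cost of storage.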
The local MLX model type is normalized to mimo_v2_flash so the bundled oMLX
runtime can resolve the MiMo-V2 family model implementation.
The installed oMLX automatic proxy sensitivity path could not strict-load the MiMo-V2.5 multimodal checkpoint, so this conversion uses the same layer-position heuristic sensitivity map that oMLX uses for size estimation.
MiMo's model.mtp.* tensors are preserved in this artifact. As of the bundled
oMLX 0.4.1 runtime used for this conversion, native MTP dispatch is not
wired for mimo_v2_flash, so the MTP tensors are kept only for future runtime support.
This is an unofficial quantized derivative. It is not affiliated with, sponsored by, or endorsed by Xiaomi.
Validation
Artifact validation completed locally with the bundled oMLX runtime on macOS:
source model: XiaomiMiMo/MiMo-V2.5
source revision: 2fd4f899a491de2fb0beeafe32b5d700b251f593
quantization: oQ4
config.json: present
model.safetensors.index.json: present
safetensor shards: 30
output size: 167.1 GiB
audio_tokenizer assets: present
mtp tensors: 72 preserved
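A downloaded copy can be checked against the list above with a few lines of Python. This is a minimal sketch, not the validation script used for this artifact: it assumes the standard `weight_map` layout of `model.safetensors.index.json`, and `summarize_artifact` is a hypothetical helper name. The number of index entries under `model.mtp.` may differ from the 72 reported here if quantized weights store their scales and biases as separate entries, since the source does not say how that figure was counted.

```python
import json
from pathlib import Path


def summarize_artifact(model_dir, mtp_prefix: str = "model.mtp.") -> dict:
    """Collect basic structural facts about a local checkpoint directory."""
    root = Path(model_dir)
    index_path = root / "model.safetensors.index.json"

    # The index maps each tensor name to the shard file that stores it.
    weight_map = {}
    if index_path.is_file():
        weight_map = json.loads(index_path.read_text()).get("weight_map", {})

    return {
        "config_present": (root / "config.json").is_file(),
        "index_present": index_path.is_file(),
        "shards_on_disk": len(list(root.glob("*.safetensors"))),
        "shards_in_index": len(set(weight_map.values())),
        "mtp_entries": sum(name.startswith(mtp_prefix) for name in weight_map),
        "audio_tokenizer_present": (root / "audio_tokenizer").exists(),
    }
```

For example, `summarize_artifact("MiMo-V2.5-oQ4-MLX")` should report 30 shards both on disk and in the index if the download is complete. The `audio_tokenizer` path is a guess at where those assets live inside the directory.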
Generation smoke testing is intentionally not claimed here because MiMo-V2.5 is a very large omnimodal/MoE checkpoint and runtime support depends on the local MLX/oMLX build and available unified memory.
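Before attempting a load, it can help to compare the machine's physical memory with the size of the weights. The sketch below is a rough pre-flight check under stated assumptions: it reads total physical memory through `os.sysconf`, which is available on Unix-like systems, and the 16 GiB headroom for the KV cache, activations, and the OS is an arbitrary placeholder rather than a measured requirement. The operating system may also cap how much of that memory the GPU can use, which this check does not account for.

```python
import os

GIB = 2**30  # bytes per GiB


def physical_memory_gib() -> float:
    """Total physical memory in GiB (Unix-like systems only)."""
    return os.sysconf("SC_PHYS_PAGES") * os.sysconf("SC_PAGE_SIZE") / GIB


def weights_fit(weights_gib: float, total_gib: float,
                headroom_gib: float = 16.0) -> bool:
    """True if the weights plus a hypothetical headroom fit in memory."""
    return total_gib >= weights_gib + headroom_gib


total = physical_memory_gib()
print(f"physical memory: {total:.1f} GiB, fits: {weights_fit(167.1, total)}")
```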
Usage
Use an MLX/oMLX build that supports MiMo-V2.5 omnimodal inputs and the packaged MiMo-V2 model implementation.
huggingface-cli download \
--local-dir MiMo-V2.5-oQ4-MLX \
dawncr0w/MiMo-V2.5-oQ4-MLX
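The same download can be scripted with the `huggingface_hub` Python package. The sketch below uses `snapshot_download` with its `allow_patterns` filter so that the small JSON metadata files can be fetched and inspected before committing to the full 167.1 GiB transfer; the helper name and the two-step workflow are illustrative choices, not part of the original instructions.

```python
REPO_ID = "dawncr0w/MiMo-V2.5-oQ4-MLX"


def download_checkpoint(local_dir: str = "MiMo-V2.5-oQ4-MLX",
                        metadata_only: bool = False) -> str:
    """Download the repository (or only its JSON files) into local_dir."""
    # Imported lazily so this module can be loaded without huggingface_hub.
    from huggingface_hub import snapshot_download

    patterns = ["*.json"] if metadata_only else None
    return snapshot_download(
        repo_id=REPO_ID,
        local_dir=local_dir,
        allow_patterns=patterns,
    )
```

Calling `download_checkpoint(metadata_only=True)` first retrieves only the JSON files such as `config.json` and the safetensors index; calling it again without the flag fetches the remaining files into the same directory.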
For a text-only smoke test, adapt the command to your local MLX/oMLX runtime:
python -m mlx_lm generate \
--model /path/to/MiMo-V2.5-oQ4-MLX \
--prompt "Hello" \
--max-tokens 32 \
--temp 0
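The command above has a Python equivalent through the `mlx_lm` package's `load` and `generate` functions. This is a sketch under the same caveats as the rest of this section: whether it runs depends on the installed runtime supporting the `mimo_v2_flash` model type and on available unified memory, and no successful generation with this artifact is claimed here. When no sampler is passed, `generate` should decode greedily, which corresponds to `--temp 0`; verify this against your installed version.

```python
def smoke_test(model_path: str, prompt: str = "Hello", max_tokens: int = 32) -> str:
    """Load a local MLX checkpoint and generate a short completion."""
    # Imported lazily: mlx_lm is only installable on supported platforms.
    from mlx_lm import load, generate

    model, tokenizer = load(model_path)
    return generate(model, tokenizer, prompt=prompt, max_tokens=max_tokens)
```

A call such as `print(smoke_test("/path/to/MiMo-V2.5-oQ4-MLX"))` mirrors the command-line invocation.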
For multimodal inference, pass the downloaded directory as the local checkpoint to a runtime that meets the requirement stated at the top of this section.
License And Notice
The base model is distributed under the MIT License. This quantized artifact follows the same license. Please also review the upstream model card for usage notes and limitations.