MiMo-V2.5-NE128-72-OP

This is an offline seed-pruned version of XiaomiMiMo/MiMo-V2.5, produced with Akicou/ream.

It is not an official Xiaomi release.

Name decoding

MiMo-V2.5-NE128-72-OP means:

  • NE128: created with --n-experts 128
  • 72: seed 72
  • OP: offline pruning

Important: in this pruning script, --n-experts 128 is the number of experts removed from each MoE layer, not the number kept. The resulting model retains 128 of the original 256 routed experts per MoE layer.
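As a quick illustration of the naming scheme, the suffix can be decoded mechanically. The helper below is a hypothetical sketch (decode_name is not part of any library); it assumes the 256 original routed experts stated above and treats the NE value as the removed count.

```python
import re

def decode_name(name, original_experts=256):
    """Split a '<base>-NE<removed>-<seed>-OP' name into its parts (hypothetical helper)."""
    m = re.fullmatch(r"(?P<base>.+)-NE(?P<removed>\d+)-(?P<seed>\d+)-OP", name)
    if m is None:
        raise ValueError(f"unrecognized name: {name!r}")
    removed = int(m["removed"])
    return {
        "base": m["base"],
        "experts_removed": removed,                      # NE value = experts removed
        "experts_kept": original_experts - removed,      # what remains per MoE layer
        "seed": int(m["seed"]),
        "method": "offline pruning",                     # the OP suffix
    }

info = decode_name("MiMo-V2.5-NE128-72-OP")
print(info["experts_removed"], info["experts_kept"], info["seed"])
```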

What was changed

The base MiMo-V2.5 checkpoint was pruned directly at the safetensors level without loading the full Transformers model into memory and without GPU calibration.
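To make "at the safetensors level" concrete: a safetensors file starts with an 8-byte little-endian header length followed by a JSON header listing every tensor, so tensor names can be inspected without loading any weights. The sketch below uses only the standard library to count distinct expert indices per layer in one shard. The tensor-name pattern (layers.<L> ... experts.<E>) is an assumption for illustration and may not match the names in this checkpoint; both function names are hypothetical.

```python
import json
import re
import struct
from collections import defaultdict

# Assumed naming pattern; adjust to the real tensor names in the checkpoint.
EXPERT_RE = re.compile(r"layers\.(\d+)\..*?experts\.(\d+)\.")

def read_safetensors_header(path):
    """Return the tensor-name -> info mapping from a shard without reading weights."""
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))   # 8-byte little-endian length
        header = json.loads(f.read(header_len).decode("utf-8"))
    header.pop("__metadata__", None)                      # optional free-form metadata
    return header

def experts_per_layer(tensor_names):
    """Count distinct expert indices seen for each layer index."""
    seen = defaultdict(set)
    for name in tensor_names:
        m = EXPERT_RE.search(name)
        if m:
            seen[int(m.group(1))].add(int(m.group(2)))
    return {layer: len(ids) for layer, ids in sorted(seen.items())}
```

Since the checkpoint is sharded, names from all shards would need to be combined before calling experts_per_layer; under the naming assumption, each processed MoE layer should then report 128.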

Pruning details:

| Item | Value |
| --- | --- |
| Base model | XiaomiMiMo/MiMo-V2.5 |
| Method | Offline random seed expert pruning |
| Command flag | --n-experts 128 |
| Seed | 72 |
| Original routed experts per MoE layer | 256 |
| Experts removed per MoE layer | 128 |
| Routed experts retained per MoE layer | 128 |
| MoE layers processed | 47 |
| Dense layer | Layer 0 left unchanged |
| Config value | n_routed_experts: 128 |
| Output model safetensors payload | ~163.50 GB |

Only routed MoE experts and their router tensors were pruned/remapped. Non-MoE weights, tokenizer files, multimodal/audio files, and model code were copied from the base model.
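The prune-and-remap idea can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in Akicou/ream: the function names are hypothetical, and Python's random module stands in for whatever RNG the script uses, so the indices chosen here will not match the experts actually kept in this checkpoint. It also assumes the router holds one row per expert.

```python
import random

def select_kept_experts(n_total, n_remove, seed):
    """Randomly drop n_remove experts; return surviving original indices, sorted."""
    rng = random.Random(seed)                       # seeded, so the choice is reproducible
    removed = set(rng.sample(range(n_total), n_remove))
    return [i for i in range(n_total) if i not in removed]

def remap_router(router_rows, kept):
    """Keep only the router rows of surviving experts, in their new index order."""
    return [router_rows[i] for i in kept]

kept = select_kept_experts(n_total=256, n_remove=128, seed=72)

# Surviving experts are renumbered 0..127 so the config can state n_routed_experts: 128.
old_to_new = {old: new for new, old in enumerate(kept)}
print(len(kept), "experts kept per MoE layer")
```

Renumbering the survivors contiguously is what lets the router output and the expert tensors stay aligned after pruning.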

How it was created

python examples/compress_model.py \
  --model XiaomiMiMo/MiMo-V2.5 \
  --output ./MiMo-V2.5-NE128-72-OP \
  --offline-seed-prune \
  --n-experts 128 \
  --seed 72
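After a run like the one above, a simple sanity check is to confirm that the output config reports the expected expert count. The sketch below assumes n_routed_experts sits at the top level of config.json in the output directory; check_pruned_config is a hypothetical helper, and the key may be nested differently in the real config.

```python
import json
from pathlib import Path

def check_pruned_config(model_dir, expected_experts=128):
    """Raise if config.json does not report the expected routed expert count."""
    config_path = Path(model_dir) / "config.json"
    config = json.loads(config_path.read_text(encoding="utf-8"))
    actual = config.get("n_routed_experts")          # assumed top-level key
    if actual != expected_experts:
        raise ValueError(
            f"n_routed_experts is {actual!r}, expected {expected_experts}"
        )
    return actual

# Example call against the output directory used in the command:
# check_pruned_config("./MiMo-V2.5-NE128-72-OP")
```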

Important notes

  • This is random seed pruning, not calibrated saliency pruning.
  • No benchmark evaluation is claimed here.
  • Quality may be significantly worse than the original model.
  • The model may require a recent torch/transformers stack because the original MiMo-V2.5 code uses FP8/custom MoE integrations.
  • Use trust_remote_code=True when loading the model.
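Because no minimum versions are stated here, a practical first step when loading fails is to record which torch and transformers versions are installed. The snippet below is a minimal sketch using the standard library only; installed_versions is a hypothetical helper, and it makes no claim about which versions are sufficient.

```python
from importlib.metadata import PackageNotFoundError, version

def installed_versions(packages=("torch", "transformers")):
    """Map each package name to its installed version, or None if it is missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found

print(installed_versions())
```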

Basic text usage

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Akicou/MiMo-V2.5-NE128-72-OP"

tokenizer = AutoTokenizer.from_pretrained(
    model_id,
    trust_remote_code=True,
)

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
    low_cpu_mem_usage=True,
)

prompt = "What is a reaper?"
inputs = tokenizer(prompt, return_tensors="pt").to(next(model.parameters()).device)

with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=128, do_sample=False)

print(tokenizer.decode(output[0], skip_special_tokens=True))

Attribution

All architecture, tokenizer, multimodal/audio components, and original weights are from XiaomiMiMo/MiMo-V2.5. This repository only contains an offline-pruned derivative checkpoint.

Safetensors

Model size: 159B params
Tensor types: F32, BF16, F8_E4M3