Support this work → · X · GitHub · REAP paper · Cerebras REAP

Qwen3.5-76B

REAP-pruned Qwen/Qwen3.5-122B-A10B.

At a glance

Base model Qwen/Qwen3.5-122B-A10B
Format BF16
Total params 76B
Active / token 10B
Experts / layer —
Layers —
Hidden size —
Context —
On-disk size 152 GB

Which variant should I pick?

Variant Format Link
Qwen3.5-264B BF16 link
Qwen3.5-264B-FP8 FP8 link
Qwen3.5-264B-W4A16 W4A16 link
Qwen3.5-28B BF16 link
Qwen3.5-35B-EXL3-4bpw EXL3-4bpw link
Qwen3.5-76B (this) BF16 link
Qwen3.5-76B-GGUF GGUF link
Qwen3.5-88B BF16 link
Qwen3.5-99B BF16 link
Qwen3.5-99B-GGUF GGUF link

40% expert-pruned variant of Qwen3.5-122B-A10B using REAP (Routing-Enhanced Activation Pruning).

Model Details

Property Value
Base Model Qwen/Qwen3.5-122B-A10B
Architecture Qwen3.5 MoE (GDN + Full Attention)
Original Experts 256 per layer
Pruned Experts 154 per layer (40% removed)
Active Parameters ~10B per token
Pruning Method REAP with targeted refusal preservation
Preserve Threshold 80% (super-expert protection)
Calibration reap-calibration-data-v1 — 23k benchmark-free samples
Maintainer 0xSero
Organization Sybil Solutions
Project REAP PR17

Usage

vllm serve 0xSero/Qwen3.5-76B \
  --tensor-parallel-size 4 \
  --enable-expert-parallel \
  --max-model-len 8192 \
  --trust-remote-code \
  --language-model-only \
  --dtype bfloat16

Important: Use --language-model-only flag — this is a text-only checkpoint pruned from the multimodal base model.

What is REAP?

REAP (Routing-Enhanced Activation Pruning) removes the least-activated experts from MoE models while preserving critical capabilities. It uses router activation patterns from a calibration dataset to identify dispensable experts, with special protection for safety-critical behaviors.

License

Same license as the base model (Qwen).

License & citation

License inherited from the base model.

@misc{lasby2025reap,
  title  = {REAP the Experts: Why Pruning Prevails for One-Shot MoE Compression},
  author = {Mike Lasby and Ivan Lazarevich and Nish Sinnadurai and Sean Lie and Yani Ioannou and Vithursan Thangarasa},
  year   = {2025}, eprint = {2510.13999}, archivePrefix = {arXiv}
}

Sponsors

Made possible by NVIDIA · TNG Technology · Lambda · Prime Intellect · Hot Aisle.

Downloads last month
23
Safetensors
Model size
76B params
Tensor type
BF16
·
Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for 0xSero/Qwen3.5-76B

Finetuned
(47)
this model

Space using 0xSero/Qwen3.5-76B 1

Collection including 0xSero/Qwen3.5-76B

Paper for 0xSero/Qwen3.5-76B