Model: DeepSeek-R1-Distill-Qwen-7B Sparse 50%
Base model: deepseek-ai/DeepSeek-R1-Distill-Qwen-7B
Method: SparseGPT unstructured pruning at 50% sparsity
Calibration data: GSM8K math problems (128 samples)
Sparsity achieved: 42.95%
Quality: Multi-step reasoning is preserved on simple math word problems, which the model solves correctly with step-by-step explanations.
Hardware used: Kaggle T4 GPU
Purpose: Proof of concept for the PE-MoE architecture, demonstrating that reasoning quality survives pruning at a 50% sparsity target (42.95% achieved) on a distilled reasoning model.
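The achieved figure is a fraction of exactly-zero weights over the whole model, so it is a size-weighted average of per-layer fractions. The following is a minimal sketch of that bookkeeping, not the script used for this card. The layer names, shapes, and the `sparsity_report` helper are hypothetical, and NumPy arrays stand in for real weights. With PyTorch, the same loop could run over `model.named_parameters()`.

```python
import numpy as np

def sparsity_report(weights):
    """Return (per_layer, overall) fractions of exactly-zero weights.

    `weights` maps a layer name to a NumPy array (hypothetical stand-in
    for a real model's parameters).
    """
    per_layer = {}
    zeros = 0
    total = 0
    for name, w in weights.items():
        layer_zeros = int(np.count_nonzero(w == 0))
        per_layer[name] = layer_zeros / w.size
        zeros += layer_zeros
        total += w.size
    return per_layer, zeros / total

# Toy data: one layer pruned to 50%, one layer left dense.
rng = np.random.default_rng(0)
pruned = rng.standard_normal((64, 64))
pruned[:, ::2] = 0.0                        # zero every other column
untouched = rng.standard_normal((16, 64))   # no pruning applied

per_layer, overall = sparsity_report({"proj": pruned, "embed": untouched})
for name, frac in per_layer.items():
    print(f"{name}: {frac:.2%}")
print(f"overall: {overall:.2%}")
```

In this toy case the dense layer pulls the overall figure below the 50% applied to the pruned layer, which is the kind of gap a per-layer report makes visible.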
Limitations:
Sparsity is unstructured, so memory benefits require sparse-aware inference
Calibration data was general, not domain-specific
Not quantized: the AWQ step has not yet been applied
Actual sparsity is 42.95% against a 50% target, due to layer-wise variation
Quality tested on simple math only, not comprehensive benchmarks
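To illustrate the first limitation: zeros stored in a dense tensor occupy the same space as any other value, and a generic sparse layout does not automatically help at this sparsity level. The sketch below is back-of-the-envelope arithmetic only. It assumes 16-bit values, 32-bit indices, a plain CSR layout, and a hypothetical 4096 x 4096 layer; none of these describe a specific inference runtime.

```python
def dense_bytes(n_elements, value_bytes=2):
    """Bytes needed to store every element, zeros included."""
    return n_elements * value_bytes

def csr_bytes(rows, cols, sparsity, value_bytes=2, index_bytes=4):
    """Approximate bytes for a CSR layout: values, column indices, row pointers."""
    nnz = round(rows * cols * (1.0 - sparsity))
    return nnz * (value_bytes + index_bytes) + (rows + 1) * index_bytes

rows, cols = 4096, 4096  # hypothetical layer shape, for illustration only
dense = dense_bytes(rows * cols)
csr = csr_bytes(rows, cols, sparsity=0.4295)

print(f"dense fp16: {dense / 2**20:.1f} MiB")
print(f"CSR at 42.95% sparsity: {csr / 2**20:.1f} MiB")
print(f"CSR at 90% sparsity: {csr_bytes(rows, cols, 0.90) / 2**20:.1f} MiB")
```

Under these assumptions the CSR copy is larger than the dense one at 42.95% sparsity, because each surviving weight also carries an index. Any memory saving therefore depends on the storage format and kernels that the inference stack provides.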