mlp-surgery — broken baseline (Qwen2.5-3B)

This is the "broken" baseline used as input for the mlp-surgery project. Don't use this model for downstream tasks: it underperforms the base model on both math (GSM8K, 61.64% vs. 63.15%) and general reasoning (ARC Challenge, 45.22% vs. 48.12%). It is published only so that the experiment is reproducible.

What it is

Qwen2.5-3B-Instruct with a LoRA fine-tune on the perplexity-filtered top 30% of OpenHermes 2.5 (the subset comes from the sister project Perplexity-weighted-selective-finetuning). The LoRA adapter is merged into the base weights.
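To make "perplexity-filtered top 30%" concrete, here is a minimal sketch of ranking examples by perplexity and keeping a fixed fraction. It is not the sister project's code. The function names, the toy data, and in particular the ranking direction (`keep_highest`) are assumptions, since this card does not say whether "top" means the highest or the lowest perplexity.

```python
import math


def perplexity(token_nlls):
    """Perplexity of one example from its per-token negative log-likelihoods (in nats)."""
    return math.exp(sum(token_nlls) / len(token_nlls))


def select_top_fraction(examples, fraction=0.30, keep_highest=True):
    """Return the ids of the `fraction` of examples ranked first by perplexity.

    `examples` is a list of (example_id, token_nlls) pairs.
    `keep_highest` is an assumption: the card does not state the ranking direction.
    """
    ranked = sorted(examples, key=lambda ex: perplexity(ex[1]), reverse=keep_highest)
    keep = max(1, round(len(ranked) * fraction))
    return [example_id for example_id, _ in ranked[:keep]]


# Hypothetical toy data: ten examples whose mean NLL grows with the index.
toy = [(f"ex{i}", [0.1 * (i + 1), 0.1 * (i + 1)]) for i in range(10)]

print(select_top_fraction(toy))                      # three highest-perplexity ids
print(select_top_fraction(toy, keep_highest=False))  # three lowest-perplexity ids
```

In a real pipeline the per-token losses would come from a forward pass of a scoring model over each OpenHermes 2.5 example; which model does the scoring is also not stated here.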

Eval

Evaluated with lm-eval: GSM8K (flexible-extract, 5-shot) and ARC Challenge (acc_norm, 0-shot), with no chat template, batch_size 8, and a single seed (2026-05-07).
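The settings above could be reproduced through the lm-eval Python API roughly as follows. This is a sketch under assumptions rather than the exact command used for the table: the task names (`gsm8k`, `arc_challenge`), the metric keys, and the `dtype=bfloat16` model argument are assumptions, and the repo id is taken from this model's page. Metric key names can differ between lm-eval versions.

```python
# Evaluation settings as described in the text; names marked below are assumptions.
MODEL_ID = "Malum0x/mlp-surgery-broken"  # repo id of this model

EVAL_RUNS = [
    # GSM8K: 5-shot, flexible-extract filter (metric key is an assumption)
    {"task": "gsm8k", "num_fewshot": 5, "metric": "exact_match,flexible-extract"},
    # ARC Challenge: 0-shot, normalized accuracy (metric key is an assumption)
    {"task": "arc_challenge", "num_fewshot": 0, "metric": "acc_norm,none"},
]


def run_evals(model_id=MODEL_ID, batch_size=8):
    """Run each task with its own few-shot count and return {task: score}."""
    import lm_eval  # imported lazily so the settings can be inspected without it

    scores = {}
    for run in EVAL_RUNS:
        out = lm_eval.simple_evaluate(
            model="hf",
            model_args=f"pretrained={model_id},dtype=bfloat16",
            tasks=[run["task"]],
            num_fewshot=run["num_fewshot"],
            batch_size=batch_size,
            # No chat template is applied, matching the setup described above.
        )
        scores[run["task"]] = out["results"][run["task"]][run["metric"]]
    return scores
```

Calling `run_evals()` downloads the model and runs both benchmarks. The two tasks are evaluated in separate calls because they use different few-shot counts.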

| Model | GSM8K | ARC Challenge |
|---|---|---|
| Base (Qwen2.5-3B-Instruct) | 63.15% | 48.12% |
| After SFT (broken) | 61.64% | 45.22% |
| Restore top 5 | 63.00% | 45.73% |
| Restore top 15 | 63.46% | 46.50% |
| Restore top 30 | 64.29% | 48.55% |
| Restore specificity top 10 | 61.64% | 45.22% |

This model is the "After SFT (broken)" row.
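To read the table as differences from the base model, the following sketch subtracts the base row from every other row. The scores are copied from the table; the helper name `deltas_vs_base` is hypothetical.

```python
# Scores from the table above, as (GSM8K %, ARC Challenge %).
RESULTS = {
    "Base (Qwen2.5-3B-Instruct)": (63.15, 48.12),
    "After SFT (broken)": (61.64, 45.22),
    "Restore top 5": (63.00, 45.73),
    "Restore top 15": (63.46, 46.50),
    "Restore top 30": (64.29, 48.55),
    "Restore specificity top 10": (61.64, 45.22),
}
BASE_KEY = "Base (Qwen2.5-3B-Instruct)"


def deltas_vs_base(results, base_key=BASE_KEY):
    """Percentage-point difference of each row from the base row."""
    base = results[base_key]
    return {
        name: tuple(round(score - ref, 2) for score, ref in zip(scores, base))
        for name, scores in results.items()
        if name != base_key
    }


for name, (gsm8k, arc) in deltas_vs_base(RESULTS).items():
    print(f"{name:28s} GSM8K {gsm8k:+.2f}  ARC {arc:+.2f}")
```

This model sits 1.51 points below base on GSM8K and 2.90 points below on ARC Challenge. "Restore top 30" is the only row above base on both benchmarks, and "Restore specificity top 10" matches the broken row exactly. All numbers come from a single seed.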

Companion models

Code: https://github.com/Malum0x/mlp-surgery

Model size: 3B params. Tensor type: BF16. Format: Safetensors.