Qwen2.5-Math-DeepSeekR1-Sens-7B

A 7B merged model created by applying Sensitivity-aware Model Merging (Sens Merging) to:

deepseek-ai/DeepSeek-R1-Distill-Qwen-7B
Qwen/Qwen2.5-Math-7B

The model aims to preserve the strong mathematical reasoning of DeepSeek-R1-Distill-Qwen-7B while substantially reducing reasoning verbosity and output token length.

Highlights

Average accuracy: 66.9%
Average output tokens: 701
Output tokens reduced by 75.2% compared to DeepSeek-R1-Distill-Qwen-7B
Only 2.5 points lower average accuracy than DeepSeek-R1-Distill-Qwen-7B

The model trades 2.5 points of average accuracy for roughly a 4x reduction in output tokens (2826 to 701), an attractive trade-off between reasoning quality and inference cost.
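The headline figures can be recomputed from the averages reported in this card. The snippet below is a minimal arithmetic check; the variable names are illustrative and not part of any released evaluation script.

```python
# Averages copied from this card (accuracy in %, output length in tokens).
distill_acc, distill_tokens = 69.4, 2826   # DeepSeek-R1-Distill-Qwen-7B
merged_acc, merged_tokens = 66.9, 701      # Sens Merge (λ=0.4)

# Relative reduction in output tokens versus the reasoning model.
token_reduction_pct = 100 * (distill_tokens - merged_tokens) / distill_tokens
# Absolute gap in average accuracy, in percentage points.
accuracy_gap = distill_acc - merged_acc
# How many times shorter the merged model's outputs are.
compression = distill_tokens / merged_tokens

print(f"Token reduction: {token_reduction_pct:.1f}%")   # 75.2%
print(f"Accuracy gap:    {accuracy_gap:.1f} points")    # 2.5 points
print(f"Compression:     {compression:.2f}x")           # 4.03x
```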

Base Models and Merged Model

Model	Avg Accuracy (%)	Avg Output Tokens
DeepSeek-R1-Distill-Qwen-7B	69.4	2826
Qwen2.5-Math-7B	45.3	755
Sens Merge (λ=0.4)	66.9	701

Benchmark Results

Benchmark	DeepSeek-R1-Distill-Qwen-7B	Qwen2.5-Math-7B	Sens Merge (λ=0.4)
College Math	66.0	37.9	70.4
GSM8K	90.2	84.5	90.6
MATH	94.4	73.3	90.2
Minerva Math	41.5	13.6	36.0
OlympiadBench	55.0	17.3	47.2
Avg Accuracy	69.4	45.3	66.9
Avg Tokens	2826	755	701
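The "Avg Accuracy" row appears to be the unweighted mean of the five benchmark scores. The card does not state this explicitly, but the numbers are consistent with it. The sketch below recomputes the averages and the per-benchmark difference between the merged model and the reasoning model; the dictionary layout is only an illustration.

```python
from statistics import mean

benchmarks = ["College Math", "GSM8K", "MATH", "Minerva Math", "OlympiadBench"]

# Accuracy (%) per benchmark, copied from the table above.
scores = {
    "DeepSeek-R1-Distill-Qwen-7B": [66.0, 90.2, 94.4, 41.5, 55.0],
    "Qwen2.5-Math-7B": [37.9, 84.5, 73.3, 13.6, 17.3],
    "Sens Merge (λ=0.4)": [70.4, 90.6, 90.2, 36.0, 47.2],
}

# Unweighted mean over benchmarks (assumed averaging scheme).
averages = {name: round(mean(vals), 1) for name, vals in scores.items()}

# Merged minus reasoning model, in percentage points; positive means the merge is better.
deltas = {
    bench: round(merged - distill, 1)
    for bench, merged, distill in zip(
        benchmarks,
        scores["Sens Merge (λ=0.4)"],
        scores["DeepSeek-R1-Distill-Qwen-7B"],
    )
}

for name, avg in averages.items():
    print(f"{name}: {avg}")
for bench, delta in deltas.items():
    print(f"{bench}: {delta:+.1f}")
```

On these numbers the merged model is ahead of the reasoning model on College Math and GSM8K and behind on the other three benchmarks.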

Motivation

Large reasoning models such as DeepSeek-R1-Distill-Qwen-7B often produce long chains of thought (2826 output tokens on average in the evaluation above), which increases inference cost.

This model explores whether model merging can reduce reasoning verbosity without requiring additional training.

By merging a reasoning model (DeepSeek-R1-Distill-Qwen-7B) with a compact mathematical model (Qwen2.5-Math-7B) using Sensitivity-aware Model Merging, the merged model:

Maintains competitive reasoning performance (66.9% vs. 69.4% average accuracy)
Produces much shorter outputs (701 vs. 2826 average tokens)
Requires no gradient-based fine-tuning
Uses only a small calibration dataset
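To make "merging without training" concrete, here is a deliberately simplified sketch. It is plain linear interpolation of parameters, not Sens Merging: it uses no sensitivity scores and no calibration data, and which model `lam` weights is an assumption, since this card does not define how λ enters the merge. The `interpolate` function and the toy parameter names are hypothetical.

```python
from typing import Dict, List

# Toy stand-in for a real state dict: parameter name -> flat list of weights.
StateDict = Dict[str, List[float]]


def interpolate(reasoning: StateDict, compact: StateDict, lam: float) -> StateDict:
    """Return (1 - lam) * reasoning + lam * compact for every parameter.

    Simplified illustration only; Sens Merging additionally rescales
    parameters using sensitivity information, which is not modelled here.
    """
    if reasoning.keys() != compact.keys():
        raise ValueError("both models must expose the same parameter names")
    merged: StateDict = {}
    for name in reasoning:
        a, b = reasoning[name], compact[name]
        if len(a) != len(b):
            raise ValueError(f"size mismatch for parameter {name!r}")
        merged[name] = [(1 - lam) * x + lam * y for x, y in zip(a, b)]
    return merged


# Toy weights, not taken from either real checkpoint.
toy_reasoning = {"layer.weight": [1.0, 2.0], "layer.bias": [0.0]}
toy_compact = {"layer.weight": [3.0, 6.0], "layer.bias": [1.0]}
toy_merged = interpolate(toy_reasoning, toy_compact, lam=0.4)
print(toy_merged)
```

No gradients are computed anywhere in this procedure, which is the property the list above relies on. A real merge would operate on tensors loaded from both checkpoints rather than on Python lists.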

Comparison with DPO

We additionally compared Sens Merging with a DPO-trained model:

Model	Avg Accuracy (%)	Avg Output Tokens
DeepSeek-R1-Distill-Qwen-7B	69.4	2826
DPO	68.55	2402
Sens Merge (λ=0.4)	66.9	701

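DPO shortens outputs by about 15% (2826 to 2402 tokens) at a cost of 0.85 accuracy points, whereas Sens Merging shortens them by 75.2% (2826 to 701 tokens) at a cost of 2.5 points. Sens Merging therefore achieves a much larger reduction in output length while remaining competitive in accuracy.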

Usage

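from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "quangdung/Qwen2.5-Math-DeepSeekR1-Sens-7B"

# Load the tokenizer and the merged checkpoint.
# torch_dtype="auto" keeps the dtype stored in the checkpoint;
# device_map="auto" places weights on available devices (requires the accelerate package).
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto",
)

prompt = "Solve: If x^2 + 5x + 6 = 0, find x."

messages = [{"role": "user", "content": prompt}]

# Render the conversation with the model's chat template and append the assistant turn marker.
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)

inputs = tokenizer(text, return_tensors="pt").to(model.device)

# The average output length reported above is 701 tokens, so raise
# max_new_tokens if answers to harder problems are cut off.
outputs = model.generate(
    **inputs,
    max_new_tokens=512,
)

# Decodes the full sequence, i.e. the prompt followed by the generated answer.
print(tokenizer.decode(outputs[0], skip_special_tokens=True))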
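For decoder-only models, `generate` returns the prompt tokens followed by the newly generated ones, so output length can be measured by dropping the prompt prefix. The helper below is a small sketch of that bookkeeping on plain lists of token ids. `split_generation` is a hypothetical name, and whether the card's "Avg Tokens" figure was counted exactly this way (for example with or without the end-of-sequence token) is not stated.

```python
from typing import List, Sequence, Tuple


def split_generation(
    prompt_ids: Sequence[int], output_ids: Sequence[int]
) -> Tuple[List[int], int]:
    """Return (new_token_ids, number_of_new_tokens).

    Assumes output_ids begins with prompt_ids, as is the case for a single
    unpadded prompt passed to a decoder-only model.
    """
    n_prompt = len(prompt_ids)
    if list(output_ids[:n_prompt]) != list(prompt_ids):
        raise ValueError("output does not start with the prompt tokens")
    new_ids = list(output_ids[n_prompt:])
    return new_ids, len(new_ids)


# Toy ids standing in for inputs["input_ids"][0].tolist() and outputs[0].tolist().
new_ids, n_new = split_generation([101, 7, 8], [101, 7, 8, 42, 43, 2])
print(new_ids, n_new)
```

With the usage code above, passing `new_ids` to `tokenizer.decode(..., skip_special_tokens=True)` would print only the model's answer rather than the prompt plus the answer.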
