Qwen2.5-Math-DeepSeekR1-Sens-7B

A 7B merged model created by applying Sensitivity-aware Model Merging (Sens Merging) to:

  • deepseek-ai/DeepSeek-R1-Distill-Qwen-7B
  • Qwen/Qwen2.5-Math-7B

The goal is to preserve the strong mathematical reasoning of DeepSeek-R1-Distill-Qwen-7B while substantially shortening its reasoning traces and overall output length.


Highlights

  • Average accuracy across five math benchmarks: 66.9%
  • Average output length: 701 tokens
  • Output tokens reduced by 75.2% compared to DeepSeek-R1-Distill-Qwen-7B
  • Only 2.5 points lower average accuracy than DeepSeek-R1-Distill-Qwen-7B

This model provides an attractive trade-off between reasoning quality and inference cost.
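
The token reduction and accuracy gap quoted above follow directly from the averages reported in this card. A minimal check, with variable names chosen here purely for illustration:

```python
# Averages reported in this model card.
distill_acc, distill_tokens = 69.4, 2826   # DeepSeek-R1-Distill-Qwen-7B
merged_acc, merged_tokens = 66.9, 701      # Sens Merge (λ=0.4)

# Relative reduction in average output tokens, in percent.
token_reduction = (distill_tokens - merged_tokens) / distill_tokens * 100

# Absolute gap in average accuracy, in points.
accuracy_gap = distill_acc - merged_acc

print(f"Token reduction: {token_reduction:.1f}%")   # Token reduction: 75.2%
print(f"Accuracy gap: {accuracy_gap:.1f} points")   # Accuracy gap: 2.5 points
```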


Base Models

| Model | Avg Accuracy (%) | Avg Tokens |
| --- | --- | --- |
| DeepSeek-R1-Distill-Qwen-7B | 69.4 | 2826 |
| Qwen2.5-Math-7B | 45.3 | 755 |
| Sens Merge (λ=0.4) | 66.9 | 701 |

Benchmark Results

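| Benchmark | DeepSeek-R1-Distill-Qwen-7B | Qwen2.5-Math-7B | Sens Merge (λ=0.4) |
| --- | --- | --- | --- |
| College Math | 66.0 | 37.9 | 70.4 |
| GSM8K | 90.2 | 84.5 | 90.6 |
| MATH | 94.4 | 73.3 | 90.2 |
| Minerva Math | 41.5 | 13.6 | 36.0 |
| OlympiadBench | 55.0 | 17.3 | 47.2 |
| Avg Accuracy (%) | 69.4 | 45.3 | 66.9 |
| Avg Tokens | 2826 | 755 | 701 |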
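
The "Avg Accuracy" row appears to be the unweighted mean of the five benchmark scores. That reading is an assumption, but the short check below reproduces the reported averages from the per-benchmark numbers:

```python
# Per-benchmark accuracies (%) copied from the table above.
# Order: College Math, GSM8K, MATH, Minerva Math, OlympiadBench.
scores = {
    "DeepSeek-R1-Distill-Qwen-7B": [66.0, 90.2, 94.4, 41.5, 55.0],
    "Qwen2.5-Math-7B": [37.9, 84.5, 73.3, 13.6, 17.3],
    "Sens Merge (λ=0.4)": [70.4, 90.6, 90.2, 36.0, 47.2],
}

# Unweighted mean over the five benchmarks, rounded to one decimal place.
averages = {name: round(sum(vals) / len(vals), 1) for name, vals in scores.items()}

for name, avg in averages.items():
    print(f"{name}: {avg}")
```

Note that the merged model is ahead of DeepSeek-R1-Distill-Qwen-7B on College Math and GSM8K and behind it on MATH, Minerva Math, and OlympiadBench.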

Motivation

Large reasoning models such as DeepSeek-R1-Distill often produce long chains of thought, which increases inference cost.

This model explores whether model merging can reduce reasoning verbosity without requiring additional training.

By merging a reasoning model (DeepSeek-R1-Distill-Qwen-7B) with a compact mathematical model (Qwen2.5-Math-7B) using Sensitivity-aware Model Merging, the merged model:

  • Maintains competitive reasoning performance
  • Produces significantly shorter outputs
  • Requires no gradient-based fine-tuning
  • Uses only a small calibration dataset
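
This card does not spell out the merging procedure itself. As a rough illustration only, the sketch below shows a generic layer-wise weighted interpolation between two checkpoints, which is the general shape of training-free merging. It is not the Sens Merging algorithm: the function `merge_layerwise`, the per-layer `sensitivity` weights, the way `lam` is applied, and the toy parameter values are all hypothetical stand-ins.

```python
def merge_layerwise(reasoning, compact, sensitivity, lam):
    """Interpolate two checkpoints layer by layer (illustrative only).

    reasoning, compact: dicts mapping layer name -> list of floats (toy weights).
    sensitivity: dict mapping layer name -> weight in [0, 1]; hypothetical,
        standing in for scores estimated on a small calibration set.
    lam: global coefficient; how the card's λ enters the real method is
        an assumption made for this sketch.
    """
    merged = {}
    for name, w_reason in reasoning.items():
        w_compact = compact[name]
        # Per-layer share given to the compact model, capped at 1.
        alpha = min(1.0, lam * sensitivity[name])
        merged[name] = [
            (1.0 - alpha) * r + alpha * c for r, c in zip(w_reason, w_compact)
        ]
    return merged


# Toy two-layer "models"; real checkpoints hold tensors with billions of entries.
reasoning = {"layer0": [1.0, 2.0], "layer1": [0.0, 4.0]}
compact = {"layer0": [3.0, 0.0], "layer1": [2.0, 0.0]}
sensitivity = {"layer0": 1.0, "layer1": 0.5}  # hypothetical values

merged = merge_layerwise(reasoning, compact, sensitivity, lam=0.4)
print(merged)
```

The point of the sketch is that merging involves only arithmetic on existing weights: no loss, no backward pass, and no optimizer step.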

Comparison with DPO

We additionally compared Sens Merging with a DPO-trained model:

Model Avg Accuracy Avg Tokens
DeepSeek-R1-Distill-Qwen-7B 69.4 2826
DPO 68.55 2402
Sens Merge (λ=0.4) 66.9 701

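Sens Merging cuts average output length by about 75% (2826 to 701 tokens), compared with about 15% for DPO (2826 to 2402 tokens). The cost is 1.65 points of average accuracy relative to DPO (66.9 vs 68.55).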


Usage

from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "quangdung/Qwen2.5-Math-DeepSeekR1-Sens-7B"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)

prompt = "Solve: If x^2 + 5x + 6 = 0, find x."

messages = [{"role": "user", "content": prompt}]

text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)

inputs = tokenizer(text, return_tensors="pt").to(model.device)

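# The reported average output is about 701 tokens, so leave headroom
# above that to avoid truncating longer solutions.
outputs = model.generate(
    **inputs,
    max_new_tokens=2048
)

# Decode only the newly generated tokens, skipping the echoed prompt.
prompt_length = inputs["input_ids"].shape[1]
print(tokenizer.decode(outputs[0][prompt_length:], skip_special_tokens=True))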