---
license: apache-2.0
datasets:
- TIGER-Lab/WebInstruct-CFT
language:
- en
base_model:
- Qwen/Qwen2.5-Math-7B
tags:
- cft
- math
- reasoning
pipeline_tag: text-generation
library_name: transformers
---

# Qwen2.5-Math-7B-CFT

<div style="display: flex; gap: 4px; align-items: center">
  <a target="_blank" href="https://github.com/TIGER-AI-Lab/CritiqueFinetuning">
    <img style="height:18pt" src="https://img.shields.io/badge/-Code-black?style=flat&logo=github"/>
  </a>
  <a target="_blank" href="https://arxiv.org/abs/2501.17703">
    <img style="height:18pt" src="https://img.shields.io/badge/-Paper-green?style=flat&logo=arxiv"/>
  </a>
  <a target="_blank" href="https://tiger-ai-lab.github.io/CritiqueFineTuning">
    <img style="height:18pt" src="https://img.shields.io/badge/-📖%20Website-red?style=flat"/>
  </a>
  <a target="_blank" href="https://huggingface.co/datasets/TIGER-Lab/WebInstruct-CFT">
    <img style="height:18pt" src="https://img.shields.io/badge/-🤗%20Dataset-red?style=flat"/>
  </a>
</div>

## Introduction

Qwen2.5-Math-7B-CFT is a 7B-parameter mathematical reasoning model. Rather than using traditional supervised fine-tuning (SFT) to imitate correct answers, it is trained with our Critique Fine-Tuning (CFT) approach, which teaches the model to critique and analyze candidate responses, leading to deeper understanding and enhanced reasoning capabilities.

The model demonstrates that learning to critique can be more effective than learning to imitate: despite being trained on just 50K samples, it matches or exceeds models trained on 2M+ samples, reaching 79.4% accuracy on MATH and 41.6% on OlympiadBench.
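A minimal inference sketch with 🤗 Transformers (assuming the repository id `TIGER-Lab/Qwen2.5-Math-7B-CFT` and the standard Qwen2.5 chat template shipped with the checkpoint; the question and generation settings are illustrative):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed repository id; see the badges above for the official links.
model_id = "TIGER-Lab/Qwen2.5-Math-7B-CFT"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # fall back to float16 if bf16 is unsupported
    device_map="auto",
)

messages = [{"role": "user", "content": "Find all real x with x^2 - 5x + 6 = 0."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Greedy decoding keeps math outputs deterministic; raise max_new_tokens
# for longer derivations.
output_ids = model.generate(input_ids, max_new_tokens=512, do_sample=False)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```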


## Key Features

- Novel training methodology inspired by human learning processes that emphasize critical thinking
- Consistent 4-10% improvement over traditional SFT across six math benchmarks
- Exceptional data efficiency: matches the performance of models trained on 40x more data
- Built on the strong foundation of Qwen2.5-Math-7B


## Training Details


### Training Data
- Dataset: [WebInstruct-CFT-50K](https://huggingface.co/datasets/TIGER-Lab/WebInstruct-CFT)
- Training format: `(input=[query; noisy response], output=critique)`; an illustrative sample is sketched after this list
- Teacher model: GPT-4o for generating critiques
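
To make the format concrete, here is a hypothetical CFT training example (the field names and critique wording are illustrative, not the exact dataset schema; see the [WebInstruct-CFT](https://huggingface.co/datasets/TIGER-Lab/WebInstruct-CFT) card for the real fields):

```python
# Hypothetical CFT sample: the model is given a query plus a noisy candidate
# solution as input, and is trained to produce a critique as output.
cft_sample = {
    "input": (
        "Question: What is 15% of 80?\n\n"
        "Candidate solution: 15% of 80 is 80 / 15 = 5.33."
    ),
    "output": (
        "The candidate misapplies 'percent of': 15% of 80 means 0.15 * 80, "
        "not 80 / 15. The correct value is 0.15 * 80 = 12, so the answer "
        "5.33 is wrong. Conclusion: the solution is incorrect."
    ),
}
```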


### Training Infrastructure
- Framework: LLaMA-Factory
- Hardware: 8x NVIDIA H100 GPUs
- Training time: ~1 hour with DeepSpeed ZeRO-3


## Evaluation Results

![image/png](https://cdn-uploads.huggingface.co/production/uploads/636a35eff8d9af4aea181608/tLLFW6OEASFojDyX1Zh-K.png)

*Table 1: Performance comparison of Qwen2.5-Math-7B-CFT vs. other reasoning-specialized models.*

For more details about the model architecture, methodology, and comprehensive evaluation results, please visit our [project webpage](https://tiger-ai-lab.github.io/CritiqueFineTuning).