JiuZhang3.0: Efficiently Improving Mathematical Reasoning by Training Small Data Synthesis Models

[Paper] • [GitHub] • [Models] • [Data]

Introduction

JiuZhang3.0 is a series of fine-tuned models for mathematical reasoning, continually pre-trained on a corpus synthesized by our small LLM, which we trained specifically for data synthesis.

Experimental Results

For more evaluation results, please refer to the paper.

Models	GSM8k	MATH	SVAMP	ASDiv	MAWPS	CARP	Avg.
GPT-4	92.2	65.4	92.9	94.3	96.6	53.6	82.5
20B+ Models
Llemma-34B	60.2	24.6	68.0	75.6	89.8	36.5	59.1
Intern-Math-20B	64.9	27.4	74.9	79.6	94.4	42.3	63.9
ChatGLM-Math-32B	82.6	40.6	-	-	-	-	-
MAmmoTH2-8x7B-Plus	86.4	47.0	90.0	92.2	97.0	45.8	76.4
JiuZhang3.0-8x7B	89.8	53.8	90.2	93.1	96.7	52.3	79.3
7-8B Models
Mistral-7B-MMIQC	75.0	34.2	73.5	82.1	90.1	36.5	65.2
MetaMath-Mistral-7B	77.8	29.6	79.6	81.2	93.7	30.5	65.4
Abel-7B-002	80.4	29.6	78.8	82.7	93.5	33.2	66.4
WizardMath-7B-1.1	82.2	32.8	80.7	84.2	93.8	31.9	67.6
Math-Shepherd-Mistral-7B	84.3	34.4	82.9	82.8	92.5	32.9	68.3
KPMath-DSMath-7B	83.9	48.8	81.5	88.9	94.8	-	-
MAmmoTH2-7B-Plus	84.2	46.2	90.3	90.3	97.1	44.3	75.2
MAmmoTH2-8B-Plus	84.4	41.2	89.9	89.9	97.1	44.8	74.6
DeepSeekMath-7B-Instruct	82.3	45.8	83.7	90.1	95.7	45.8	73.9
DeepSeekMath-7B-RL	88.2	50.2	87.3	91.8	95.5	51.6	77.4
JiuZhang3.0-7B	88.6	52.8	90.4	92.6	97.3	51.0	78.8
JiuZhang3.0-8B	88.6	51.0	89.4	92.6	97.1	50.9	78.3
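
The Avg. column appears to be the unweighted mean of the six benchmark scores, rounded to one decimal place. This is an assumption checked against a few rows of the table, not something stated here. A minimal sketch that recomputes it (the names `ROWS` and `average` are illustrative only):

```python
# Benchmarks in the same order as the table columns.
BENCHMARKS = ["GSM8k", "MATH", "SVAMP", "ASDiv", "MAWPS", "CARP"]

# A few rows copied from the table above.
ROWS = {
    "GPT-4": [92.2, 65.4, 92.9, 94.3, 96.6, 53.6],
    "JiuZhang3.0-8x7B": [89.8, 53.8, 90.2, 93.1, 96.7, 52.3],
    "JiuZhang3.0-7B": [88.6, 52.8, 90.4, 92.6, 97.3, 51.0],
}


def average(scores):
    """Unweighted mean over all benchmarks, rounded to one decimal place."""
    if len(scores) != len(BENCHMARKS):
        raise ValueError("expected one score per benchmark")
    return round(sum(scores) / len(scores), 1)


for model, scores in ROWS.items():
    print(f"{model}: {average(scores)}")
```

Rows with a missing score (shown as `-`) have no Avg. value in the table, which is why the sketch rejects incomplete rows rather than averaging over the remaining benchmarks.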

Evaluation

Natural Language Reasoning

## Question
{question}

## Solution
{solution}

Tool Manipulation

## Question
{question}

## Code Solution
{solution}
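
The two templates above can be filled in programmatically. The following is a minimal sketch, not the repository's evaluation code: the function name `build_prompt`, the mode keys `"nl"` and `"tool"`, and the choice to leave the solution slot empty so the model writes the continuation are all assumptions made for illustration.

```python
# Templates transcribed from the two evaluation settings above.
NL_TEMPLATE = "## Question\n{question}\n\n## Solution\n{solution}"
TOOL_TEMPLATE = "## Question\n{question}\n\n## Code Solution\n{solution}"

# Hypothetical mode keys; the source only names the two settings.
TEMPLATES = {"nl": NL_TEMPLATE, "tool": TOOL_TEMPLATE}


def build_prompt(question, mode="nl", solution=""):
    """Fill one of the templates; an empty solution leaves the slot open."""
    if mode not in TEMPLATES:
        raise ValueError(f"unknown mode: {mode!r}")
    return TEMPLATES[mode].format(question=question.strip(), solution=solution)


if __name__ == "__main__":
    print(build_prompt("What is 12 * 7?"))
    print(build_prompt("What is 12 * 7?", mode="tool"))
```

Passing a non-empty `solution` produces a complete question and answer pair in the same layout, which may be useful for few-shot demonstrations.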

Citation

If you find this repository helpful, please consider citing our paper:

@article{zhou2024jiuzhang30,
      title={JiuZhang3.0: Efficiently Improving Mathematical Reasoning by Training Small Data Synthesis Models}, 
      author={Kun Zhou and Beichen Zhang and Jiapeng Wang and Zhipeng Chen and Wayne Xin Zhao and Jing Sha and Zhichao Sheng and Shijin Wang and Ji-Rong Wen},
      year={2024},
}
