JiuZhang3.0: Efficiently Improving Mathematical Reasoning by Training Small Data Synthesis Models

[Paper][GitHub][Models][Data]

Introduction

JiuZhang3.0 is a series of fine-tuned models for math reasoning, continually pre-trained on a corpus synthesized by our carefully trained small LLM.

Experimental Results

For more evaluation results, please refer to the Paper.

| Models | GSM8k | MATH | SVAMP | ASDiv | MAWPS | CARP | Avg. |
|---|---|---|---|---|---|---|---|
| GPT-4 | 92.2 | 65.4 | 92.9 | 94.3 | 96.6 | 53.6 | 82.5 |
| 20B+ Models | | | | | | | |
| Llemma-34B | 60.2 | 24.6 | 68.0 | 75.6 | 89.8 | 36.5 | 59.1 |
| Intern-Math-20B | 64.9 | 27.4 | 74.9 | 79.6 | 94.4 | 42.3 | 63.9 |
| ChatGLM-Math-32B | 82.6 | 40.6 | - | - | - | - | - |
| MAmmoTH2-8x7B-Plus | 86.4 | 47.0 | 90.0 | 92.2 | 97.0 | 45.8 | 76.4 |
| JiuZhang3.0-8x7B | 89.8 | 53.8 | 90.2 | 93.1 | 96.7 | 52.3 | 79.3 |
| 7-8B Models | | | | | | | |
| Mistral-7B-MMIQC | 75.0 | 34.2 | 73.5 | 82.1 | 90.1 | 36.5 | 65.2 |
| MetaMath-Mistral-7B | 77.8 | 29.6 | 79.6 | 81.2 | 93.7 | 30.5 | 65.4 |
| Abel-7B-002 | 80.4 | 29.6 | 78.8 | 82.7 | 93.5 | 33.2 | 66.4 |
| WizardMath-7B-1.1 | 82.2 | 32.8 | 80.7 | 84.2 | 93.8 | 31.9 | 67.6 |
| Math-Shepherd-Mistral-7B | 84.3 | 34.4 | 82.9 | 82.8 | 92.5 | 32.9 | 68.3 |
| KPMath-DSMath-7B | 83.9 | 48.8 | 81.5 | 88.9 | 94.8 | - | - |
| MAmmoTH2-7B-Plus | 84.2 | 46.2 | 90.3 | 90.3 | 97.1 | 44.3 | 75.2 |
| MAmmoTH2-8B-Plus | 84.4 | 41.2 | 89.9 | 89.9 | 97.1 | 44.8 | 74.6 |
| DeepSeekMath-7B-Instruct | 82.3 | 45.8 | 83.7 | 90.1 | 95.7 | 45.8 | 73.9 |
| DeepSeekMath-7B-RL | 88.2 | 50.2 | 87.3 | 91.8 | 95.5 | 51.6 | 77.4 |
| JiuZhang3.0-7B | 88.6 | 52.8 | 90.4 | 92.6 | 97.3 | 51.0 | 78.8 |
| JiuZhang3.0-8B | 88.6 | 51.0 | 89.4 | 92.6 | 97.1 | 50.9 | 78.3 |
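
The Avg. column is consistent with an unweighted mean of the six benchmark scores (GSM8k, MATH, SVAMP, ASDiv, MAWPS, CARP), rounded to one decimal. The sketch below checks this on three rows copied from the table; the helper name `mean_score` is only illustrative, and the averaging rule is inferred from the numbers rather than stated in this card.

```python
# Scores copied from the table, in column order:
# GSM8k, MATH, SVAMP, ASDiv, MAWPS, CARP
scores = {
    "GPT-4": [92.2, 65.4, 92.9, 94.3, 96.6, 53.6],
    "JiuZhang3.0-8x7B": [89.8, 53.8, 90.2, 93.1, 96.7, 52.3],
    "JiuZhang3.0-7B": [88.6, 52.8, 90.4, 92.6, 97.3, 51.0],
}

# Avg. values as reported in the table.
reported_avg = {
    "GPT-4": 82.5,
    "JiuZhang3.0-8x7B": 79.3,
    "JiuZhang3.0-7B": 78.8,
}


def mean_score(values):
    """Unweighted mean of the benchmark scores, rounded to one decimal."""
    return round(sum(values) / len(values), 1)


for name, values in scores.items():
    print(name, mean_score(values), reported_avg[name])
```

Rows with missing entries (marked `-`) have no reported average, so they are left out of the check.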

Data Format

This is a fine-tuned model for synthesizing math-related question-solution pairs. Please use the given prompts.

Synthesized Data Types

  • Grade school
  • Middle school
  • High school
  • College
  • AMC 8
  • AMC 10
  • AMC 12
  • AIME
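
As a rough illustration of how one of the data types above might be selected and passed to the synthesis model, here is a minimal sketch using the Hugging Face `transformers` library. Several parts are assumptions and not taken from this card: `PROMPT_TEMPLATE` is a hypothetical placeholder (substitute the prompts provided with the model), `source_text` is a hypothetical seed input, and the helper names `build_prompt` and `synthesize` are illustrative. The repository name `ToheartZhang/JiuZhang3.0-Synthesis-7B` is assumed to be the checkpoint this card describes.

```python
# Data types listed in this card.
DATA_TYPES = [
    "Grade school", "Middle school", "High school", "College",
    "AMC 8", "AMC 10", "AMC 12", "AIME",
]

# Assumed repository name for the synthesis model.
MODEL_ID = "ToheartZhang/JiuZhang3.0-Synthesis-7B"

# HYPOTHETICAL placeholder, not the official prompt: replace it with the
# prompt supplied with the model for the chosen data type.
PROMPT_TEMPLATE = (
    "Data type: {data_type}\n"
    "Source text: {source_text}\n"
    "Write a math question and its solution:\n"
)


def build_prompt(data_type, source_text):
    """Fill the placeholder template after validating the data type."""
    if data_type not in DATA_TYPES:
        raise ValueError(f"Unknown data type: {data_type!r}")
    return PROMPT_TEMPLATE.format(data_type=data_type, source_text=source_text)


def synthesize(data_type, source_text, model_id=MODEL_ID, max_new_tokens=512):
    """Generate one question-solution pair (downloads the model on first use)."""
    # Imported lazily so that build_prompt works without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)
    inputs = tokenizer(build_prompt(data_type, source_text), return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `synthesize("AMC 10", "some math-related passage")` would return the decoded generation, including the prompt. Validating the data type up front keeps a typo in the type name from silently producing an off-distribution prompt.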

Citation

If you find this repository helpful, please consider citing our paper:

@article{zhou2024jiuzhang30,
      title={JiuZhang3.0: Efficiently Improving Mathematical Reasoning by Training Small Data Synthesis Models}, 
      author={Kun Zhou and Beichen Zhang and Jiapeng Wang and Zhipeng Chen and Wayne Xin Zhao and Jing Sha and Zhichao Sheng and Shijin Wang and Ji-Rong Wen},
      year={2024},
}