JiuZhang3.0: Efficiently Improving Mathematical Reasoning by Training Small Data Synthesis Models

[Paper] • [GitHub] • [Models] • [Data]

Introduction

JiuZhang3.0 is a series of fine-tuned models for mathematical reasoning, continually pre-trained on a corpus synthesized by our carefully trained small LLM.

Experimental Results

For more evaluation results, please refer to the Paper.

| Models | GSM8k | MATH | SVAMP | ASDiv | MAWPS | CARP | Avg. |
|---|---|---|---|---|---|---|---|
| GPT-4 | 92.2 | 65.4 | 92.9 | 94.3 | 96.6 | 53.6 | 82.5 |
| 20B+ Models | | | | | | | |
| Llemma-34B | 60.2 | 24.6 | 68.0 | 75.6 | 89.8 | 36.5 | 59.1 |
| Intern-Math-20B | 64.9 | 27.4 | 74.9 | 79.6 | 94.4 | 42.3 | 63.9 |
| ChatGLM-Math-32B | 82.6 | 40.6 | - | - | - | - | - |
| MAmmoTH2-8x7B-Plus | 86.4 | 47.0 | 90.0 | 92.2 | 97.0 | 45.8 | 76.4 |
| JiuZhang3.0-8x7B | 89.8 | 53.8 | 90.2 | 93.1 | 96.7 | 52.3 | 79.3 |
| 7-8B Models | | | | | | | |
| Mistral-7B-MMIQC | 75.0 | 34.2 | 73.5 | 82.1 | 90.1 | 36.5 | 65.2 |
| MetaMath-Mistral-7B | 77.8 | 29.6 | 79.6 | 81.2 | 93.7 | 30.5 | 65.4 |
| Abel-7B-002 | 80.4 | 29.6 | 78.8 | 82.7 | 93.5 | 33.2 | 66.4 |
| WizardMath-7B-1.1 | 82.2 | 32.8 | 80.7 | 84.2 | 93.8 | 31.9 | 67.6 |
| Math-Shepherd-Mistral-7B | 84.3 | 34.4 | 82.9 | 82.8 | 92.5 | 32.9 | 68.3 |
| KPMath-DSMath-7B | 83.9 | 48.8 | 81.5 | 88.9 | 94.8 | - | - |
| MAmmoTH2-7B-Plus | 84.2 | 46.2 | 90.3 | 90.3 | 97.1 | 44.3 | 75.2 |
| MAmmoTH2-8B-Plus | 84.4 | 41.2 | 89.9 | 89.9 | 97.1 | 44.8 | 74.6 |
| DeepSeekMath-7B-Instruct | 82.3 | 45.8 | 83.7 | 90.1 | 95.7 | 45.8 | 73.9 |
| DeepSeekMath-7B-RL | 88.2 | 50.2 | 87.3 | 91.8 | 95.5 | 51.6 | 77.4 |
| JiuZhang3.0-7B | 88.6 | 52.8 | 90.4 | 92.6 | 97.3 | 51.0 | 78.8 |
| JiuZhang3.0-8B | 88.6 | 51.0 | 89.4 | 92.6 | 97.1 | 50.9 | 78.3 |

Evaluation

Natural Language Reasoning

## Question
{question}

## Solution
{solution}

Tool Manipulation

## Question
{question}

## Code Solution
{solution}
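
The two templates above can be filled programmatically. The following is a minimal sketch, not the authors' evaluation code: the helper name `build_prompt` and the `mode` argument are hypothetical, and it assumes that at inference time the `{solution}` slot is left empty so the model continues from the solution header.

```python
# Prompt templates copied from the Evaluation section of this card.
NL_TEMPLATE = "## Question\n{question}\n\n## Solution\n{solution}"
TOOL_TEMPLATE = "## Question\n{question}\n\n## Code Solution\n{solution}"


def build_prompt(question: str, mode: str = "nl", solution: str = "") -> str:
    """Fill one of the two templates (hypothetical helper).

    Leaving `solution` empty yields a prompt that ends at the solution
    header; this sketch assumes the model generates from that point.
    """
    templates = {"nl": NL_TEMPLATE, "tool": TOOL_TEMPLATE}
    if mode not in templates:
        raise ValueError(f"unknown mode: {mode!r}")
    return templates[mode].format(question=question.strip(), solution=solution)


if __name__ == "__main__":
    # Print a tool-manipulation prompt for an example question.
    print(build_prompt("What is 12 * 7?", mode="tool"))
```

The resulting string would then be passed as the input text to whichever generation stack is used to serve the model.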

Citation

If you find this repository helpful, please consider citing our paper:

@article{zhou2024jiuzhang30,
      title={JiuZhang3.0: Efficiently Improving Mathematical Reasoning by Training Small Data Synthesis Models}, 
      author={Kun Zhou and Beichen Zhang and Jiapeng Wang and Zhipeng Chen and Wayne Xin Zhao and Jing Sha and Zhichao Sheng and Shijin Wang and Ji-Rong Wen},
      year={2024},
}

Model: ToheartZhang/JiuZhang3.0-8x7B · Model size: 46.7B params · Tensor type: BF16 (Safetensors)