
JiuZhang3.0: Efficiently Improving Mathematical Reasoning by Training Small Data Synthesis Models

[Paper] • [GitHub] • [Models] • [Data]

Introduction

JiuZhang3.0 is a series of fine-tuned models for math reasoning, continually pre-trained on a corpus synthesized by our carefully trained small LLM.

Experimental Results

For more evaluation results, please refer to the paper.

| Model | GSM8k | MATH | SVAMP | ASDiv | MAWPS | CARP | Avg. |
|---|---|---|---|---|---|---|---|
| GPT-4 | 92.2 | 65.4 | 92.9 | 94.3 | 96.6 | 53.6 | 82.5 |
| **20B+ Models** | | | | | | | |
| Llemma-34B | 60.2 | 24.6 | 68.0 | 75.6 | 89.8 | 36.5 | 59.1 |
| Intern-Math-20B | 64.9 | 27.4 | 74.9 | 79.6 | 94.4 | 42.3 | 63.9 |
| ChatGLM-Math-32B | 82.6 | 40.6 | - | - | - | - | - |
| MAmmoTH2-8x7B-Plus | 86.4 | 47.0 | 90.0 | 92.2 | 97.0 | 45.8 | 76.4 |
| JiuZhang3.0-8x7B | 89.8 | 53.8 | 90.2 | 93.1 | 96.7 | 52.3 | 79.3 |
| **7-8B Models** | | | | | | | |
| Mistral-7B-MMIQC | 75.0 | 34.2 | 73.5 | 82.1 | 90.1 | 36.5 | 65.2 |
| MetaMath-Mistral-7B | 77.8 | 29.6 | 79.6 | 81.2 | 93.7 | 30.5 | 65.4 |
| Abel-7B-002 | 80.4 | 29.6 | 78.8 | 82.7 | 93.5 | 33.2 | 66.4 |
| WizardMath-7B-1.1 | 82.2 | 32.8 | 80.7 | 84.2 | 93.8 | 31.9 | 67.6 |
| Math-Shepherd-Mistral-7B | 84.3 | 34.4 | 82.9 | 82.8 | 92.5 | 32.9 | 68.3 |
| KPMath-DSMath-7B | 83.9 | 48.8 | 81.5 | 88.9 | 94.8 | - | - |
| MAmmoTH2-7B-Plus | 84.2 | 46.2 | 90.3 | 90.3 | 97.1 | 44.3 | 75.2 |
| MAmmoTH2-8B-Plus | 84.4 | 41.2 | 89.9 | 89.9 | 97.1 | 44.8 | 74.6 |
| DeepSeekMath-7B-Instruct | 82.3 | 45.8 | 83.7 | 90.1 | 95.7 | 45.8 | 73.9 |
| DeepSeekMath-7B-RL | 88.2 | 50.2 | 87.3 | 91.8 | 95.5 | 51.6 | 77.4 |
| JiuZhang3.0-7B | 88.6 | 52.8 | 90.4 | 92.6 | 97.3 | 51.0 | 78.8 |
| JiuZhang3.0-8B | 88.6 | 51.0 | 89.4 | 92.6 | 97.1 | 50.9 | 78.3 |
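
The Avg. column appears to be the unweighted mean of the six benchmark scores, rounded to one decimal place. This is inferred from the numbers and is not stated in the card. The sketch below recomputes it for the GPT-4 and JiuZhang3.0 rows only; the names `SCORES`, `REPORTED_AVG`, and `average` are illustrative.

```python
# Scores copied from the table above, in the order:
# GSM8k, MATH, SVAMP, ASDiv, MAWPS, CARP.
SCORES = {
    "GPT-4": [92.2, 65.4, 92.9, 94.3, 96.6, 53.6],
    "JiuZhang3.0-8x7B": [89.8, 53.8, 90.2, 93.1, 96.7, 52.3],
    "JiuZhang3.0-7B": [88.6, 52.8, 90.4, 92.6, 97.3, 51.0],
    "JiuZhang3.0-8B": [88.6, 51.0, 89.4, 92.6, 97.1, 50.9],
}

# Avg. values as reported in the table.
REPORTED_AVG = {
    "GPT-4": 82.5,
    "JiuZhang3.0-8x7B": 79.3,
    "JiuZhang3.0-7B": 78.8,
    "JiuZhang3.0-8B": 78.3,
}


def average(scores):
    """Unweighted mean rounded to one decimal (assumed definition of Avg.)."""
    return round(sum(scores) / len(scores), 1)


if __name__ == "__main__":
    for model, scores in SCORES.items():
        print(f"{model}: recomputed {average(scores)}, reported {REPORTED_AVG[model]}")
```

Rows with missing entries (marked `-`) have no reported average, which is consistent with the mean being taken over all six benchmarks.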

Evaluation

Natural Language Reasoning

## Question
{question}

## Solution
{solution}

Tool Manipulation

## Question
{question}

## Code Solution
{solution}
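
As a minimal sketch, the two templates above can be filled in with plain string formatting. The helper name `build_prompt` and the `mode` keys are hypothetical. The sketch also assumes that at inference time the `{solution}` slot is left empty so the model continues from the final header, which the card does not state explicitly.

```python
# Prompt templates copied from the Evaluation section above.
NL_TEMPLATE = "## Question\n{question}\n\n## Solution\n{solution}"
TOOL_TEMPLATE = "## Question\n{question}\n\n## Code Solution\n{solution}"

TEMPLATES = {
    "nl": NL_TEMPLATE,      # natural language reasoning
    "tool": TOOL_TEMPLATE,  # tool manipulation (code solution)
}


def build_prompt(question, mode="nl", solution=""):
    """Fill one of the two templates.

    With the default empty `solution`, the result ends right after the
    solution header (assumed inference-time usage).
    """
    if mode not in TEMPLATES:
        raise ValueError(f"unknown mode: {mode!r}")
    return TEMPLATES[mode].format(question=question.strip(), solution=solution)


if __name__ == "__main__":
    print(build_prompt("What is 12 * 7?"))
    print(build_prompt("What is 12 * 7?", mode="tool"))
```

The returned string can then be tokenized and passed to whichever generation backend is in use.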

Citation

If you find this repository helpful, please consider citing our paper:

@article{zhou2024jiuzhang30,
      title={JiuZhang3.0: Efficiently Improving Mathematical Reasoning by Training Small Data Synthesis Models}, 
      author={Kun Zhou and Beichen Zhang and Jiapeng Wang and Zhipeng Chen and Wayne Xin Zhao and Jing Sha and Zhichao Sheng and Shijin Wang and Ji-Rong Wen},
      year={2024},
}

Model size: 46.7B params (Safetensors, tensor type BF16)