JiuZhang3.0: Efficiently Improving Mathematical Reasoning by Training Small Data Synthesis Models

[Paper] • [GitHub] • [Models] • [Data]

Introduction

JiuZhang3.0 is a series of fine-tuned models for mathematical reasoning, continually pre-trained on a corpus synthesized by our carefully trained small LLM.

Experimental Results

For more evaluation results, please refer to the paper.

| Models | GSM8k | MATH | SVAMP | ASDiv | MAWPS | CARP | Avg. |
|---|---|---|---|---|---|---|---|
| GPT-4 | 92.2 | 65.4 | 92.9 | 94.3 | 96.6 | 53.6 | 82.5 |
| 20B+ Models | | | | | | | |
| Llemma-34B | 60.2 | 24.6 | 68.0 | 75.6 | 89.8 | 36.5 | 59.1 |
| Intern-Math-20B | 64.9 | 27.4 | 74.9 | 79.6 | 94.4 | 42.3 | 63.9 |
| ChatGLM-Math-32B | 82.6 | 40.6 | - | - | - | - | - |
| MAmmoTH2-8x7B-Plus | 86.4 | 47.0 | 90.0 | 92.2 | 97.0 | 45.8 | 76.4 |
| JiuZhang3.0-8x7B | 89.8 | 53.8 | 90.2 | 93.1 | 96.7 | 52.3 | 79.3 |
| 7-8B Models | | | | | | | |
| Mistral-7B-MMIQC | 75.0 | 34.2 | 73.5 | 82.1 | 90.1 | 36.5 | 65.2 |
| MetaMath-Mistral-7B | 77.8 | 29.6 | 79.6 | 81.2 | 93.7 | 30.5 | 65.4 |
| Abel-7B-002 | 80.4 | 29.6 | 78.8 | 82.7 | 93.5 | 33.2 | 66.4 |
| WizardMath-7B-1.1 | 82.2 | 32.8 | 80.7 | 84.2 | 93.8 | 31.9 | 67.6 |
| Math-Shepherd-Mistral-7B | 84.3 | 34.4 | 82.9 | 82.8 | 92.5 | 32.9 | 68.3 |
| KPMath-DSMath-7B | 83.9 | 48.8 | 81.5 | 88.9 | 94.8 | - | - |
| MAmmoTH2-7B-Plus | 84.2 | 46.2 | 90.3 | 90.3 | 97.1 | 44.3 | 75.2 |
| MAmmoTH2-8B-Plus | 84.4 | 41.2 | 89.9 | 89.9 | 97.1 | 44.8 | 74.6 |
| DeepSeekMath-7B-Instruct | 82.3 | 45.8 | 83.7 | 90.1 | 95.7 | 45.8 | 73.9 |
| DeepSeekMath-7B-RL | 88.2 | 50.2 | 87.3 | 91.8 | 95.5 | 51.6 | 77.4 |
| JiuZhang3.0-7B | 88.6 | 52.8 | 90.4 | 92.6 | 97.3 | 51.0 | 78.8 |
| JiuZhang3.0-8B | 88.6 | 51.0 | 89.4 | 92.6 | 97.1 | 50.9 | 78.3 |
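
The card does not say how the Avg. column is computed. For the rows checked here, it is consistent with an unweighted mean of the six benchmark scores, rounded to one decimal place. The sketch below makes that assumption explicit. The name `mean_score` and the choice of rows are illustrative only.

```python
# Scores copied from the results table, in column order:
# GSM8k, MATH, SVAMP, ASDiv, MAWPS, CARP; the last value is the reported Avg.
ROWS = {
    "GPT-4": ([92.2, 65.4, 92.9, 94.3, 96.6, 53.6], 82.5),
    "JiuZhang3.0-8x7B": ([89.8, 53.8, 90.2, 93.1, 96.7, 52.3], 79.3),
    "JiuZhang3.0-7B": ([88.6, 52.8, 90.4, 92.6, 97.3, 51.0], 78.8),
    "JiuZhang3.0-8B": ([88.6, 51.0, 89.4, 92.6, 97.1, 50.9], 78.3),
}


def mean_score(scores):
    """Unweighted mean over benchmarks (assumed definition of Avg.)."""
    return sum(scores) / len(scores)


for name, (scores, reported) in ROWS.items():
    print(f"{name}: computed {mean_score(scores):.1f}, reported {reported}")
```

Rows with a missing score (shown as "-") have no reported average, which fits this reading.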

Evaluation

Natural Language Reasoning

## Question
{question}

## Solution
{solution}

Tool Manipulation

## Question
{question}

## Code Solution
{solution}
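
The two templates above can be filled in programmatically. The following is a minimal sketch, not the authors' evaluation code. The helper name `build_prompt` and the mode keys `"nl"` and `"tool"` are hypothetical, and leaving `{solution}` empty so that the model continues from the final header is an assumption about how the templates are used at inference time.

```python
# Template strings copied from the Evaluation section of this card.
NL_TEMPLATE = "## Question\n{question}\n\n## Solution\n{solution}"
TOOL_TEMPLATE = "## Question\n{question}\n\n## Code Solution\n{solution}"

# Hypothetical mode keys, introduced only for this example.
TEMPLATES = {"nl": NL_TEMPLATE, "tool": TOOL_TEMPLATE}


def build_prompt(question: str, mode: str = "nl", solution: str = "") -> str:
    """Fill one of the two templates.

    With the default empty `solution`, the prompt ends right after the
    solution header, so a causal LM would generate the solution text
    (assumed usage; the card does not spell this out).
    """
    if mode not in TEMPLATES:
        raise ValueError(f"unknown mode {mode!r}; expected one of {sorted(TEMPLATES)}")
    return TEMPLATES[mode].format(question=question.strip(), solution=solution)


if __name__ == "__main__":
    question = "Tom has 3 apples and buys 4 more. How many apples does he have?"
    print(build_prompt(question, mode="nl"))
    print("-----")
    print(build_prompt(question, mode="tool"))
```

The resulting string can then be tokenized and passed to the model in the usual way for a causal language model. For the tool manipulation template, the generated code would presumably need to be executed to obtain the final answer.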

Citation

If you find this repository helpful, please consider citing our paper:

@article{zhou2024jiuzhang30,
      title={JiuZhang3.0: Efficiently Improving Mathematical Reasoning by Training Small Data Synthesis Models}, 
      author={Kun Zhou and Beichen Zhang and Jiapeng Wang and Zhipeng Chen and Wayne Xin Zhao and Jing Sha and Zhichao Sheng and Shijin Wang and Ji-Rong Wen},
      year={2024},
}