See our paper at https://arxiv.org/abs/2401.02415

View the project page: https://github.com/TencentARC/LLaMA-Pro

Model Details

MetaMath-Mistral-Pro is the Mistral-Pro model, fully fine-tuned on the MetaMathQA dataset.

Model Usage

The model is trained to use the following format (note the newlines):

<|user|>
Your message here!
<|assistant|>

For best results, format all inputs this way. Be sure to include a newline after <|assistant|>, as this can noticeably affect generation quality.
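The prompt format above can be sketched in Python as follows. This is a minimal illustration, not the authors' reference code: build_prompt and generate_answer are hypothetical helper names, and loading through the Hugging Face transformers library in bfloat16 with max_new_tokens=256 is an assumption about a typical setup.

```python
MODEL_ID = "TencentARC/MetaMath-Mistral-Pro"


def build_prompt(message: str) -> str:
    """Wrap a user message in the <|user|> / <|assistant|> format."""
    # The trailing newline after <|assistant|> is intentional.
    return f"<|user|>\n{message.strip()}\n<|assistant|>\n"


def generate_answer(message: str, max_new_tokens: int = 256) -> str:
    """Hypothetical helper: generate a reply with Hugging Face transformers."""
    # Imported lazily so build_prompt works without these packages installed.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    # bfloat16 is an assumption; adjust the dtype to your hardware.
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype=torch.bfloat16)

    inputs = tokenizer(build_prompt(message), return_tensors="pt")
    with torch.no_grad():
        output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)

    # Decode only the newly generated tokens, not the echoed prompt.
    new_tokens = output_ids[0, inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```

Calling generate_answer("What is 15% of 80?") downloads the model weights on first use. build_prompt can be used on its own with any other inference stack.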

Experiments

Model                     GSM8k Pass@1 (%)   MATH Pass@1 (%)
MPT-7B                    6.8                3.0
Falcon-7B                 6.8                2.3
LLaMA-1-7B                11.0               2.9
LLaMA-2-7B                14.6               2.5
MPT-30B                   15.2               3.1
LLaMA-1-13B               17.8               3.9
GPT-Neo-2.7B              19.5               --
Falcon-40B                19.6               2.5
Baichuan-chat-13B         23.9               --
Vicuna-v1.3-13B           27.6               --
LLaMA-2-13B               28.7               3.9
InternLM-7B               31.2               --
ChatGLM-2-6B              32.4               --
GPT-J-6B                  34.9               --
LLaMA-1-33B               35.6               3.9
LLaMA-2-34B               42.2               6.24
RFT-7B                    50.3               --
LLaMA-1-65B               50.9               10.6
Qwen-7B                   51.6               --
WizardMath-7B             54.9               10.7
LLaMA-2-70B               56.8               13.5
WizardMath-13B            63.9               14.0
MAmmoTH-7B (COT)          50.5               10.4
MAmmoTH-7B (POT+COT)      53.6               31.5
Arithmo-Mistral-7B        74.7               25.3
MetaMath-7B               66.5               19.8
MetaMath-13B              72.3               22.4
MetaMath-Mistral-7B       77.7               28.2
MetaMath-Llemma-7B        69.2               30.0
🔥 MetaMath-Mistral-Pro   78.4               30.3

Citation

@article{wu2024llama,
  title={Llama pro: Progressive llama with block expansion},
  author={Wu, Chengyue and Gan, Yukang and Ge, Yixiao and Lu, Zeyu and Wang, Jiahao and Feng, Ye and Luo, Ping and Shan, Ying},
  journal={arXiv preprint arXiv:2401.02415},
  year={2024}
}

Model size: 8.99B params (Safetensors). Tensor type: BF16.
