File size: 9,488 Bytes
3d31f23
 
 
aff6fbf
 
4259970
aff6fbf
 
 
3fee092
aff6fbf
 
787a551
aff6fbf
 
085d648
 
 
 
9cc08bb
e0f8e13
 
9cc08bb
085d648
aff6fbf
36457c8
 
06dbd3e
 
 
7c3f7bc
3fee092
7c3f7bc
103c968
 
 
 
 
 
 
 
 
 
 
 
6e2768b
aff6fbf
dfaa3f4
aff6fbf
dfaa3f4
aff6fbf
3fee092
 
 
aff6fbf
3fee092
 
 
aff6fbf
 
 
f3d80ce
 
24834ac
 
4a5651a
be3435e
 
4a5651a
 
be3435e
4a5651a
 
 
be3435e
aff6fbf
 
14403d2
 
 
 
aff6fbf
 
 
 
 
 
 
3fee092
 
 
 
 
 
 
 
 
 
 
 
 
 
6e2214e
 
 
 
 
 
 
 
 
 
 
 
 
 
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
---
license: llama2
---


## WizardMath: Empowering Mathematical Reasoning for Large Language Models via Reinforced Evol-Instruct (RLEIF)


<p align="center">
πŸ€— <a href="https://huggingface.co/WizardLM" target="_blank">HF Repo</a>  β€’πŸ± <a href="https://github.com/nlpxucan/WizardLM" target="_blank">Github Repo</a> β€’ 🐦 <a href="https://twitter.com/WizardLM_AI" target="_blank">Twitter</a> β€’ πŸ“ƒ <a href="https://arxiv.org/abs/2304.12244" target="_blank">[WizardLM]</a>  β€’ πŸ“ƒ <a href="https://arxiv.org/abs/2306.08568" target="_blank">[WizardCoder]</a>   β€’ πŸ“ƒ <a href="https://arxiv.org/abs/2308.09583" target="_blank">[WizardMath]</a>  <br>
</p>
<p align="center">
    πŸ‘‹ Join our <a href="https://discord.gg/VZjjHtWrKs" target="_blank">Discord</a>
</p>

|  Model  |  Checkpoint  | Paper    | HumanEval  |   MBPP | Demo | License |
| ----- |------| ---- |------|-------| ----- |  ----- | 
|  WizardCoder-Python-34B-V1.0  |   πŸ€— <a href="https://huggingface.co/WizardLM/WizardCoder-Python-34B-V1.0" target="_blank">HF Link</a>   |  πŸ“ƒ <a href="https://arxiv.org/abs/2306.08568" target="_blank">[WizardCoder]</a>  |  73.2   | 61.2 | [Demo](http://47.103.63.15:50085/) |  <a href="https://ai.meta.com/resources/models-and-libraries/llama-downloads/" target="_blank">Llama2</a>  |
|  WizardCoder-15B-V1.0  |   πŸ€— <a href="https://huggingface.co/WizardLM/WizardCoder-15B-V1.0" target="_blank">HF Link</a>   |  πŸ“ƒ <a href="https://arxiv.org/abs/2306.08568" target="_blank">[WizardCoder]</a>  |  59.8   |50.6 | -- |  <a href="https://huggingface.co/spaces/bigcode/bigcode-model-license-agreement" target="_blank">OpenRAIL-M</a>  |
|  WizardCoder-Python-13B-V1.0  |   πŸ€— <a href="https://huggingface.co/WizardLM/WizardCoder-Python-13B-V1.0" target="_blank">HF Link</a>   |  πŸ“ƒ <a href="https://arxiv.org/abs/2306.08568" target="_blank">[WizardCoder]</a>  |  64.0   | 55.6 | -- |  <a href="https://ai.meta.com/resources/models-and-libraries/llama-downloads/" target="_blank">Llama2</a>  |
|  WizardCoder-Python-7B-V1.0  |   πŸ€— <a href="https://huggingface.co/WizardLM/WizardCoder-Python-7B-V1.0" target="_blank">HF Link</a>   |  πŸ“ƒ <a href="https://arxiv.org/abs/2306.08568" target="_blank">[WizardCoder]</a>  |  55.5   | 51.6 | [Demo](http://47.103.63.15:50088/) |  <a href="https://ai.meta.com/resources/models-and-libraries/llama-downloads/" target="_blank">Llama2</a>  |
|  WizardCoder-3B-V1.0  |   πŸ€— <a href="https://huggingface.co/WizardLM/WizardCoder-3B-V1.0" target="_blank">HF Link</a>   |  πŸ“ƒ <a href="https://arxiv.org/abs/2306.08568" target="_blank">[WizardCoder]</a>  |  34.8   |37.4 | -- |  <a href="https://huggingface.co/spaces/bigcode/bigcode-model-license-agreement" target="_blank">OpenRAIL-M</a>  |
|  WizardCoder-1B-V1.0  |   πŸ€— <a href="https://huggingface.co/WizardLM/WizardCoder-1B-V1.0" target="_blank">HF Link</a>   |  πŸ“ƒ <a href="https://arxiv.org/abs/2306.08568" target="_blank">[WizardCoder]</a>  |  23.8   |28.6 | -- |  <a href="https://huggingface.co/spaces/bigcode/bigcode-model-license-agreement" target="_blank">OpenRAIL-M</a>  |


| Model | Checkpoint | Paper  | GSM8k | MATH  |Online Demo| License|
| ----- |------| ---- |------|-------| ----- | ----- |
| WizardMath-70B-V1.0 | πŸ€— <a href="https://huggingface.co/WizardLM/WizardMath-70B-V1.0" target="_blank">HF Link</a> |  πŸ“ƒ <a href="https://arxiv.org/abs/2308.09583" target="_blank">[WizardMath]</a>| **81.6**  |  **22.7**	|[Demo](http://47.103.63.15:50083/)| <a href="https://ai.meta.com/resources/models-and-libraries/llama-downloads/" target="_blank">Llama 2  </a> |
| WizardMath-13B-V1.0 | πŸ€— <a href="https://huggingface.co/WizardLM/WizardMath-13B-V1.0" target="_blank">HF Link</a> |  πŸ“ƒ <a href="https://arxiv.org/abs/2308.09583" target="_blank">[WizardMath]</a>| **63.9**  |  **14.0** |[Demo](http://47.103.63.15:50082/)| <a href="https://ai.meta.com/resources/models-and-libraries/llama-downloads/" target="_blank">Llama 2 </a> |
| WizardMath-7B-V1.0 | πŸ€— <a href="https://huggingface.co/WizardLM/WizardMath-7B-V1.0" target="_blank">HF Link</a>  |  πŸ“ƒ <a href="https://arxiv.org/abs/2308.09583" target="_blank">[WizardMath]</a>| 	 **54.9**  |  **10.7** | [Demo](http://47.103.63.15:50080/)|  <a href="https://ai.meta.com/resources/models-and-libraries/llama-downloads/" target="_blank">Llama 2  </a>|    

   

<font size=4>
    
| <sup>Model</sup> | <sup>Checkpoint</sup> | <sup>Paper</sup> |<sup>MT-Bench</sup> | <sup>AlpacaEval</sup>  | <sup>GSM8k</sup> | <sup>HumanEval</sup>  | <sup>License</sup>|
| ----- |------| ---- |------|-------| ----- | ----- | ----- | 
| <sup>**WizardLM-70B-V1.0**</sup> | <sup>πŸ€— <a href="https://huggingface.co/WizardLM/WizardLM-70B-V1.0" target="_blank">HF Link</a> </sup>|<sup>πŸ“ƒ**Coming Soon**</sup>| <sup>**7.78**</sup> | <sup>**92.91%**</sup>	 |<sup>**77.6%**</sup>	 | <sup>   **50.6 pass@1**</sup>|<sup> <a href="https://ai.meta.com/resources/models-and-libraries/llama-downloads/" target="_blank">Llama 2 License </a></sup> |
| <sup>WizardLM-13B-V1.2</sup> | <sup>πŸ€— <a href="https://huggingface.co/WizardLM/WizardLM-13B-V1.2" target="_blank">HF Link</a> </sup>|  | <sup>7.06</sup> | <sup>89.17%</sup>	 |<sup>55.3%</sup>	 | <sup>36.6  pass@1</sup>|<sup> <a href="https://ai.meta.com/resources/models-and-libraries/llama-downloads/" target="_blank">Llama 2 License </a></sup> |
| <sup>WizardLM-13B-V1.1</sup> |<sup> πŸ€— <a href="https://huggingface.co/WizardLM/WizardLM-13B-V1.1" target="_blank">HF Link</a> </sup> |  | <sup>6.76</sup>  |<sup>86.32%</sup>	 | 	 | <sup>25.0  pass@1</sup>| <sup>Non-commercial</sup>|
| <sup>WizardLM-30B-V1.0</sup> | <sup>πŸ€— <a href="https://huggingface.co/WizardLM/WizardLM-30B-V1.0" target="_blank">HF Link</a></sup>  | | <sup>7.01</sup> |                    | |  <sup>37.8  pass@1</sup>| <sup>Non-commercial</sup> |
| <sup>WizardLM-13B-V1.0</sup> | <sup>πŸ€— <a href="https://huggingface.co/WizardLM/WizardLM-13B-V1.0" target="_blank">HF Link</a> </sup> |  | <sup>6.35</sup> | <sup>75.31%</sup> |  | <sup> 24.0 pass@1 </sup> | <sup>Non-commercial</sup>|
| <sup>WizardLM-7B-V1.0 </sup>|  <sup>πŸ€— <a href="https://huggingface.co/WizardLM/WizardLM-7B-V1.0" target="_blank">HF Link</a> </sup> |<sup> πŸ“ƒ <a href="https://arxiv.org/abs/2304.12244" target="_blank">[WizardLM]</a> </sup>|  |  |  |<sup>19.1 pass@1 </sup>|<sup> Non-commercial</sup>|
</font>

**Github Repo**: https://github.com/nlpxucan/WizardLM/tree/main/WizardMath

**Twitter**: https://twitter.com/WizardLM_AI/status/1689998428200112128

**Discord**: https://discord.gg/VZjjHtWrKs

## Comparing WizardMath-V1.0 with Other LLMs.

πŸ”₯ The following figure shows that our **WizardMath-70B-V1.0 attains the fifth position in this benchmark**, surpassing ChatGPT (81.6 vs. 80.8) , Claude Instant (81.6 vs. 80.9), PaLM 2 540B (81.6 vs. 80.7).

<p align="center" width="100%">
<a ><img src="https://raw.githubusercontent.com/nlpxucan/WizardLM/main/WizardMath/images/wizardmath_gsm8k.png" alt="WizardMath" style="width: 96%; min-width: 300px; display: block; margin: auto;"></a>
</p>

❗<b>Note for model system prompts usage:</b>

Please use **the same systems prompts strictly** with us, and we do not guarantee the accuracy of the **quantified versions**.

**Default version:**

```
"Below is an instruction that describes a task. Write a response that appropriately completes the request.\n\n### Instruction:\n{instruction}\n\n### Response:"
```


**CoT Version:** οΌˆβ—For the **simple** math questions, we do NOT recommend to use the CoT prompt.οΌ‰ 


```
"Below is an instruction that describes a task. Write a response that appropriately completes the request.\n\n### Instruction:\n{instruction}\n\n### Response: Let's think step by step."
```

## Inference WizardMath Demo Script

We provide the WizardMath inference demo code [here](https://github.com/nlpxucan/WizardLM/tree/main/demo).


❗<b>To commen concern about dataset:</b>

Recently, there have been clear changes in the open-source policy and regulations of our overall organization's code, data, and models. 
Despite this, we have still worked hard to obtain opening the weights of the model first, but the data involves stricter auditing and is in review with our legal team .
Our researchers have no authority to publicly release them without authorization.
Thank you for your understanding.


## Citation

Please cite the repo if you use the data, method or code in this repo.

```
@article{luo2023wizardmath,
  title={WizardMath: Empowering Mathematical Reasoning for Large Language Models via Reinforced Evol-Instruct},
  author={Luo, Haipeng and Sun, Qingfeng and Xu, Can and Zhao, Pu and Lou, Jianguang and Tao, Chongyang and Geng, Xiubo and Lin, Qingwei and Chen, Shifeng and Zhang, Dongmei},
  journal={arXiv preprint arXiv:2308.09583},
  year={2023}
}
```

# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_WizardLM__WizardMath-7B-V1.0)

| Metric                | Value                     |
|-----------------------|---------------------------|
| Avg.                  | 45.44   |
| ARC (25-shot)         | 54.1          |
| HellaSwag (10-shot)   | 79.55    |
| MMLU (5-shot)         | 45.97         |
| TruthfulQA (0-shot)   | 43.65   |
| Winogrande (5-shot)   | 72.69   |
| GSM8K (5-shot)        | 2.73        |
| DROP (3-shot)         | 19.41         |