Quantization made by Richard Erkhov.

[Github](https://github.com/RichardErkhov)

[Discord](https://discord.gg/pvy7H8DZMG)

[Request more models](https://github.com/RichardErkhov/quant_request)


# internlm2-math-plus-20b - GGUF
- Model creator: https://huggingface.co/internlm/
- Original model: https://huggingface.co/internlm/internlm2-math-plus-20b/


| Name | Quant method | Size |
| ---- | ---- | ---- |
| [internlm2-math-plus-20b.Q2_K.gguf](https://huggingface.co/RichardErkhov/internlm_-_internlm2-math-plus-20b-gguf/blob/main/internlm2-math-plus-20b.Q2_K.gguf) | Q2_K | 7.03GB |
| [internlm2-math-plus-20b.IQ3_XS.gguf](https://huggingface.co/RichardErkhov/internlm_-_internlm2-math-plus-20b-gguf/blob/main/internlm2-math-plus-20b.IQ3_XS.gguf) | IQ3_XS | 7.79GB |
| [internlm2-math-plus-20b.IQ3_S.gguf](https://huggingface.co/RichardErkhov/internlm_-_internlm2-math-plus-20b-gguf/blob/main/internlm2-math-plus-20b.IQ3_S.gguf) | IQ3_S | 8.2GB |
| [internlm2-math-plus-20b.Q3_K_S.gguf](https://huggingface.co/RichardErkhov/internlm_-_internlm2-math-plus-20b-gguf/blob/main/internlm2-math-plus-20b.Q3_K_S.gguf) | Q3_K_S | 8.16GB |
| [internlm2-math-plus-20b.IQ3_M.gguf](https://huggingface.co/RichardErkhov/internlm_-_internlm2-math-plus-20b-gguf/blob/main/internlm2-math-plus-20b.IQ3_M.gguf) | IQ3_M | 8.5GB |
| [internlm2-math-plus-20b.Q3_K.gguf](https://huggingface.co/RichardErkhov/internlm_-_internlm2-math-plus-20b-gguf/blob/main/internlm2-math-plus-20b.Q3_K.gguf) | Q3_K | 9.05GB |
| [internlm2-math-plus-20b.Q3_K_M.gguf](https://huggingface.co/RichardErkhov/internlm_-_internlm2-math-plus-20b-gguf/blob/main/internlm2-math-plus-20b.Q3_K_M.gguf) | Q3_K_M | 9.05GB |
| [internlm2-math-plus-20b.Q3_K_L.gguf](https://huggingface.co/RichardErkhov/internlm_-_internlm2-math-plus-20b-gguf/blob/main/internlm2-math-plus-20b.Q3_K_L.gguf) | Q3_K_L | 9.83GB |
| [internlm2-math-plus-20b.IQ4_XS.gguf](https://huggingface.co/RichardErkhov/internlm_-_internlm2-math-plus-20b-gguf/blob/main/internlm2-math-plus-20b.IQ4_XS.gguf) | IQ4_XS | 10.12GB |
| [internlm2-math-plus-20b.Q4_0.gguf](https://huggingface.co/RichardErkhov/internlm_-_internlm2-math-plus-20b-gguf/blob/main/internlm2-math-plus-20b.Q4_0.gguf) | Q4_0 | 10.55GB |
| [internlm2-math-plus-20b.IQ4_NL.gguf](https://huggingface.co/RichardErkhov/internlm_-_internlm2-math-plus-20b-gguf/blob/main/internlm2-math-plus-20b.IQ4_NL.gguf) | IQ4_NL | 10.65GB |
| [internlm2-math-plus-20b.Q4_K_S.gguf](https://huggingface.co/RichardErkhov/internlm_-_internlm2-math-plus-20b-gguf/blob/main/internlm2-math-plus-20b.Q4_K_S.gguf) | Q4_K_S | 10.62GB |
| [internlm2-math-plus-20b.Q4_K.gguf](https://huggingface.co/RichardErkhov/internlm_-_internlm2-math-plus-20b-gguf/blob/main/internlm2-math-plus-20b.Q4_K.gguf) | Q4_K | 11.16GB |
| [internlm2-math-plus-20b.Q4_K_M.gguf](https://huggingface.co/RichardErkhov/internlm_-_internlm2-math-plus-20b-gguf/blob/main/internlm2-math-plus-20b.Q4_K_M.gguf) | Q4_K_M | 11.16GB |
| [internlm2-math-plus-20b.Q4_1.gguf](https://huggingface.co/RichardErkhov/internlm_-_internlm2-math-plus-20b-gguf/blob/main/internlm2-math-plus-20b.Q4_1.gguf) | Q4_1 | 11.67GB |
| [internlm2-math-plus-20b.Q5_0.gguf](https://huggingface.co/RichardErkhov/internlm_-_internlm2-math-plus-20b-gguf/blob/main/internlm2-math-plus-20b.Q5_0.gguf) | Q5_0 | 12.79GB |
| [internlm2-math-plus-20b.Q5_K_S.gguf](https://huggingface.co/RichardErkhov/internlm_-_internlm2-math-plus-20b-gguf/blob/main/internlm2-math-plus-20b.Q5_K_S.gguf) | Q5_K_S | 12.79GB |
| [internlm2-math-plus-20b.Q5_K.gguf](https://huggingface.co/RichardErkhov/internlm_-_internlm2-math-plus-20b-gguf/blob/main/internlm2-math-plus-20b.Q5_K.gguf) | Q5_K | 13.11GB |
| [internlm2-math-plus-20b.Q5_K_M.gguf](https://huggingface.co/RichardErkhov/internlm_-_internlm2-math-plus-20b-gguf/blob/main/internlm2-math-plus-20b.Q5_K_M.gguf) | Q5_K_M | 13.11GB |
| [internlm2-math-plus-20b.Q5_1.gguf](https://huggingface.co/RichardErkhov/internlm_-_internlm2-math-plus-20b-gguf/blob/main/internlm2-math-plus-20b.Q5_1.gguf) | Q5_1 | 13.91GB |
| [internlm2-math-plus-20b.Q6_K.gguf](https://huggingface.co/RichardErkhov/internlm_-_internlm2-math-plus-20b-gguf/blob/main/internlm2-math-plus-20b.Q6_K.gguf) | Q6_K | 15.18GB |
| [internlm2-math-plus-20b.Q8_0.gguf](https://huggingface.co/RichardErkhov/internlm_-_internlm2-math-plus-20b-gguf/blob/main/internlm2-math-plus-20b.Q8_0.gguf) | Q8_0 | 19.66GB |
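Each file above can be fetched individually; as a rough rule, pick the largest quant that fits in your RAM/VRAM. Below is a minimal download sketch using `huggingface_hub` (an assumption on my part; any GGUF-capable downloader works), with the `Q4_K_M` file as just one choice from the table.

```python
# Minimal sketch: download one GGUF quant from this repo.
# Assumes `huggingface_hub` is installed; any filename from the
# table above works in place of the Q4_K_M file used here.
from huggingface_hub import hf_hub_download

model_path = hf_hub_download(
    repo_id="RichardErkhov/internlm_-_internlm2-math-plus-20b-gguf",
    filename="internlm2-math-plus-20b.Q4_K_M.gguf",  # ~11.16GB per the table
)
print(model_path)  # local path of the cached GGUF file
```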




Original model description:
---
pipeline_tag: text-generation
license: other
language:
- en
- zh
tags:
- math
---

# InternLM-Math-Plus

<div align="center">

<img src="https://raw.githubusercontent.com/InternLM/InternLM/main/assets/logo.svg" width="200"/>
  <div> </div>
  <div align="center">
    <b><font size="5">InternLM-Math</font></b>
    <sup>
      <a href="https://internlm.intern-ai.org.cn/">
        <i><font size="4">Plus</font></i>
      </a>
    </sup>
    <div> </div>
  </div>

State-of-the-art bilingual open-source math reasoning LLMs.
A **solver**, **prover**, **verifier**, and **augmentor**.

[💻 Github](https://github.com/InternLM/InternLM-Math) [🤗 Demo](https://huggingface.co/spaces/internlm/internlm2-math-7b)
</div>

# News
- [2024.05.24] We release the updated InternLM2-Math-Plus in four sizes (1.8B, 7B, 20B, and 8x22B) with state-of-the-art performance. It significantly improves both informal math reasoning (chain-of-thought and code interpreter) and formal math reasoning (LEAN 4 translation and LEAN 4 theorem proving).
- [2024.02.10] We add the tech report and citation reference.
- [2024.01.31] We add MiniF2F results with evaluation code!
- [2024.01.29] We add checkpoints on ModelScope and update results for majority voting and Code Interpreter. The tech report is on the way!
- [2024.01.26] We add checkpoints on OpenXLab, which makes downloading easier for users in China!

# Performance

## Formal Math Reasoning
We evaluate InternLM2-Math-Plus on the formal math reasoning benchmark MiniF2F-test. The evaluation setting is the same as Llemma's, using LEAN 4; an illustrative Lean 4 goal in this style is sketched after the table.
| Models                           | MiniF2F-test |
| -------------------------------- | ------------ |
| ReProver                         | 26.5         |
| LLMStep                          | 27.9         |
| GPT-F                            | 36.6         |
| HTPS                             | 41.0         |
| Llemma-7B                        | 26.2         |
| Llemma-34B                       | 25.8         |
| InternLM2-Math-7B-Base           | 30.3         |
| InternLM2-Math-20B-Base          | 29.5         |
| InternLM2-Math-Plus-1.8B         | 38.9         |
| InternLM2-Math-Plus-7B           | **43.4**     |
| InternLM2-Math-Plus-20B          | 42.6         |
| InternLM2-Math-Plus-Mixtral8x22B | 37.3         |
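For context, each MiniF2F task asks the model to complete a formal proof of a competition-style statement in LEAN 4. The following is an illustrative Lean 4 goal in that spirit (a trivial example of mine, not taken from the benchmark):

```lean
-- Illustrative Lean 4 theorem in the spirit of MiniF2F goals
-- (not an actual benchmark problem). `rfl` closes it because
-- `n + 0` reduces to `n` by the definition of Nat addition.
theorem add_zero_example (n : Nat) : n + 0 = n := rfl
```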

## Informal Math Reasoning
We evaluate InternLM2-Math-Plus on the informal math reasoning benchmarks MATH and GSM8K. InternLM2-Math-Plus-1.8B outperforms MiniCPM-2B in the smallest size setting. InternLM2-Math-Plus-7B outperforms Deepseek-Math-7B-RL, the state-of-the-art open-source math reasoning model. InternLM2-Math-Plus-Mixtral8x22B achieves 68.5 on MATH (with Python) and 91.8 on GSM8K. A sketch of querying one of the GGUF quants on a GSM8K-style problem follows the table.
| Model                            | MATH     | MATH-Python | GSM8K    |
| -------------------------------- | -------- | ----------- | -------- |
| MiniCPM-2B                       | 10.2     | -           | 53.8     |
| InternLM2-Math-Plus-1.8B         | **37.0** | **41.5**    | **58.8** |
| InternLM2-Math-7B                | 34.6     | 50.9        | 78.1     |
| Deepseek-Math-7B-RL              | 51.7     | 58.8        | **88.2** |
| InternLM2-Math-Plus-7B           | **53.0** | **59.7**    | 85.8     |
| InternLM2-Math-20B               | 37.7     | 54.3        | 82.6     |
| InternLM2-Math-Plus-20B          | **53.8** | **61.8**    | **87.7** |
| Mixtral8x22B-Instruct-v0.1       | 41.8     | -           | 78.6     |
| Eurux-8x22B-NCA                  | 49.0     | -           | -        |
| InternLM2-Math-Plus-Mixtral8x22B | **58.1** | **68.5**    | **91.8** |
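As a rough illustration of local use, the sketch below sends a GSM8K-style word problem to one of the GGUF quants through `llama-cpp-python` (an assumption; any GGUF runtime works). The `create_chat_completion` call relies on the chat template embedded in the GGUF, and the sampling settings are illustrative, not the settings behind the scores above.

```python
# Sketch: ask a local GGUF quant a GSM8K-style question.
# Assumes `pip install llama-cpp-python` and that `model_path`
# points at a file downloaded as in the earlier sketch.
from llama_cpp import Llama

llm = Llama(model_path=model_path, n_ctx=4096, verbose=False)
out = llm.create_chat_completion(
    messages=[{
        "role": "user",
        "content": "Natalia sold clips to 48 of her friends in April, "
                   "and then she sold half as many clips in May. "
                   "How many clips did Natalia sell altogether?",
    }],
    max_tokens=256,
    temperature=0.0,  # greedy decoding for a reproducible answer
)
print(out["choices"][0]["message"]["content"])
```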

We also evaluate models on [MathBench-A](https://github.com/open-compass/MathBench). InternLM2-Math-Plus-Mixtral8x22B performs comparably to Claude 3 Opus.
| Model                            | Arithmetic | Primary | Middle | High | College | Average |
| -------------------------------- | ---------- | ------- | ------ | ---- | ------- | ------- |
| GPT-4o-0513                      | 77.7       | 87.7    | 76.3   | 59.0 | 54.0    | 70.9    |
| Claude 3 Opus                    | 85.7       | 85.0    | 58.0   | 42.7 | 43.7    | 63.0    |
| Qwen-Max-0428                    | 72.3       | 86.3    | 65.0   | 45.0 | 27.3    | 59.2    |
| Qwen-1.5-110B                    | 70.3       | 82.3    | 64.0   | 47.3 | 28.0    | 58.4    |
| Deepseek-V2                      | 82.7       | 89.3    | 59.0   | 39.3 | 29.3    | 59.9    |
| Llama-3-70B-Instruct             | 70.3       | 86.0    | 53.0   | 38.7 | 34.7    | 56.5    |
| InternLM2-Math-Plus-Mixtral8x22B | 77.5       | 82.0    | 63.6   | 50.3 | 36.8    | 62.0    |
| InternLM2-Math-20B               | 58.7       | 70.0    | 43.7   | 24.7 | 12.7    | 42.0    |
| InternLM2-Math-Plus-20B          | 65.8       | 79.7    | 59.5   | 47.6 | 24.8    | 55.5    |
| Llama3-8B-Instruct               | 54.7       | 71.0    | 25.0   | 19.0 | 14.0    | 36.7    |
| InternLM2-Math-7B                | 53.7       | 67.0    | 41.3   | 18.3 | 8.0     | 37.7    |
| Deepseek-Math-7B-RL              | 68.0       | 83.3    | 44.3   | 33.0 | 23.0    | 50.3    |
| InternLM2-Math-Plus-7B           | 61.4       | 78.3    | 52.5   | 40.5 | 21.7    | 50.9    |
| MiniCPM-2B                       | 49.3       | 51.7    | 18.0   | 8.7  | 3.7     | 26.3    |
| InternLM2-Math-Plus-1.8B         | 43.0       | 43.3    | 25.4   | 18.9 | 4.7     | 27.1    |

# Citation and Tech Report
```
@misc{ying2024internlmmath,
      title={InternLM-Math: Open Math Large Language Models Toward Verifiable Reasoning}, 
      author={Huaiyuan Ying and Shuo Zhang and Linyang Li and Zhejian Zhou and Yunfan Shao and Zhaoye Fei and Yichuan Ma and Jiawei Hong and Kuikun Liu and Ziyi Wang and Yudong Wang and Zijian Wu and Shuaibin Li and Fengzhe Zhou and Hongwei Liu and Songyang Zhang and Wenwei Zhang and Hang Yan and Xipeng Qiu and Jiayu Wang and Kai Chen and Dahua Lin},
      year={2024},
      eprint={2402.06332},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
```