For deployment, we recommend using vLLM. You can enable the long-context capabilities by adding the `rope_scaling` configuration to the model's `config.json`.
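As a minimal sketch, here is one way to add that entry programmatically; the `factor` and `original_max_position_embeddings` values below are assumptions for illustration only, so consult the model card for the settings that match your target context length:

```python
import json

# Minimal sketch: add a YaRN rope_scaling entry to a downloaded checkpoint's
# config.json. The path and the numeric values are illustrative assumptions.
config_path = "Qwen2-57B-A14B-Instruct/config.json"  # hypothetical local path

with open(config_path) as f:
    config = json.load(f)

config["rope_scaling"] = {
    "type": "yarn",
    "factor": 2.0,                              # assumed scaling factor
    "original_max_position_embeddings": 32768,  # assumed pre-scaling window
}

with open(config_path, "w") as f:
    json.dump(config, f, indent=2)
```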
**Note**: Presently, vLLM only supports static YaRN, which means the scaling factor remains constant regardless of input length, **potentially impacting performance on shorter texts**. We advise adding the `rope_scaling` configuration only when processing long contexts is required.
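Since vLLM is the recommended deployment path, a minimal offline-inference sketch follows; the sampling parameters and `tensor_parallel_size` are placeholders to adapt to your hardware, not settings from the model card:

```python
# Minimal vLLM offline-inference sketch; the sampling settings and parallelism
# below are placeholders, not recommendations from the model card.
from transformers import AutoTokenizer
from vllm import LLM, SamplingParams

model_id = "Qwen/Qwen2-57B-A14B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Format the conversation with the tokenizer's built-in chat template.
messages = [{"role": "user", "content": "Give me a short introduction to large language models."}]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

llm = LLM(model=model_id, tensor_parallel_size=4)  # adjust to your GPU count
outputs = llm.generate([prompt], SamplingParams(temperature=0.7, max_tokens=512))
print(outputs[0].outputs[0].text)
```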
## Evaluation

We briefly compare Qwen2-57B-A14B-Instruct with similar-sized instruction-tuned LLMs, including Qwen1.5-32B-Chat. The results are shown below:

| Datasets | Mixtral-8x7B-Instruct-v0.1 | Yi-1.5-34B-Chat | Qwen1.5-32B-Chat | **Qwen2-57B-A14B-Instruct** |
| :--- | :---: | :---: | :---: | :---: |
| Architecture | MoE | Dense | Dense | MoE |
| #Activated Params | 12B | 34B | 32B | 14B |
| #Params | 47B | 34B | 32B | 57B |
| _**English**_ | | | | |
| MMLU | 71.4 | **76.8** | 74.8 | 75.4 |
| MMLU-Pro | 43.3 | 52.3 | 46.4 | **52.8** |
| GPQA | - | - | 30.8 | **34.3** |
| TheoremQA | - | - | 30.9 | **33.1** |
| MT-Bench | 8.30 | 8.50 | 8.30 | **8.55** |
| _**Coding**_ | | | | |
| HumanEval | 45.1 | 75.2 | 68.3 | **79.9** |
| MBPP | 59.5 | **74.6** | 67.9 | 70.9 |
| MultiPL-E | - | - | 50.7 | **66.4** |
| EvalPlus | 48.5 | - | 63.6 | **71.6** |
| LiveCodeBench | 12.3 | - | 15.2 | **25.5** |
| _**Mathematics**_ | | | | |
| GSM8K | 65.7 | **90.2** | 83.6 | 79.6 |
| MATH | 30.7 | **50.1** | 42.4 | 49.1 |
| _**Chinese**_ | | | | |
| C-Eval | - | - | 76.7 | **80.5** |
| AlignBench | 5.70 | 7.20 | 7.19 | **7.36** |

  ## Citation
 
If you find our work helpful, feel free to cite us.