nhyha committed
Commit 266d819
1 Parent(s): 8e8ac45

Update README.md

Files changed (1):
1. README.md +0 -24
README.md CHANGED

@@ -128,30 +128,6 @@ Or you can install vLLM from [source](https://github.com/vllm-project/vllm/).
 
 **Note**: Presently, vLLM only supports static YARN, which means the scaling factor remains constant regardless of input length, **potentially impacting performance on shorter texts**. We advise adding the `rope_scaling` configuration only when processing long contexts is required.
 
-## Evaluation
-
-We briefly compare Qwen2-7B-Instruct with similar-sized instruction-tuned LLMs, including Qwen1.5-7B-Chat. The results are shown below:
-
-| Datasets | Llama-3-8B-Instruct | Yi-1.5-9B-Chat | GLM-4-9B-Chat | Qwen1.5-7B-Chat | Qwen2-7B-Instruct |
-| :--- | :---: | :---: | :---: | :---: | :---: |
-| _**English**_ |  |  |  |  |  |
-| MMLU | 68.4 | 69.5 | **72.4** | 59.5 | 70.5 |
-| MMLU-Pro | 41.0 | - | - | 29.1 | **44.1** |
-| GPQA | **34.2** | - | - | 27.8 | 25.3 |
-| TheoremQA | 23.0 | - | - | 14.1 | **25.3** |
-| MT-Bench | 8.05 | 8.20 | 8.35 | 7.60 | **8.41** |
-| _**Coding**_ |  |  |  |  |  |
-| Humaneval | 62.2 | 66.5 | 71.8 | 46.3 | **79.9** |
-| MBPP | **67.9** | - | - | 48.9 | 67.2 |
-| MultiPL-E | 48.5 | - | - | 27.2 | **59.1** |
-| Evalplus | 60.9 | - | - | 44.8 | **70.3** |
-| LiveCodeBench | 17.3 | - | - | 6.0 | **26.6** |
-| _**Mathematics**_ |  |  |  |  |  |
-| GSM8K | 79.6 | **84.8** | 79.6 | 60.3 | 82.3 |
-| MATH | 30.0 | 47.7 | **50.6** | 23.2 | 49.6 |
-| _**Chinese**_ |  |  |  |  |  |
-| C-Eval | 45.9 | - | 75.6 | 67.3 | **77.2** |
-| AlignBench | 6.20 | 6.90 | 7.01 | 6.20 | **7.21** |
 
 ## Citation
 
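For reference, the `rope_scaling` entry mentioned in the note retained above goes into the checkpoint's `config.json`. Below is a minimal sketch of how one might add it before serving with vLLM; the path, scaling factor, and base context length are illustrative assumptions, not values taken from this commit.

```python
# Hypothetical sketch (not part of this commit): enable static YARN by
# writing a `rope_scaling` entry into the checkpoint's config.json, as
# the note above advises for long-context workloads.
import json

cfg_path = "Qwen2-7B-Instruct/config.json"  # assumed local checkpoint path

with open(cfg_path) as f:
    cfg = json.load(f)

cfg["rope_scaling"] = {
    "type": "yarn",  # static YARN: one fixed scaling factor for all input lengths
    "factor": 4.0,  # assumed: stretch the context window 4x
    "original_max_position_embeddings": 32768,  # assumed native context length
}

with open(cfg_path, "w") as f:
    json.dump(cfg, f, indent=2)
```

Because the factor is static, it also applies to short prompts, which is why the note recommends adding this entry only when long contexts are actually needed.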