For deployment, we recommend using vLLM. You can enable the long-context capabilities by adding the `rope_scaling` configuration to the model's `config.json`.
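As a minimal sketch, here is one way to add that entry programmatically; the `factor` and `original_max_position_embeddings` values below are assumptions for illustration only, so consult the model card for the settings that match your target context length:

```python
import json

# Minimal sketch: add a YaRN rope_scaling entry to a downloaded checkpoint's
# config.json. The path and the numeric values are illustrative assumptions.
config_path = "Qwen2-57B-A14B-Instruct/config.json"  # hypothetical local path

with open(config_path) as f:
    config = json.load(f)

config["rope_scaling"] = {
    "type": "yarn",
    "factor": 2.0,                              # assumed scaling factor
    "original_max_position_embeddings": 32768,  # assumed pre-scaling window
}

with open(config_path, "w") as f:
    json.dump(config, f, indent=2)
```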
**Note**: Presently, vLLM only supports static YaRN, which means the scaling factor remains constant regardless of input length, **potentially impacting performance on shorter texts**. We advise adding the `rope_scaling` configuration only when processing long contexts is required.
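Since vLLM is the recommended deployment path, a minimal offline-inference sketch follows; the sampling parameters and `tensor_parallel_size` are placeholders to adapt to your hardware, not settings from the model card:

```python
# Minimal vLLM offline-inference sketch; the sampling settings and parallelism
# below are placeholders, not recommendations from the model card.
from transformers import AutoTokenizer
from vllm import LLM, SamplingParams

model_id = "Qwen/Qwen2-57B-A14B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Format the conversation with the tokenizer's built-in chat template.
messages = [{"role": "user", "content": "Give me a short introduction to large language models."}]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

llm = LLM(model=model_id, tensor_parallel_size=4)  # adjust to your GPU count
outputs = llm.generate([prompt], SamplingParams(temperature=0.7, max_tokens=512))
print(outputs[0].outputs[0].text)
```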
## Evaluation

We briefly compare Qwen2-57B-A14B-Instruct with similar-sized instruction-tuned LLMs, including Qwen1.5-32B-Chat. The results are shown below:

| Datasets | Mixtral-8x7B-Instruct-v0.1 | Yi-1.5-34B-Chat | Qwen1.5-32B-Chat | **Qwen2-57B-A14B-Instruct** |
| :--- | :---: | :---: | :---: | :---: |
| Architecture | MoE | Dense | Dense | MoE |
| #Activated Params | 12B | 34B | 32B | 14B |
| #Params | 47B | 34B | 32B | 57B |
| _**English**_ | | | | |
| MMLU | 71.4 | **76.8** | 74.8 | 75.4 |
| MMLU-Pro | 43.3 | 52.3 | 46.4 | **52.8** |
| GPQA | - | - | 30.8 | **34.3** |
| TheoremQA | - | - | 30.9 | **33.1** |
| MT-Bench | 8.30 | 8.50 | 8.30 | **8.55** |
| _**Coding**_ | | | | |
| HumanEval | 45.1 | 75.2 | 68.3 | **79.9** |
| MBPP | 59.5 | **74.6** | 67.9 | 70.9 |
| MultiPL-E | - | - | 50.7 | **66.4** |
| EvalPlus | 48.5 | - | 63.6 | **71.6** |
| LiveCodeBench | 12.3 | - | 15.2 | **25.5** |
| _**Mathematics**_ | | | | |
| GSM8K | 65.7 | **90.2** | 83.6 | 79.6 |
| MATH | 30.7 | **50.1** | 42.4 | 49.1 |
| _**Chinese**_ | | | | |
| C-Eval | - | - | 76.7 | **80.5** |
| AlignBench | 5.70 | 7.20 | 7.19 | **7.36** |

  ## Citation
 
If you find our work helpful, feel free to cite us.