Update README.md
README.md
@@ -129,6 +129,25 @@ All results below, except those for `Xwen-72B-Chat`, are sourced from [Arena-Har
| GLM-4-0520 π | 61.4 | (-2.6, 2.4) |

### 3.2 AlignBench-v1.1

> [!IMPORTANT]
>
> We replaced the original judge model in AlignBench, `GPT-4-0613`, with the more powerful `GPT-4o-0513`. For fairness, all results below were generated with `GPT-4o-0513` as the judge, so they may differ from AlignBench-v1.1 scores reported elsewhere. A minimal sketch of such a judging call is shown after the table below.

| Model | Score |
| ----------------------------- | ------------------------ |
| **Xwen-72B-Chat** π | **7.57** (Top-1 Among π) |
| Qwen2.5-72B-Chat π | 7.51 |
| Deepseek V2.5 π | 7.38 |
| Mistral-Large-Instruct-2407 π | 7.10 |
| Llama3.1-70B-Instruct π | 5.81 |
| Llama-3.1-405B-Instruct-FP8 π | 5.56 |
| GPT-4o-0513 π | **7.59** (Top-1 Among π) |
| Claude-3.5-Sonnet-20240620 π | 7.17 |
| Yi-Lightning π | 7.54 |
| Yi-Large-Preview π | 7.20 |
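
Since AlignBench scores come from an LLM judge, swapping `GPT-4-0613` for `GPT-4o-0513` only changes which model the grading requests are sent to. The snippet below is an illustrative sketch of one such judging call using the OpenAI Python SDK; the prompt template and 1-10 rubric are placeholders of our own, not AlignBench's actual grading template, and `gpt-4o-2024-05-13` is the API identifier corresponding to `GPT-4o-0513`.

```python
# Minimal LLM-as-judge sketch (illustrative; not AlignBench's real grading template).
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Hypothetical rubric: ask the judge for a single 1-10 score.
JUDGE_TEMPLATE = """You are an impartial judge. Rate the assistant's answer
to the user's question on a scale of 1 to 10, where 10 is best.
Reply with the score only.

[Question]
{question}

[Answer]
{answer}"""

def judge_score(question: str, answer: str, model: str = "gpt-4o-2024-05-13") -> str:
    """Send one grading request to the judge model and return its raw reply."""
    response = client.chat.completions.create(
        model=model,
        temperature=0,  # deterministic grading
        messages=[
            {"role": "user", "content": JUDGE_TEMPLATE.format(question=question, answer=answer)},
        ],
    )
    return response.choices[0].message.content

# e.g. judge_score("What is 2 + 2?", "4") might return "10"
```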

## References