Update README.md
README.md
@@ -129,6 +129,25 @@ All results below, except those for `Xwen-72B-Chat`, are sourced from [Arena-Har
| GLM-4-0520 π | 61.4 | (-2.6, 2.4) |

### 3.2 AlignBench-v1.1

> [!IMPORTANT]
>
> We replaced the original judge model in AlignBench, `GPT-4-0613`, with the more powerful `GPT-4o-0513`. For fairness, all results below were generated with `GPT-4o-0513` as the judge, so they may differ from AlignBench-v1.1 scores reported elsewhere. A minimal sketch of such a judging call is shown after the table below.

| Model | Score |
| ----------------------------- | ------------------------ |
| **Xwen-72B-Chat** π | **7.57** (Top-1 Among π) |
| Qwen2.5-72B-Chat π | 7.51 |
| Deepseek V2.5 π | 7.38 |
| Mistral-Large-Instruct-2407 π | 7.10 |
| Llama3.1-70B-Instruct π | 5.81 |
| Llama-3.1-405B-Instruct-FP8 π | 5.56 |
| GPT-4o-0513 π | **7.59** (Top-1 Among π) |
| Claude-3.5-Sonnet-20240620 π | 7.17 |
| Yi-Lightning π | 7.54 |
| Yi-Large-Preview π | 7.20 |
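
Since AlignBench scores come from an LLM judge, swapping `GPT-4-0613` for `GPT-4o-0513` only changes which model the grading requests are sent to. The snippet below is an illustrative sketch of one such judging call using the OpenAI Python SDK; the prompt template and 1-10 rubric are placeholders of our own, not AlignBench's actual grading template, and `gpt-4o-2024-05-13` is the API identifier corresponding to `GPT-4o-0513`.

```python
# Minimal LLM-as-judge sketch (illustrative; not AlignBench's real grading template).
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Hypothetical rubric: ask the judge for a single 1-10 score.
JUDGE_TEMPLATE = """You are an impartial judge. Rate the assistant's answer
to the user's question on a scale of 1 to 10, where 10 is best.
Reply with the score only.

[Question]
{question}

[Answer]
{answer}"""

def judge_score(question: str, answer: str, model: str = "gpt-4o-2024-05-13") -> str:
    """Send one grading request to the judge model and return its raw reply."""
    response = client.chat.completions.create(
        model=model,
        temperature=0,  # deterministic grading
        messages=[
            {"role": "user", "content": JUDGE_TEMPLATE.format(question=question, answer=answer)},
        ],
    )
    return response.choices[0].message.content

# e.g. judge_score("What is 2 + 2?", "4") might return "10"
```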

## References