shenzhi-wang committed · Commit 0046a3e · verified · 1 Parent(s): 46b6c6f

Update README.md

  1. README.md +19 -0
README.md CHANGED
@@ -129,6 +129,25 @@ All results below, except those for `Xwen-72B-Chat`, are sourced from [Arena-Har
  | GLM-4-0520 🔒 | 61.4 | (-2.6, 2.4) |


+ ### 3.2 AlignBench-v1.1
+
+ > [!IMPORTANT]
+ >
+ > We replaced the original AlignBench judge model, `GPT-4-0613`, with the more powerful `GPT-4o-0513`. To ensure fairness, all results below were generated with `GPT-4o-0513` as the judge. As a result, they may differ from the AlignBench-v1.1 scores reported elsewhere.
+
+ | Model                          | Score                     |
+ | ------------------------------ | ------------------------- |
+ | **Xwen-72B-Chat** 🔑           | **7.57** (Top-1 Among 🔑) |
+ | Qwen2.5-72B-Chat 🔑            | 7.51                      |
+ | Deepseek V2.5 🔑               | 7.38                      |
+ | Mistral-Large-Instruct-2407 🔑 | 7.10                      |
+ | Llama3.1-70B-Instruct 🔑       | 5.81                      |
+ | Llama-3.1-405B-Instruct-FP8 🔑 | 5.56                      |
+ | GPT-4o-0513 🔒                 | **7.59** (Top-1 Among 🔒) |
+ | Claude-3.5-Sonnet-20240620 🔒  | 7.17                      |
+ | Yi-Lightning 🔒                | 7.54                      |
+ | Yi-Large-Preview 🔒            | 7.20                      |
+


  ## References