fblgit committed on
Commit f3afc86
1 parent: bc405bc

Update README.md

Files changed (1)
  1. README.md +42 -0
README.md CHANGED
@@ -213,6 +213,48 @@ hf (pretrained=fblgit/LUNA-SOLARkrautLM-Instruct,dtype=float16), gen_kwargs: (),
  | - social_sciences|N/A |none | 5|acc |0.7501|± |0.0684|
  | - stem |N/A |none | 5|acc |0.5569|± |0.1360|
  ```
+ ### MT-Bench
+ ```
+ ########## Average ##########
+                                   score
+ model
+ gpt-4                          8.990625
+ gpt-3.5-turbo                  7.943750
+ claude-instant-v1              7.905660
+ claude-v1                      7.900000
+ UNA-SOLAR-10.7B-Instruct-v1.0  7.521875
+ LUNA-SOLARkrautLM-Instruct     7.462500
+ vicuna-33b-v1.3                7.121875
+ wizardlm-30b                   7.009375
+ Llama-2-70b-chat               6.856250
+ Llama-2-13b-chat               6.650000
+ guanaco-33b                    6.528125
+ tulu-30b                       6.434375
+ guanaco-65b                    6.409375
+ oasst-sft-7-llama-30b          6.409375
+ palm-2-chat-bison-001          6.400000
+ mpt-30b-chat                   6.393750
+ vicuna-13b-v1.3                6.387500
+ wizardlm-13b                   6.353125
+ Llama-2-7b-chat                6.268750
+ vicuna-7b-v1.3                 5.996875
+ baize-v2-13b                   5.750000
+ nous-hermes-13b                5.553459
+ mpt-7b-chat                    5.459119
+ gpt4all-13b-snoozy             5.452830
+ koala-13b                      5.350000
+ mpt-30b-instruct               5.218750
+ falcon-40b-instruct            5.168750
+ h2ogpt-oasst-open-llama-13b    4.625000
+ alpaca-13b                     4.531250
+ chatglm-6b                     4.500000
+ oasst-sft-4-pythia-12b         4.318750
+ rwkv-4-raven-14b               3.984375
+ dolly-v2-12b                   3.275000
+ fastchat-t5-3b                 3.040625
+ stablelm-tuned-alpha-7b        2.753125
+ llama-13b                      2.606250
+ ```
 
  ## Disclaimer
  We must inform users that despite our best efforts in data cleansing, the possibility of uncensored content slipping through cannot be entirely ruled out.
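The `Average` block in the MT-Bench section appears to follow the output of FastChat's MT-Bench `show_result.py`, which averages every valid judge score per model across both turns of the 80 questions and sorts the models in descending order. Below is a minimal sketch of that aggregation, assuming single-answer grading where each turn receives a 1 to 10 rating and a failed judgment is stored as -1. The `(model, question_id, turn, score)` record layout and the `average_scores` helper are hypothetical, for illustration only, and are not FastChat's API.

```python
from collections import defaultdict

# Hypothetical record layout: (model, question_id, turn, score).
# Assumption: scores are judge ratings from 1 to 10, and -1 marks a failed judgment.
Judgment = tuple[str, int, int, float]


def average_scores(judgments: list[Judgment]) -> list[tuple[str, float]]:
    """Return the mean of all valid turn scores per model, best model first."""
    totals: dict[str, float] = defaultdict(float)
    counts: dict[str, int] = defaultdict(int)
    for model, _question_id, _turn, score in judgments:
        if score == -1:  # skip turns the judge failed to score
            continue
        totals[model] += score
        counts[model] += 1
    # Only models with at least one valid score appear in totals, so counts > 0.
    averages = [(model, totals[model] / counts[model]) for model in totals]
    # Highest average first, matching the ordering of the leaderboard.
    return sorted(averages, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    demo = [
        ("model-a", 81, 1, 9.0),
        ("model-a", 81, 2, 8.0),
        ("model-b", 81, 1, 7.0),
        ("model-b", 81, 2, -1),  # dropped, so model-b is averaged over one score
    ]
    for model, score in average_scores(demo):
        print(f"{model:<10}{score:.6f}")
```

On the demo data this prints `model-a` with 8.500000 followed by `model-b` with 7.000000. Because -1 entries are dropped, a model can end up averaged over fewer than 160 turn scores, which would be consistent with non-round values in the table such as 7.905660 (about 1257/159).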