HenryJJ committed on
Commit 7a6cb79
1 Parent(s): 8659a70

Update README.md

Files changed (1)
  1. README.md (+407, -0)
README.md CHANGED
@@ -38,4 +38,411 @@ Fully opensourced at: https://github.com/hengjiUSTC/learn-llm/blob/main/trl_fine
 
 ```
 python3 trl_finetune.py --config configs/yi_6b.yml
+ ```
+
+ # Dataset Card for Evaluation run of HenryJJ/Instruct_Yi-6B_Dolly15K
+
+ <!-- Provide a quick summary of the dataset. -->
+
+ Dataset automatically created during the evaluation run of model [HenryJJ/Instruct_Yi-6B_Dolly15K](https://huggingface.co/HenryJJ/Instruct_Yi-6B_Dolly15K) on the [Open LLM Leaderboard](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard).
+
+ The dataset is composed of 63 configurations, each one corresponding to one of the evaluated tasks.
+
+ The dataset has been created from 1 run(s). Each run can be found as a specific split in each configuration, the split being named using the timestamp of the run. The "train" split always points to the latest results.
+
+ An additional configuration "results" stores all the aggregated results of the run (and is used to compute and display the aggregated metrics on the [Open LLM Leaderboard](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)).
+
+ To load the details from a run, you can, for instance, do the following:
+ ```python
+ from datasets import load_dataset
+ data = load_dataset("open-llm-leaderboard/details_HenryJJ__Instruct_Yi-6B_Dolly15K",
+     "harness_winogrande_5",
+     split="train")
+ ```
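
For context, a minimal sketch (an editor's illustration, not part of this commit) of how one might enumerate the 63 configurations and pull the aggregated "results" configuration described above; the repo id comes from the card, and the config and split names follow its description rather than a verified API contract:

```python
from datasets import get_dataset_config_names, load_dataset

REPO_ID = "open-llm-leaderboard/details_HenryJJ__Instruct_Yi-6B_Dolly15K"

# List every per-task configuration plus the aggregated "results" configuration.
configs = get_dataset_config_names(REPO_ID)
print(len(configs), configs[:5])

# Per the card, the "train" split of the "results" configuration should point
# to the latest aggregated run (assumption taken from the card text above).
results = load_dataset(REPO_ID, "results", split="train")
print(results[0])
```
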
+
+ ## Latest results
+
+ These are the [latest results from run 2024-01-06T09:45:44.755529](https://huggingface.co/datasets/open-llm-leaderboard/details_HenryJJ__Instruct_Yi-6B_Dolly15K/blob/main/results_2024-01-06T09-45-44.755529.json) (note that there might be results for other tasks in the repo if successive evals didn't cover the same tasks; you can find each in the results and the "latest" split for each eval):
+
+ ```python
+ {
+     "all": {
+         "acc": 0.6267070831158695,
+         "acc_stderr": 0.03222713761046951,
+         "acc_norm": 0.6343965374667763,
+         "acc_norm_stderr": 0.032887983229700546,
+         "mc1": 0.28886168910648713,
+         "mc1_stderr": 0.01586634640138431,
+         "mc2": 0.42839602626744816,
+         "mc2_stderr": 0.014270024501714959
+     },
+     "harness|arc:challenge|25": {
+         "acc": 0.5,
+         "acc_stderr": 0.014611390804670088,
+         "acc_norm": 0.5486348122866894,
+         "acc_norm_stderr": 0.014542104569955265
+     },
+     "harness|hellaswag|10": {
+         "acc": 0.5654252141007767,
+         "acc_stderr": 0.004946879874422681,
+         "acc_norm": 0.7587134037044413,
+         "acc_norm_stderr": 0.00426989301158892
+     },
+     "harness|hendrycksTest-abstract_algebra|5": {
+         "acc": 0.35,
+         "acc_stderr": 0.0479372485441102,
+         "acc_norm": 0.35,
+         "acc_norm_stderr": 0.0479372485441102
+     },
+     "harness|hendrycksTest-anatomy|5": {
+         "acc": 0.562962962962963,
+         "acc_stderr": 0.04284958639753401,
+         "acc_norm": 0.562962962962963,
+         "acc_norm_stderr": 0.04284958639753401
+     },
+     "harness|hendrycksTest-astronomy|5": {
+         "acc": 0.6776315789473685,
+         "acc_stderr": 0.03803510248351585,
+         "acc_norm": 0.6776315789473685,
+         "acc_norm_stderr": 0.03803510248351585
+     },
+     "harness|hendrycksTest-business_ethics|5": {
+         "acc": 0.7,
+         "acc_stderr": 0.046056618647183814,
+         "acc_norm": 0.7,
+         "acc_norm_stderr": 0.046056618647183814
+     },
+     "harness|hendrycksTest-clinical_knowledge|5": {
+         "acc": 0.690566037735849,
+         "acc_stderr": 0.028450154794118637,
+         "acc_norm": 0.690566037735849,
+         "acc_norm_stderr": 0.028450154794118637
+     },
+     "harness|hendrycksTest-college_biology|5": {
+         "acc": 0.6666666666666666,
+         "acc_stderr": 0.039420826399272135,
+         "acc_norm": 0.6666666666666666,
+         "acc_norm_stderr": 0.039420826399272135
+     },
+     "harness|hendrycksTest-college_chemistry|5": {
+         "acc": 0.41,
+         "acc_stderr": 0.049431107042371025,
+         "acc_norm": 0.41,
+         "acc_norm_stderr": 0.049431107042371025
+     },
+     "harness|hendrycksTest-college_computer_science|5": {
+         "acc": 0.44,
+         "acc_stderr": 0.04988876515698589,
+         "acc_norm": 0.44,
+         "acc_norm_stderr": 0.04988876515698589
+     },
+     "harness|hendrycksTest-college_mathematics|5": {
+         "acc": 0.36,
+         "acc_stderr": 0.04824181513244218,
+         "acc_norm": 0.36,
+         "acc_norm_stderr": 0.04824181513244218
+     },
+     "harness|hendrycksTest-college_medicine|5": {
+         "acc": 0.6069364161849711,
+         "acc_stderr": 0.03724249595817731,
+         "acc_norm": 0.6069364161849711,
+         "acc_norm_stderr": 0.03724249595817731
+     },
+     "harness|hendrycksTest-college_physics|5": {
+         "acc": 0.3235294117647059,
+         "acc_stderr": 0.04655010411319617,
+         "acc_norm": 0.3235294117647059,
+         "acc_norm_stderr": 0.04655010411319617
+     },
+     "harness|hendrycksTest-computer_security|5": {
+         "acc": 0.77,
+         "acc_stderr": 0.04229525846816507,
+         "acc_norm": 0.77,
+         "acc_norm_stderr": 0.04229525846816507
+     },
+     "harness|hendrycksTest-conceptual_physics|5": {
+         "acc": 0.6212765957446809,
+         "acc_stderr": 0.03170995606040655,
+         "acc_norm": 0.6212765957446809,
+         "acc_norm_stderr": 0.03170995606040655
+     },
+     "harness|hendrycksTest-econometrics|5": {
+         "acc": 0.35964912280701755,
+         "acc_stderr": 0.045144961328736334,
+         "acc_norm": 0.35964912280701755,
+         "acc_norm_stderr": 0.045144961328736334
+     },
+     "harness|hendrycksTest-electrical_engineering|5": {
+         "acc": 0.6482758620689655,
+         "acc_stderr": 0.0397923663749741,
+         "acc_norm": 0.6482758620689655,
+         "acc_norm_stderr": 0.0397923663749741
+     },
+     "harness|hendrycksTest-elementary_mathematics|5": {
+         "acc": 0.4470899470899471,
+         "acc_stderr": 0.02560672399577703,
+         "acc_norm": 0.4470899470899471,
+         "acc_norm_stderr": 0.02560672399577703
+     },
+     "harness|hendrycksTest-formal_logic|5": {
+         "acc": 0.38095238095238093,
+         "acc_stderr": 0.04343525428949098,
+         "acc_norm": 0.38095238095238093,
+         "acc_norm_stderr": 0.04343525428949098
+     },
+     "harness|hendrycksTest-global_facts|5": {
+         "acc": 0.4,
+         "acc_stderr": 0.04923659639173309,
+         "acc_norm": 0.4,
+         "acc_norm_stderr": 0.04923659639173309
+     },
+     "harness|hendrycksTest-high_school_biology|5": {
+         "acc": 0.7774193548387097,
+         "acc_stderr": 0.023664216671642525,
+         "acc_norm": 0.7774193548387097,
+         "acc_norm_stderr": 0.023664216671642525
+     },
+     "harness|hendrycksTest-high_school_chemistry|5": {
+         "acc": 0.4975369458128079,
+         "acc_stderr": 0.03517945038691063,
+         "acc_norm": 0.4975369458128079,
+         "acc_norm_stderr": 0.03517945038691063
+     },
+     "harness|hendrycksTest-high_school_computer_science|5": {
+         "acc": 0.64,
+         "acc_stderr": 0.04824181513244218,
+         "acc_norm": 0.64,
+         "acc_norm_stderr": 0.04824181513244218
+     },
+     "harness|hendrycksTest-high_school_european_history|5": {
+         "acc": 0.7393939393939394,
+         "acc_stderr": 0.034277431758165236,
+         "acc_norm": 0.7393939393939394,
+         "acc_norm_stderr": 0.034277431758165236
+     },
+     "harness|hendrycksTest-high_school_geography|5": {
+         "acc": 0.8181818181818182,
+         "acc_stderr": 0.0274796030105388,
+         "acc_norm": 0.8181818181818182,
+         "acc_norm_stderr": 0.0274796030105388
+     },
+     "harness|hendrycksTest-high_school_government_and_politics|5": {
+         "acc": 0.9015544041450777,
+         "acc_stderr": 0.021500249576033456,
+         "acc_norm": 0.9015544041450777,
+         "acc_norm_stderr": 0.021500249576033456
+     },
+     "harness|hendrycksTest-high_school_macroeconomics|5": {
+         "acc": 0.617948717948718,
+         "acc_stderr": 0.02463554916390823,
+         "acc_norm": 0.617948717948718,
+         "acc_norm_stderr": 0.02463554916390823
+     },
+     "harness|hendrycksTest-high_school_mathematics|5": {
+         "acc": 0.31851851851851853,
+         "acc_stderr": 0.028406533090608463,
+         "acc_norm": 0.31851851851851853,
+         "acc_norm_stderr": 0.028406533090608463
+     },
+     "harness|hendrycksTest-high_school_microeconomics|5": {
+         "acc": 0.7647058823529411,
+         "acc_stderr": 0.027553614467863797,
+         "acc_norm": 0.7647058823529411,
+         "acc_norm_stderr": 0.027553614467863797
+     },
+     "harness|hendrycksTest-high_school_physics|5": {
+         "acc": 0.36423841059602646,
+         "acc_stderr": 0.03929111781242742,
+         "acc_norm": 0.36423841059602646,
+         "acc_norm_stderr": 0.03929111781242742
+     },
+     "harness|hendrycksTest-high_school_psychology|5": {
+         "acc": 0.8348623853211009,
+         "acc_stderr": 0.01591955782997604,
+         "acc_norm": 0.8348623853211009,
+         "acc_norm_stderr": 0.01591955782997604
+     },
+     "harness|hendrycksTest-high_school_statistics|5": {
+         "acc": 0.5694444444444444,
+         "acc_stderr": 0.03376922151252335,
+         "acc_norm": 0.5694444444444444,
+         "acc_norm_stderr": 0.03376922151252335
+     },
+     "harness|hendrycksTest-high_school_us_history|5": {
+         "acc": 0.8088235294117647,
+         "acc_stderr": 0.027599174300640766,
+         "acc_norm": 0.8088235294117647,
+         "acc_norm_stderr": 0.027599174300640766
+     },
+     "harness|hendrycksTest-high_school_world_history|5": {
+         "acc": 0.7932489451476793,
+         "acc_stderr": 0.026361651668389094,
+         "acc_norm": 0.7932489451476793,
+         "acc_norm_stderr": 0.026361651668389094
+     },
+     "harness|hendrycksTest-human_aging|5": {
+         "acc": 0.695067264573991,
+         "acc_stderr": 0.030898610882477515,
+         "acc_norm": 0.695067264573991,
+         "acc_norm_stderr": 0.030898610882477515
+     },
+     "harness|hendrycksTest-human_sexuality|5": {
+         "acc": 0.7480916030534351,
+         "acc_stderr": 0.03807387116306085,
+         "acc_norm": 0.7480916030534351,
+         "acc_norm_stderr": 0.03807387116306085
+     },
+     "harness|hendrycksTest-international_law|5": {
+         "acc": 0.7768595041322314,
+         "acc_stderr": 0.03800754475228733,
+         "acc_norm": 0.7768595041322314,
+         "acc_norm_stderr": 0.03800754475228733
+     },
+     "harness|hendrycksTest-jurisprudence|5": {
+         "acc": 0.7777777777777778,
+         "acc_stderr": 0.040191074725573483,
+         "acc_norm": 0.7777777777777778,
+         "acc_norm_stderr": 0.040191074725573483
+     },
+     "harness|hendrycksTest-logical_fallacies|5": {
+         "acc": 0.7852760736196319,
+         "acc_stderr": 0.03226219377286775,
+         "acc_norm": 0.7852760736196319,
+         "acc_norm_stderr": 0.03226219377286775
+     },
+     "harness|hendrycksTest-machine_learning|5": {
+         "acc": 0.4375,
+         "acc_stderr": 0.04708567521880525,
+         "acc_norm": 0.4375,
+         "acc_norm_stderr": 0.04708567521880525
+     },
+     "harness|hendrycksTest-management|5": {
+         "acc": 0.8155339805825242,
+         "acc_stderr": 0.03840423627288276,
+         "acc_norm": 0.8155339805825242,
+         "acc_norm_stderr": 0.03840423627288276
+     },
+     "harness|hendrycksTest-marketing|5": {
+         "acc": 0.8974358974358975,
+         "acc_stderr": 0.01987565502786744,
+         "acc_norm": 0.8974358974358975,
+         "acc_norm_stderr": 0.01987565502786744
+     },
+     "harness|hendrycksTest-medical_genetics|5": {
+         "acc": 0.76,
+         "acc_stderr": 0.042923469599092816,
+         "acc_norm": 0.76,
+         "acc_norm_stderr": 0.042923469599092816
+     },
+     "harness|hendrycksTest-miscellaneous|5": {
+         "acc": 0.8007662835249042,
+         "acc_stderr": 0.014283378044296417,
+         "acc_norm": 0.8007662835249042,
+         "acc_norm_stderr": 0.014283378044296417
+     },
+     "harness|hendrycksTest-moral_disputes|5": {
+         "acc": 0.708092485549133,
+         "acc_stderr": 0.024476994076247333,
+         "acc_norm": 0.708092485549133,
+         "acc_norm_stderr": 0.024476994076247333
+     },
+     "harness|hendrycksTest-moral_scenarios|5": {
+         "acc": 0.33519553072625696,
+         "acc_stderr": 0.015788007190185884,
+         "acc_norm": 0.33519553072625696,
+         "acc_norm_stderr": 0.015788007190185884
+     },
+     "harness|hendrycksTest-nutrition|5": {
+         "acc": 0.7222222222222222,
+         "acc_stderr": 0.025646863097137897,
+         "acc_norm": 0.7222222222222222,
+         "acc_norm_stderr": 0.025646863097137897
+     },
+     "harness|hendrycksTest-philosophy|5": {
+         "acc": 0.6913183279742765,
+         "acc_stderr": 0.026236965881153262,
+         "acc_norm": 0.6913183279742765,
+         "acc_norm_stderr": 0.026236965881153262
+     },
+     "harness|hendrycksTest-prehistory|5": {
+         "acc": 0.7191358024691358,
+         "acc_stderr": 0.025006469755799208,
+         "acc_norm": 0.7191358024691358,
+         "acc_norm_stderr": 0.025006469755799208
+     },
+     "harness|hendrycksTest-professional_accounting|5": {
+         "acc": 0.48226950354609927,
+         "acc_stderr": 0.02980873964223777,
+         "acc_norm": 0.48226950354609927,
+         "acc_norm_stderr": 0.02980873964223777
+     },
+     "harness|hendrycksTest-professional_law|5": {
+         "acc": 0.4876140808344198,
+         "acc_stderr": 0.012766317315473565,
+         "acc_norm": 0.4876140808344198,
+         "acc_norm_stderr": 0.012766317315473565
+     },
+     "harness|hendrycksTest-professional_medicine|5": {
+         "acc": 0.6213235294117647,
+         "acc_stderr": 0.02946513363977613,
+         "acc_norm": 0.6213235294117647,
+         "acc_norm_stderr": 0.02946513363977613
+     },
+     "harness|hendrycksTest-professional_psychology|5": {
+         "acc": 0.6568627450980392,
+         "acc_stderr": 0.019206606848825365,
+         "acc_norm": 0.6568627450980392,
+         "acc_norm_stderr": 0.019206606848825365
+     },
+     "harness|hendrycksTest-public_relations|5": {
+         "acc": 0.6909090909090909,
+         "acc_stderr": 0.044262946482000985,
+         "acc_norm": 0.6909090909090909,
+         "acc_norm_stderr": 0.044262946482000985
+     },
+     "harness|hendrycksTest-security_studies|5": {
+         "acc": 0.7306122448979592,
+         "acc_stderr": 0.02840125202902294,
+         "acc_norm": 0.7306122448979592,
+         "acc_norm_stderr": 0.02840125202902294
+     },
+     "harness|hendrycksTest-sociology|5": {
+         "acc": 0.8159203980099502,
+         "acc_stderr": 0.027403859410786862,
+         "acc_norm": 0.8159203980099502,
+         "acc_norm_stderr": 0.027403859410786862
+     },
+     "harness|hendrycksTest-us_foreign_policy|5": {
+         "acc": 0.84,
+         "acc_stderr": 0.03684529491774708,
+         "acc_norm": 0.84,
+         "acc_norm_stderr": 0.03684529491774708
+     },
+     "harness|hendrycksTest-virology|5": {
+         "acc": 0.4578313253012048,
+         "acc_stderr": 0.0387862677100236,
+         "acc_norm": 0.4578313253012048,
+         "acc_norm_stderr": 0.0387862677100236
+     },
+     "harness|hendrycksTest-world_religions|5": {
+         "acc": 0.8070175438596491,
+         "acc_stderr": 0.030267457554898458,
+         "acc_norm": 0.8070175438596491,
+         "acc_norm_stderr": 0.030267457554898458
+     },
+     "harness|truthfulqa:mc|0": {
+         "mc1": 0.28886168910648713,
+         "mc1_stderr": 0.01586634640138431,
+         "mc2": 0.42839602626744816,
+         "mc2_stderr": 0.014270024501714959
+     },
+     "harness|winogrande|5": {
+         "acc": 0.7490134175217048,
+         "acc_stderr": 0.012185776220516148
+     },
+     "harness|gsm8k|5": {
+         "acc": 0.2926459438968916,
+         "acc_stderr": 0.012532334368242888
+     }
+ }
 ```
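
As a usage note (again an illustration by the editor, not part of the diff), the aggregated numbers above come from the results file linked in this card; a hedged sketch of downloading that file with `huggingface_hub` and reading the "all" block, reusing the repo id and filename from the link:

```python
import json

from huggingface_hub import hf_hub_download

# Repo id and filename are taken from the "Latest results" link in the card above.
path = hf_hub_download(
    repo_id="open-llm-leaderboard/details_HenryJJ__Instruct_Yi-6B_Dolly15K",
    filename="results_2024-01-06T09-45-44.755529.json",
    repo_type="dataset",
)

with open(path) as f:
    data = json.load(f)

# Depending on the file layout, the per-task block may sit at the top level or
# under a "results" key; this only assumes the structure shown in the card.
metrics = data.get("results", data)
print(metrics["all"])
```
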