{
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "Ac6wadk3rmkK"
      },
      "source": [
        "# LM Evaluation Harness (by [EleutherAI](https://www.eleuther.ai/))\n",
        "\n",
        "The [`LM-Evaluation-Harness`](https://github.com/EleutherAI/lm-evaluation-harness) provides a unified framework for testing generative language models on a large number of evaluation tasks. For a complete list of available tasks, see the [task table](https://github.com/EleutherAI/lm-evaluation-harness/blob/master/docs/task_table.md), or scroll to the bottom of the page.\n",
        "\n",
        "1. Clone the [lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness) repository and install the necessary libraries (`sentencepiece` is required for the Llama tokenizer)."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 1,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "UA5I86u91e0A",
        "outputId": "d74b3cab-b292-43db-bd5d-523424d2c97a"
      },
      "outputs": [
        {
          "name": "stdout",
          "output_type": "stream",
          "text": [
            "Cloning into 'lm-evaluation-harness'...\n",
            "remote: Enumerating objects: 22343, done.\u001b[K\n",
            "remote: Counting objects: 100% (7096/7096), done.\u001b[K\n",
            "remote: Compressing objects: 100% (703/703), done.\u001b[K\n",
            "remote: Total 22343 (delta 6540), reused 6659 (delta 6392), pack-reused 15247\u001b[K\n",
            "Receiving objects: 100% (22343/22343), 20.57 MiB | 11.37 MiB/s, done.\n",
            "Resolving deltas: 100% (15456/15456), done.\n",
            "Obtaining file:///content/lm-evaluation-harness\n",
            "  Preparing metadata (setup.py) ... \u001b[?25l\u001b[?25hdone\n",
            "Collecting datasets>=2.0.0 (from lm-eval==0.3.0)\n",
            "  Downloading datasets-2.14.5-py3-none-any.whl (519 kB)\n",
            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m519.6/519.6 kB\u001b[0m \u001b[31m8.7 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25hCollecting einops (from lm-eval==0.3.0)\n",
            "  Downloading einops-0.7.0-py3-none-any.whl (44 kB)\n",
            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m44.6/44.6 kB\u001b[0m \u001b[31m5.4 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25hCollecting jsonlines (from lm-eval==0.3.0)\n",
            "  Downloading jsonlines-4.0.0-py3-none-any.whl (8.7 kB)\n",
            "Requirement already satisfied: numexpr in /usr/local/lib/python3.10/dist-packages (from lm-eval==0.3.0) (2.8.7)\n",
            "Collecting openai>=0.6.4 (from lm-eval==0.3.0)\n",
            "  Downloading openai-0.28.1-py3-none-any.whl (76 kB)\n",
            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m77.0/77.0 kB\u001b[0m \u001b[31m10.3 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25hCollecting omegaconf>=2.2 (from lm-eval==0.3.0)\n",
            "  Downloading omegaconf-2.3.0-py3-none-any.whl (79 kB)\n",
            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m79.5/79.5 kB\u001b[0m \u001b[31m9.4 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25hCollecting peft>=0.2.0 (from lm-eval==0.3.0)\n",
            "  Downloading peft-0.5.0-py3-none-any.whl (85 kB)\n",
            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m85.6/85.6 kB\u001b[0m \u001b[31m11.1 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25hCollecting pybind11>=2.6.2 (from lm-eval==0.3.0)\n",
            "  Downloading pybind11-2.11.1-py3-none-any.whl (227 kB)\n",
            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m227.7/227.7 kB\u001b[0m \u001b[31m26.2 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25hCollecting pycountry (from lm-eval==0.3.0)\n",
            "  Downloading pycountry-22.3.5.tar.gz (10.1 MB)\n",
            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m10.1/10.1 MB\u001b[0m \u001b[31m85.6 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25h  Installing build dependencies ... \u001b[?25l\u001b[?25hdone\n",
            "  Getting requirements to build wheel ... \u001b[?25l\u001b[?25hdone\n",
            "  Preparing metadata (pyproject.toml) ... \u001b[?25l\u001b[?25hdone\n",
            "Collecting pytablewriter (from lm-eval==0.3.0)\n",
            "  Downloading pytablewriter-1.2.0-py3-none-any.whl (111 kB)\n",
            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m111.1/111.1 kB\u001b[0m \u001b[31m14.9 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25hCollecting rouge-score>=0.0.4 (from lm-eval==0.3.0)\n",
            "  Downloading rouge_score-0.1.2.tar.gz (17 kB)\n",
            "  Preparing metadata (setup.py) ... \u001b[?25l\u001b[?25hdone\n",
            "Collecting sacrebleu==1.5.0 (from lm-eval==0.3.0)\n",
            "  Downloading sacrebleu-1.5.0-py3-none-any.whl (65 kB)\n",
            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m65.6/65.6 kB\u001b[0m \u001b[31m9.0 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25hRequirement already satisfied: scikit-learn>=0.24.1 in /usr/local/lib/python3.10/dist-packages (from lm-eval==0.3.0) (1.2.2)\n",
            "Collecting sqlitedict (from lm-eval==0.3.0)\n",
            "  Downloading sqlitedict-2.1.0.tar.gz (21 kB)\n",
            "  Preparing metadata (setup.py) ... \u001b[?25l\u001b[?25hdone\n",
            "Requirement already satisfied: torch>=1.7 in /usr/local/lib/python3.10/dist-packages (from lm-eval==0.3.0) (2.0.1+cu118)\n",
            "Collecting tqdm-multiprocess (from lm-eval==0.3.0)\n",
            "  Downloading tqdm_multiprocess-0.0.11-py3-none-any.whl (9.8 kB)\n",
            "Collecting transformers>=4.1 (from lm-eval==0.3.0)\n",
            "  Downloading transformers-4.34.0-py3-none-any.whl (7.7 MB)\n",
            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m7.7/7.7 MB\u001b[0m \u001b[31m63.4 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25hCollecting zstandard (from lm-eval==0.3.0)\n",
            "  Downloading zstandard-0.21.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (2.7 MB)\n",
            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m2.7/2.7 MB\u001b[0m \u001b[31m85.3 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25hCollecting accelerate>=0.17.1 (from lm-eval==0.3.0)\n",
            "  Downloading accelerate-0.23.0-py3-none-any.whl (258 kB)\n",
            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m258.1/258.1 kB\u001b[0m \u001b[31m25.2 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25hCollecting portalocker (from sacrebleu==1.5.0->lm-eval==0.3.0)\n",
            "  Downloading portalocker-2.8.2-py3-none-any.whl (17 kB)\n",
            "Requirement already satisfied: numpy>=1.17 in /usr/local/lib/python3.10/dist-packages (from accelerate>=0.17.1->lm-eval==0.3.0) (1.23.5)\n",
            "Requirement already satisfied: packaging>=20.0 in /usr/local/lib/python3.10/dist-packages (from accelerate>=0.17.1->lm-eval==0.3.0) (23.2)\n",
            "Requirement already satisfied: psutil in /usr/local/lib/python3.10/dist-packages (from accelerate>=0.17.1->lm-eval==0.3.0) (5.9.5)\n",
            "Requirement already satisfied: pyyaml in /usr/local/lib/python3.10/dist-packages (from accelerate>=0.17.1->lm-eval==0.3.0) (6.0.1)\n",
            "Collecting huggingface-hub (from accelerate>=0.17.1->lm-eval==0.3.0)\n",
            "  Downloading huggingface_hub-0.18.0-py3-none-any.whl (301 kB)\n",
            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m302.0/302.0 kB\u001b[0m \u001b[31m27.5 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25hRequirement already satisfied: pyarrow>=8.0.0 in /usr/local/lib/python3.10/dist-packages (from datasets>=2.0.0->lm-eval==0.3.0) (9.0.0)\n",
            "Collecting dill<0.3.8,>=0.3.0 (from datasets>=2.0.0->lm-eval==0.3.0)\n",
            "  Downloading dill-0.3.7-py3-none-any.whl (115 kB)\n",
            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m115.3/115.3 kB\u001b[0m \u001b[31m13.3 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25hRequirement already satisfied: pandas in /usr/local/lib/python3.10/dist-packages (from datasets>=2.0.0->lm-eval==0.3.0) (1.5.3)\n",
            "Requirement already satisfied: requests>=2.19.0 in /usr/local/lib/python3.10/dist-packages (from datasets>=2.0.0->lm-eval==0.3.0) (2.31.0)\n",
            "Requirement already satisfied: tqdm>=4.62.1 in /usr/local/lib/python3.10/dist-packages (from datasets>=2.0.0->lm-eval==0.3.0) (4.66.1)\n",
            "Collecting xxhash (from datasets>=2.0.0->lm-eval==0.3.0)\n",
            "  Downloading xxhash-3.4.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (194 kB)\n",
            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m194.1/194.1 kB\u001b[0m \u001b[31m21.2 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25hCollecting multiprocess (from datasets>=2.0.0->lm-eval==0.3.0)\n",
            "  Downloading multiprocess-0.70.15-py310-none-any.whl (134 kB)\n",
            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m134.8/134.8 kB\u001b[0m \u001b[31m16.9 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25hRequirement already satisfied: fsspec[http]<2023.9.0,>=2023.1.0 in /usr/local/lib/python3.10/dist-packages (from datasets>=2.0.0->lm-eval==0.3.0) (2023.6.0)\n",
            "Requirement already satisfied: aiohttp in /usr/local/lib/python3.10/dist-packages (from datasets>=2.0.0->lm-eval==0.3.0) (3.8.6)\n",
            "Collecting antlr4-python3-runtime==4.9.* (from omegaconf>=2.2->lm-eval==0.3.0)\n",
            "  Downloading antlr4-python3-runtime-4.9.3.tar.gz (117 kB)\n",
            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m117.0/117.0 kB\u001b[0m \u001b[31m15.5 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25h  Preparing metadata (setup.py) ... \u001b[?25l\u001b[?25hdone\n",
            "Collecting safetensors (from peft>=0.2.0->lm-eval==0.3.0)\n",
            "  Downloading safetensors-0.4.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.3 MB)\n",
            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m1.3/1.3 MB\u001b[0m \u001b[31m66.8 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25hRequirement already satisfied: absl-py in /usr/local/lib/python3.10/dist-packages (from rouge-score>=0.0.4->lm-eval==0.3.0) (1.4.0)\n",
            "Requirement already satisfied: nltk in /usr/local/lib/python3.10/dist-packages (from rouge-score>=0.0.4->lm-eval==0.3.0) (3.8.1)\n",
            "Requirement already satisfied: six>=1.14.0 in /usr/local/lib/python3.10/dist-packages (from rouge-score>=0.0.4->lm-eval==0.3.0) (1.16.0)\n",
            "Requirement already satisfied: scipy>=1.3.2 in /usr/local/lib/python3.10/dist-packages (from scikit-learn>=0.24.1->lm-eval==0.3.0) (1.11.3)\n",
            "Requirement already satisfied: joblib>=1.1.1 in /usr/local/lib/python3.10/dist-packages (from scikit-learn>=0.24.1->lm-eval==0.3.0) (1.3.2)\n",
            "Requirement already satisfied: threadpoolctl>=2.0.0 in /usr/local/lib/python3.10/dist-packages (from scikit-learn>=0.24.1->lm-eval==0.3.0) (3.2.0)\n",
            "Requirement already satisfied: filelock in /usr/local/lib/python3.10/dist-packages (from torch>=1.7->lm-eval==0.3.0) (3.12.4)\n",
            "Requirement already satisfied: typing-extensions in /usr/local/lib/python3.10/dist-packages (from torch>=1.7->lm-eval==0.3.0) (4.5.0)\n",
            "Requirement already satisfied: sympy in /usr/local/lib/python3.10/dist-packages (from torch>=1.7->lm-eval==0.3.0) (1.12)\n",
            "Requirement already satisfied: networkx in /usr/local/lib/python3.10/dist-packages (from torch>=1.7->lm-eval==0.3.0) (3.1)\n",
            "Requirement already satisfied: jinja2 in /usr/local/lib/python3.10/dist-packages (from torch>=1.7->lm-eval==0.3.0) (3.1.2)\n",
            "Requirement already satisfied: triton==2.0.0 in /usr/local/lib/python3.10/dist-packages (from torch>=1.7->lm-eval==0.3.0) (2.0.0)\n",
            "Requirement already satisfied: cmake in /usr/local/lib/python3.10/dist-packages (from triton==2.0.0->torch>=1.7->lm-eval==0.3.0) (3.27.6)\n",
            "Requirement already satisfied: lit in /usr/local/lib/python3.10/dist-packages (from triton==2.0.0->torch>=1.7->lm-eval==0.3.0) (17.0.2)\n",
            "Requirement already satisfied: regex!=2019.12.17 in /usr/local/lib/python3.10/dist-packages (from transformers>=4.1->lm-eval==0.3.0) (2023.6.3)\n",
            "Collecting tokenizers<0.15,>=0.14 (from transformers>=4.1->lm-eval==0.3.0)\n",
            "  Downloading tokenizers-0.14.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (3.8 MB)\n",
            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m3.8/3.8 MB\u001b[0m \u001b[31m118.0 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25hRequirement already satisfied: attrs>=19.2.0 in /usr/local/lib/python3.10/dist-packages (from jsonlines->lm-eval==0.3.0) (23.1.0)\n",
            "Requirement already satisfied: setuptools in /usr/local/lib/python3.10/dist-packages (from pycountry->lm-eval==0.3.0) (67.7.2)\n",
            "Collecting DataProperty<2,>=1.0.1 (from pytablewriter->lm-eval==0.3.0)\n",
            "  Downloading DataProperty-1.0.1-py3-none-any.whl (27 kB)\n",
            "Collecting mbstrdecoder<2,>=1.0.0 (from pytablewriter->lm-eval==0.3.0)\n",
            "  Downloading mbstrdecoder-1.1.3-py3-none-any.whl (7.8 kB)\n",
            "Collecting pathvalidate<4,>=2.3.0 (from pytablewriter->lm-eval==0.3.0)\n",
            "  Downloading pathvalidate-3.2.0-py3-none-any.whl (23 kB)\n",
            "Collecting tabledata<2,>=1.3.1 (from pytablewriter->lm-eval==0.3.0)\n",
            "  Downloading tabledata-1.3.3-py3-none-any.whl (11 kB)\n",
            "Collecting tcolorpy<1,>=0.0.5 (from pytablewriter->lm-eval==0.3.0)\n",
            "  Downloading tcolorpy-0.1.4-py3-none-any.whl (7.9 kB)\n",
            "Collecting typepy[datetime]<2,>=1.3.2 (from pytablewriter->lm-eval==0.3.0)\n",
            "  Downloading typepy-1.3.2-py3-none-any.whl (31 kB)\n",
            "Collecting colorama (from tqdm-multiprocess->lm-eval==0.3.0)\n",
            "  Downloading colorama-0.4.6-py2.py3-none-any.whl (25 kB)\n",
            "Requirement already satisfied: charset-normalizer<4.0,>=2.0 in /usr/local/lib/python3.10/dist-packages (from aiohttp->datasets>=2.0.0->lm-eval==0.3.0) (3.3.0)\n",
            "Requirement already satisfied: multidict<7.0,>=4.5 in /usr/local/lib/python3.10/dist-packages (from aiohttp->datasets>=2.0.0->lm-eval==0.3.0) (6.0.4)\n",
            "Requirement already satisfied: async-timeout<5.0,>=4.0.0a3 in /usr/local/lib/python3.10/dist-packages (from aiohttp->datasets>=2.0.0->lm-eval==0.3.0) (4.0.3)\n",
            "Requirement already satisfied: yarl<2.0,>=1.0 in /usr/local/lib/python3.10/dist-packages (from aiohttp->datasets>=2.0.0->lm-eval==0.3.0) (1.9.2)\n",
            "Requirement already satisfied: frozenlist>=1.1.1 in /usr/local/lib/python3.10/dist-packages (from aiohttp->datasets>=2.0.0->lm-eval==0.3.0) (1.4.0)\n",
            "Requirement already satisfied: aiosignal>=1.1.2 in /usr/local/lib/python3.10/dist-packages (from aiohttp->datasets>=2.0.0->lm-eval==0.3.0) (1.3.1)\n",
            "Requirement already satisfied: chardet<6,>=3.0.4 in /usr/local/lib/python3.10/dist-packages (from mbstrdecoder<2,>=1.0.0->pytablewriter->lm-eval==0.3.0) (5.2.0)\n",
            "Requirement already satisfied: idna<4,>=2.5 in /usr/local/lib/python3.10/dist-packages (from requests>=2.19.0->datasets>=2.0.0->lm-eval==0.3.0) (3.4)\n",
            "Requirement already satisfied: urllib3<3,>=1.21.1 in /usr/local/lib/python3.10/dist-packages (from requests>=2.19.0->datasets>=2.0.0->lm-eval==0.3.0) (2.0.6)\n",
            "Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.10/dist-packages (from requests>=2.19.0->datasets>=2.0.0->lm-eval==0.3.0) (2023.7.22)\n",
            "Collecting huggingface-hub (from accelerate>=0.17.1->lm-eval==0.3.0)\n",
            "  Downloading huggingface_hub-0.17.3-py3-none-any.whl (295 kB)\n",
            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m295.0/295.0 kB\u001b[0m \u001b[31m34.5 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25hRequirement already satisfied: python-dateutil<3.0.0,>=2.8.0 in /usr/local/lib/python3.10/dist-packages (from typepy[datetime]<2,>=1.3.2->pytablewriter->lm-eval==0.3.0) (2.8.2)\n",
            "Requirement already satisfied: pytz>=2018.9 in /usr/local/lib/python3.10/dist-packages (from typepy[datetime]<2,>=1.3.2->pytablewriter->lm-eval==0.3.0) (2023.3.post1)\n",
            "Requirement already satisfied: MarkupSafe>=2.0 in /usr/local/lib/python3.10/dist-packages (from jinja2->torch>=1.7->lm-eval==0.3.0) (2.1.3)\n",
            "Requirement already satisfied: click in /usr/local/lib/python3.10/dist-packages (from nltk->rouge-score>=0.0.4->lm-eval==0.3.0) (8.1.7)\n",
            "Requirement already satisfied: mpmath>=0.19 in /usr/local/lib/python3.10/dist-packages (from sympy->torch>=1.7->lm-eval==0.3.0) (1.3.0)\n",
            "Building wheels for collected packages: antlr4-python3-runtime, rouge-score, pycountry, sqlitedict\n",
            "  Building wheel for antlr4-python3-runtime (setup.py) ... \u001b[?25l\u001b[?25hdone\n",
            "  Created wheel for antlr4-python3-runtime: filename=antlr4_python3_runtime-4.9.3-py3-none-any.whl size=144554 sha256=a2f0b8953193e72a5cc4d402cd57becdaf2e11c29b664a7bc1dd0a2be7b14c34\n",
            "  Stored in directory: /root/.cache/pip/wheels/12/93/dd/1f6a127edc45659556564c5730f6d4e300888f4bca2d4c5a88\n",
            "  Building wheel for rouge-score (setup.py) ... \u001b[?25l\u001b[?25hdone\n",
            "  Created wheel for rouge-score: filename=rouge_score-0.1.2-py3-none-any.whl size=24932 sha256=b800533290e8b115b69386f5528faaeec21bdaf0b27df954f91293ce884d2fae\n",
            "  Stored in directory: /root/.cache/pip/wheels/5f/dd/89/461065a73be61a532ff8599a28e9beef17985c9e9c31e541b4\n",
            "  Building wheel for pycountry (pyproject.toml) ... \u001b[?25l\u001b[?25hdone\n",
            "  Created wheel for pycountry: filename=pycountry-22.3.5-py2.py3-none-any.whl size=10681833 sha256=c76dd8d8880795167eba1833e4b4f85fd1d2989d3e3c2a3c14ac581d784ec607\n",
            "  Stored in directory: /root/.cache/pip/wheels/03/57/cc/290c5252ec97a6d78d36479a3c5e5ecc76318afcb241ad9dbe\n",
            "  Building wheel for sqlitedict (setup.py) ... \u001b[?25l\u001b[?25hdone\n",
            "  Created wheel for sqlitedict: filename=sqlitedict-2.1.0-py3-none-any.whl size=16864 sha256=38ab29686a73c7df8c33252ed6b8986475d0548f9adf841373e4ecaf8d995201\n",
            "  Stored in directory: /root/.cache/pip/wheels/79/d6/e7/304e0e6cb2221022c26d8161f7c23cd4f259a9e41e8bbcfabd\n",
            "Successfully built antlr4-python3-runtime rouge-score pycountry sqlitedict\n",
            "Installing collected packages: sqlitedict, antlr4-python3-runtime, zstandard, xxhash, tcolorpy, safetensors, pycountry, pybind11, portalocker, pathvalidate, omegaconf, mbstrdecoder, jsonlines, einops, dill, colorama, typepy, tqdm-multiprocess, sacrebleu, rouge-score, multiprocess, huggingface-hub, tokenizers, openai, transformers, datasets, DataProperty, tabledata, pytablewriter, accelerate, peft, lm-eval\n",
            "  Running setup.py develop for lm-eval\n",
            "\u001b[31mERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.\n",
            "llmx 0.0.15a0 requires cohere, which is not installed.\n",
            "llmx 0.0.15a0 requires tiktoken, which is not installed.\u001b[0m\u001b[31m\n",
            "\u001b[0mSuccessfully installed DataProperty-1.0.1 accelerate-0.23.0 antlr4-python3-runtime-4.9.3 colorama-0.4.6 datasets-2.14.5 dill-0.3.7 einops-0.7.0 huggingface-hub-0.17.3 jsonlines-4.0.0 lm-eval-0.3.0 mbstrdecoder-1.1.3 multiprocess-0.70.15 omegaconf-2.3.0 openai-0.28.1 pathvalidate-3.2.0 peft-0.5.0 portalocker-2.8.2 pybind11-2.11.1 pycountry-22.3.5 pytablewriter-1.2.0 rouge-score-0.1.2 sacrebleu-1.5.0 safetensors-0.4.0 sqlitedict-2.1.0 tabledata-1.3.3 tcolorpy-0.1.4 tokenizers-0.14.1 tqdm-multiprocess-0.0.11 transformers-4.34.0 typepy-1.3.2 xxhash-3.4.1 zstandard-0.21.0\n",
            "Collecting cohere\n",
            "  Downloading cohere-4.30-py3-none-any.whl (47 kB)\n",
            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m47.8/47.8 kB\u001b[0m \u001b[31m1.0 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25hCollecting tiktoken\n",
            "  Downloading tiktoken-0.5.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (2.0 MB)\n",
            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m2.0/2.0 MB\u001b[0m \u001b[31m30.2 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25hCollecting sentencepiece\n",
            "  Downloading sentencepiece-0.1.99-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.3 MB)\n",
            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m1.3/1.3 MB\u001b[0m \u001b[31m75.9 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25hRequirement already satisfied: aiohttp<4.0,>=3.0 in /usr/local/lib/python3.10/dist-packages (from cohere) (3.8.6)\n",
            "Collecting backoff<3.0,>=2.0 (from cohere)\n",
            "  Downloading backoff-2.2.1-py3-none-any.whl (15 kB)\n",
            "Collecting fastavro==1.8.2 (from cohere)\n",
            "  Downloading fastavro-1.8.2-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (2.7 MB)\n",
            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m2.7/2.7 MB\u001b[0m \u001b[31m97.0 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25hRequirement already satisfied: importlib_metadata<7.0,>=6.0 in /usr/local/lib/python3.10/dist-packages (from cohere) (6.8.0)\n",
            "Requirement already satisfied: requests<3.0.0,>=2.25.0 in /usr/local/lib/python3.10/dist-packages (from cohere) (2.31.0)\n",
            "Requirement already satisfied: urllib3<3,>=1.26 in /usr/local/lib/python3.10/dist-packages (from cohere) (2.0.6)\n",
            "Requirement already satisfied: regex>=2022.1.18 in /usr/local/lib/python3.10/dist-packages (from tiktoken) (2023.6.3)\n",
            "Requirement already satisfied: attrs>=17.3.0 in /usr/local/lib/python3.10/dist-packages (from aiohttp<4.0,>=3.0->cohere) (23.1.0)\n",
            "Requirement already satisfied: charset-normalizer<4.0,>=2.0 in /usr/local/lib/python3.10/dist-packages (from aiohttp<4.0,>=3.0->cohere) (3.3.0)\n",
            "Requirement already satisfied: multidict<7.0,>=4.5 in /usr/local/lib/python3.10/dist-packages (from aiohttp<4.0,>=3.0->cohere) (6.0.4)\n",
            "Requirement already satisfied: async-timeout<5.0,>=4.0.0a3 in /usr/local/lib/python3.10/dist-packages (from aiohttp<4.0,>=3.0->cohere) (4.0.3)\n",
            "Requirement already satisfied: yarl<2.0,>=1.0 in /usr/local/lib/python3.10/dist-packages (from aiohttp<4.0,>=3.0->cohere) (1.9.2)\n",
            "Requirement already satisfied: frozenlist>=1.1.1 in /usr/local/lib/python3.10/dist-packages (from aiohttp<4.0,>=3.0->cohere) (1.4.0)\n",
            "Requirement already satisfied: aiosignal>=1.1.2 in /usr/local/lib/python3.10/dist-packages (from aiohttp<4.0,>=3.0->cohere) (1.3.1)\n",
            "Requirement already satisfied: zipp>=0.5 in /usr/local/lib/python3.10/dist-packages (from importlib_metadata<7.0,>=6.0->cohere) (3.17.0)\n",
            "Requirement already satisfied: idna<4,>=2.5 in /usr/local/lib/python3.10/dist-packages (from requests<3.0.0,>=2.25.0->cohere) (3.4)\n",
            "Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.10/dist-packages (from requests<3.0.0,>=2.25.0->cohere) (2023.7.22)\n",
            "Installing collected packages: sentencepiece, fastavro, backoff, tiktoken, cohere\n",
            "Successfully installed backoff-2.2.1 cohere-4.30 fastavro-1.8.2 sentencepiece-0.1.99 tiktoken-0.5.1\n"
          ]
        }
      ],
      "source": [
        "!git clone https://github.com/EleutherAI/lm-evaluation-harness\n",
        "!cd lm-evaluation-harness && pip install -e .\n",
        "%pip install cohere tiktoken sentencepiece"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "pnHoAVK25QZn",
        "outputId": "4253b115-702c-4f31-f1b3-f0483c527841"
      },
      "outputs": [],
      "source": [
        "!cd lm-evaluation-harness && python main.py \\\n",
        "    --model hf-causal \\\n",
        "    --model_args pretrained=nicholasKluge/Aira-2-1B1 \\\n",
        "    --tasks hendrycksTest-abstract_algebra,hendrycksTest-anatomy,hendrycksTest-astronomy,hendrycksTest-business_ethics,hendrycksTest-clinical_knowledge,hendrycksTest-college_biology,hendrycksTest-college_chemistry,hendrycksTest-college_computer_science,hendrycksTest-college_mathematics,hendrycksTest-college_medicine,hendrycksTest-college_physics,hendrycksTest-computer_security,hendrycksTest-conceptual_physics,hendrycksTest-econometrics,hendrycksTest-electrical_engineering,hendrycksTest-elementary_mathematics,hendrycksTest-formal_logic,hendrycksTest-global_facts,hendrycksTest-high_school_biology,hendrycksTest-high_school_chemistry,hendrycksTest-high_school_computer_science,hendrycksTest-high_school_european_history,hendrycksTest-high_school_geography,hendrycksTest-high_school_government_and_politics,hendrycksTest-high_school_macroeconomics,hendrycksTest-high_school_mathematics,hendrycksTest-high_school_microeconomics,hendrycksTest-high_school_physics,hendrycksTest-high_school_psychology,hendrycksTest-high_school_statistics,hendrycksTest-high_school_us_history,hendrycksTest-high_school_world_history,hendrycksTest-human_aging,hendrycksTest-human_sexuality,hendrycksTest-international_law,hendrycksTest-jurisprudence,hendrycksTest-logical_fallacies,hendrycksTest-machine_learning,hendrycksTest-management,hendrycksTest-marketing,hendrycksTest-medical_genetics,hendrycksTest-miscellaneous,hendrycksTest-moral_disputes,hendrycksTest-moral_scenarios,hendrycksTest-nutrition,hendrycksTest-philosophy,hendrycksTest-prehistory,hendrycksTest-professional_accounting,hendrycksTest-professional_law,hendrycksTest-professional_medicine,hendrycksTest-professional_psychology,hendrycksTest-public_relations,hendrycksTest-security_studies,hendrycksTest-sociology,hendrycksTest-us_foreign_policy,hendrycksTest-virology,hendrycksTest-world_religions  \\\n",
        "    --device cuda:0"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "4Bm78wiZ4Own"
      },
      "source": [
        "## Task Table πŸ“š\n",
        "\n",
        "|                        Task Name                        |Train|Val|Test|Val/Test Docs|                                                                                     Metrics                                                                                     |\n",
        "|---------------------------------------------------------|-----|---|----|------------:|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|\n",
        "|anagrams1                                                |     |βœ“  |    |        10000|acc                                                                                                                                                                              |\n",
        "|anagrams2                                                |     |βœ“  |    |        10000|acc                                                                                                                                                                              |\n",
        "|anli_r1                                                  |βœ“    |βœ“  |βœ“   |         1000|acc                                                                                                                                                                              |\n",
        "|anli_r2                                                  |βœ“    |βœ“  |βœ“   |         1000|acc                                                                                                                                                                              |\n",
        "|anli_r3                                                  |βœ“    |βœ“  |βœ“   |         1200|acc                                                                                                                                                                              |\n",
        "|arc_challenge                                            |βœ“    |βœ“  |βœ“   |         1172|acc, acc_norm                                                                                                                                                                    |\n",
        "|arc_easy                                                 |βœ“    |βœ“  |βœ“   |         2376|acc, acc_norm                                                                                                                                                                    |\n",
        "|arithmetic_1dc                                           |     |βœ“  |    |         2000|acc                                                                                                                                                                              |\n",
        "|arithmetic_2da                                           |     |βœ“  |    |         2000|acc                                                                                                                                                                              |\n",
        "|arithmetic_2dm                                           |     |βœ“  |    |         2000|acc                                                                                                                                                                              |\n",
        "|arithmetic_2ds                                           |     |βœ“  |    |         2000|acc                                                                                                                                                                              |\n",
        "|arithmetic_3da                                           |     |βœ“  |    |         2000|acc                                                                                                                                                                              |\n",
        "|arithmetic_3ds                                           |     |βœ“  |    |         2000|acc                                                                                                                                                                              |\n",
        "|arithmetic_4da                                           |     |βœ“  |    |         2000|acc                                                                                                                                                                              |\n",
        "|arithmetic_4ds                                           |     |βœ“  |    |         2000|acc                                                                                                                                                                              |\n",
        "|arithmetic_5da                                           |     |βœ“  |    |         2000|acc                                                                                                                                                                              |\n",
        "|arithmetic_5ds                                           |     |βœ“  |    |         2000|acc                                                                                                                                                                              |\n",
        "|bigbench_causal_judgement                                |     |   |βœ“   |          190|multiple_choice_grade, exact_str_match                                                                                                                                           |\n",
        "|bigbench_date_understanding                              |     |   |βœ“   |          369|multiple_choice_grade, exact_str_match                                                                                                                                           |\n",
        "|bigbench_disambiguation_qa                               |     |   |βœ“   |          258|multiple_choice_grade, exact_str_match                                                                                                                                           |\n",
        "|bigbench_dyck_languages                                  |     |   |βœ“   |         1000|multiple_choice_grade, exact_str_match                                                                                                                                           |\n",
        "|bigbench_formal_fallacies_syllogisms_negation            |     |   |βœ“   |        14200|multiple_choice_grade, exact_str_match                                                                                                                                           |\n",
        "|bigbench_geometric_shapes                                |     |   |βœ“   |          359|multiple_choice_grade, exact_str_match                                                                                                                                           |\n",
        "|bigbench_hyperbaton                                      |     |   |βœ“   |        50000|multiple_choice_grade, exact_str_match                                                                                                                                           |\n",
        "|bigbench_logical_deduction_five_objects                  |     |   |βœ“   |          500|multiple_choice_grade, exact_str_match                                                                                                                                           |\n",
        "|bigbench_logical_deduction_seven_objects                 |     |   |βœ“   |          700|multiple_choice_grade, exact_str_match                                                                                                                                           |\n",
        "|bigbench_logical_deduction_three_objects                 |     |   |βœ“   |          300|multiple_choice_grade, exact_str_match                                                                                                                                           |\n",
        "|bigbench_movie_recommendation                            |     |   |βœ“   |          500|multiple_choice_grade, exact_str_match                                                                                                                                           |\n",
        "|bigbench_navigate                                        |     |   |βœ“   |         1000|multiple_choice_grade, exact_str_match                                                                                                                                           |\n",
        "|bigbench_reasoning_about_colored_objects                 |     |   |βœ“   |         2000|multiple_choice_grade, exact_str_match                                                                                                                                           |\n",
        "|bigbench_ruin_names                                      |     |   |βœ“   |          448|multiple_choice_grade, exact_str_match                                                                                                                                           |\n",
        "|bigbench_salient_translation_error_detection             |     |   |βœ“   |          998|multiple_choice_grade, exact_str_match                                                                                                                                           |\n",
        "|bigbench_snarks                                          |     |   |βœ“   |          181|multiple_choice_grade, exact_str_match                                                                                                                                           |\n",
        "|bigbench_sports_understanding                            |     |   |βœ“   |          986|multiple_choice_grade, exact_str_match                                                                                                                                           |\n",
        "|bigbench_temporal_sequences                              |     |   |βœ“   |         1000|multiple_choice_grade, exact_str_match                                                                                                                                           |\n",
        "|bigbench_tracking_shuffled_objects_five_objects          |     |   |βœ“   |         1250|multiple_choice_grade, exact_str_match                                                                                                                                           |\n",
        "|bigbench_tracking_shuffled_objects_seven_objects         |     |   |βœ“   |         1750|multiple_choice_grade, exact_str_match                                                                                                                                           |\n",
        "|bigbench_tracking_shuffled_objects_three_objects         |     |   |βœ“   |          300|multiple_choice_grade, exact_str_match                                                                                                                                           |\n",
        "|blimp_adjunct_island                                     |     |βœ“  |    |         1000|acc                                                                                                                                                                              |\n",
        "|blimp_anaphor_gender_agreement                           |     |βœ“  |    |         1000|acc                                                                                                                                                                              |\n",
        "|blimp_anaphor_number_agreement                           |     |βœ“  |    |         1000|acc                                                                                                                                                                              |\n",
        "|blimp_animate_subject_passive                            |     |βœ“  |    |         1000|acc                                                                                                                                                                              |\n",
        "|blimp_animate_subject_trans                              |     |βœ“  |    |         1000|acc                                                                                                                                                                              |\n",
        "|blimp_causative                                          |     |βœ“  |    |         1000|acc                                                                                                                                                                              |\n",
        "|blimp_complex_NP_island                                  |     |βœ“  |    |         1000|acc                                                                                                                                                                              |\n",
        "|blimp_coordinate_structure_constraint_complex_left_branch|     |βœ“  |    |         1000|acc                                                                                                                                                                              |\n",
        "|blimp_coordinate_structure_constraint_object_extraction  |     |βœ“  |    |         1000|acc                                                                                                                                                                              |\n",
        "|blimp_determiner_noun_agreement_1                        |     |βœ“  |    |         1000|acc                                                                                                                                                                              |\n",
        "|blimp_determiner_noun_agreement_2                        |     |βœ“  |    |         1000|acc                                                                                                                                                                              |\n",
        "|blimp_determiner_noun_agreement_irregular_1              |     |βœ“  |    |         1000|acc                                                                                                                                                                              |\n",
        "|blimp_determiner_noun_agreement_irregular_2              |     |βœ“  |    |         1000|acc                                                                                                                                                                              |\n",
        "|blimp_determiner_noun_agreement_with_adj_2               |     |βœ“  |    |         1000|acc                                                                                                                                                                              |\n",
        "|blimp_determiner_noun_agreement_with_adj_irregular_1     |     |βœ“  |    |         1000|acc                                                                                                                                                                              |\n",
        "|blimp_determiner_noun_agreement_with_adj_irregular_2     |     |βœ“  |    |         1000|acc                                                                                                                                                                              |\n",
        "|blimp_determiner_noun_agreement_with_adjective_1         |     |βœ“  |    |         1000|acc                                                                                                                                                                              |\n",
        "|blimp_distractor_agreement_relational_noun               |     |βœ“  |    |         1000|acc                                                                                                                                                                              |\n",
        "|blimp_distractor_agreement_relative_clause               |     |βœ“  |    |         1000|acc                                                                                                                                                                              |\n",
        "|blimp_drop_argument                                      |     |βœ“  |    |         1000|acc                                                                                                                                                                              |\n",
        "|blimp_ellipsis_n_bar_1                                   |     |βœ“  |    |         1000|acc                                                                                                                                                                              |\n",
        "|blimp_ellipsis_n_bar_2                                   |     |βœ“  |    |         1000|acc                                                                                                                                                                              |\n",
        "|blimp_existential_there_object_raising                   |     |βœ“  |    |         1000|acc                                                                                                                                                                              |\n",
        "|blimp_existential_there_quantifiers_1                    |     |βœ“  |    |         1000|acc                                                                                                                                                                              |\n",
        "|blimp_existential_there_quantifiers_2                    |     |βœ“  |    |         1000|acc                                                                                                                                                                              |\n",
        "|blimp_existential_there_subject_raising                  |     |βœ“  |    |         1000|acc                                                                                                                                                                              |\n",
        "|blimp_expletive_it_object_raising                        |     |βœ“  |    |         1000|acc                                                                                                                                                                              |\n",
        "|blimp_inchoative                                         |     |βœ“  |    |         1000|acc                                                                                                                                                                              |\n",
        "|blimp_intransitive                                       |     |βœ“  |    |         1000|acc                                                                                                                                                                              |\n",
        "|blimp_irregular_past_participle_adjectives               |     |βœ“  |    |         1000|acc                                                                                                                                                                              |\n",
        "|blimp_irregular_past_participle_verbs                    |     |βœ“  |    |         1000|acc                                                                                                                                                                              |\n",
        "|blimp_irregular_plural_subject_verb_agreement_1          |     |βœ“  |    |         1000|acc                                                                                                                                                                              |\n",
        "|blimp_irregular_plural_subject_verb_agreement_2          |     |βœ“  |    |         1000|acc                                                                                                                                                                              |\n",
        "|blimp_left_branch_island_echo_question                   |     |βœ“  |    |         1000|acc                                                                                                                                                                              |\n",
        "|blimp_left_branch_island_simple_question                 |     |βœ“  |    |         1000|acc                                                                                                                                                                              |\n",
        "|blimp_matrix_question_npi_licensor_present               |     |βœ“  |    |         1000|acc                                                                                                                                                                              |\n",
        "|blimp_npi_present_1                                      |     |βœ“  |    |         1000|acc                                                                                                                                                                              |\n",
        "|blimp_npi_present_2                                      |     |βœ“  |    |         1000|acc                                                                                                                                                                              |\n",
        "|blimp_only_npi_licensor_present                          |     |βœ“  |    |         1000|acc                                                                                                                                                                              |\n",
        "|blimp_only_npi_scope                                     |     |βœ“  |    |         1000|acc                                                                                                                                                                              |\n",
        "|blimp_passive_1                                          |     |βœ“  |    |         1000|acc                                                                                                                                                                              |\n",
        "|blimp_passive_2                                          |     |βœ“  |    |         1000|acc                                                                                                                                                                              |\n",
        "|blimp_principle_A_c_command                              |     |βœ“  |    |         1000|acc                                                                                                                                                                              |\n",
        "|blimp_principle_A_case_1                                 |     |βœ“  |    |         1000|acc                                                                                                                                                                              |\n",
        "|blimp_principle_A_case_2                                 |     |βœ“  |    |         1000|acc                                                                                                                                                                              |\n",
        "|blimp_principle_A_domain_1                               |     |βœ“  |    |         1000|acc                                                                                                                                                                              |\n",
        "|blimp_principle_A_domain_2                               |     |βœ“  |    |         1000|acc                                                                                                                                                                              |\n",
        "|blimp_principle_A_domain_3                               |     |βœ“  |    |         1000|acc                                                                                                                                                                              |\n",
        "|blimp_principle_A_reconstruction                         |     |βœ“  |    |         1000|acc                                                                                                                                                                              |\n",
        "|blimp_regular_plural_subject_verb_agreement_1            |     |βœ“  |    |         1000|acc                                                                                                                                                                              |\n",
        "|blimp_regular_plural_subject_verb_agreement_2            |     |βœ“  |    |         1000|acc                                                                                                                                                                              |\n",
        "|blimp_sentential_negation_npi_licensor_present           |     |βœ“  |    |         1000|acc                                                                                                                                                                              |\n",
        "|blimp_sentential_negation_npi_scope                      |     |βœ“  |    |         1000|acc                                                                                                                                                                              |\n",
        "|blimp_sentential_subject_island                          |     |βœ“  |    |         1000|acc                                                                                                                                                                              |\n",
        "|blimp_superlative_quantifiers_1                          |     |βœ“  |    |         1000|acc                                                                                                                                                                              |\n",
        "|blimp_superlative_quantifiers_2                          |     |βœ“  |    |         1000|acc                                                                                                                                                                              |\n",
        "|blimp_tough_vs_raising_1                                 |     |βœ“  |    |         1000|acc                                                                                                                                                                              |\n",
        "|blimp_tough_vs_raising_2                                 |     |βœ“  |    |         1000|acc                                                                                                                                                                              |\n",
        "|blimp_transitive                                         |     |βœ“  |    |         1000|acc                                                                                                                                                                              |\n",
        "|blimp_wh_island                                          |     |βœ“  |    |         1000|acc                                                                                                                                                                              |\n",
        "|blimp_wh_questions_object_gap                            |     |βœ“  |    |         1000|acc                                                                                                                                                                              |\n",
        "|blimp_wh_questions_subject_gap                           |     |βœ“  |    |         1000|acc                                                                                                                                                                              |\n",
        "|blimp_wh_questions_subject_gap_long_distance             |     |βœ“  |    |         1000|acc                                                                                                                                                                              |\n",
        "|blimp_wh_vs_that_no_gap                                  |     |βœ“  |    |         1000|acc                                                                                                                                                                              |\n",
        "|blimp_wh_vs_that_no_gap_long_distance                    |     |βœ“  |    |         1000|acc                                                                                                                                                                              |\n",
        "|blimp_wh_vs_that_with_gap                                |     |βœ“  |    |         1000|acc                                                                                                                                                                              |\n",
        "|blimp_wh_vs_that_with_gap_long_distance                  |     |βœ“  |    |         1000|acc                                                                                                                                                                              |\n",
        "|boolq                                                    |βœ“    |βœ“  |    |         3270|acc                                                                                                                                                                              |\n",
        "|cb                                                       |βœ“    |βœ“  |    |           56|acc, f1                                                                                                                                                                          |\n",
        "|cola                                                     |βœ“    |βœ“  |    |         1043|mcc                                                                                                                                                                              |\n",
        "|copa                                                     |βœ“    |βœ“  |    |          100|acc                                                                                                                                                                              |\n",
        "|coqa                                                     |βœ“    |βœ“  |    |          500|f1, em                                                                                                                                                                           |\n",
        "|crows_pairs_english                                      |     |βœ“  |    |         1677|likelihood_difference, pct_stereotype                                                                                                                                            |\n",
        "|crows_pairs_english_age                                  |     |βœ“  |    |           91|likelihood_difference, pct_stereotype                                                                                                                                            |\n",
        "|crows_pairs_english_autre                                |     |βœ“  |    |           11|likelihood_difference, pct_stereotype                                                                                                                                            |\n",
        "|crows_pairs_english_disability                           |     |βœ“  |    |           65|likelihood_difference, pct_stereotype                                                                                                                                            |\n",
        "|crows_pairs_english_gender                               |     |βœ“  |    |          320|likelihood_difference, pct_stereotype                                                                                                                                            |\n",
        "|crows_pairs_english_nationality                          |     |βœ“  |    |          216|likelihood_difference, pct_stereotype                                                                                                                                            |\n",
        "|crows_pairs_english_physical_appearance                  |     |βœ“  |    |           72|likelihood_difference, pct_stereotype                                                                                                                                            |\n",
        "|crows_pairs_english_race_color                           |     |βœ“  |    |          508|likelihood_difference, pct_stereotype                                                                                                                                            |\n",
        "|crows_pairs_english_religion                             |     |βœ“  |    |          111|likelihood_difference, pct_stereotype                                                                                                                                            |\n",
        "|crows_pairs_english_sexual_orientation                   |     |βœ“  |    |           93|likelihood_difference, pct_stereotype                                                                                                                                            |\n",
        "|crows_pairs_english_socioeconomic                        |     |βœ“  |    |          190|likelihood_difference, pct_stereotype                                                                                                                                            |\n",
        "|crows_pairs_french                                       |     |βœ“  |    |         1677|likelihood_difference, pct_stereotype                                                                                                                                            |\n",
        "|crows_pairs_french_age                                   |     |βœ“  |    |           90|likelihood_difference, pct_stereotype                                                                                                                                            |\n",
        "|crows_pairs_french_autre                                 |     |βœ“  |    |           13|likelihood_difference, pct_stereotype                                                                                                                                            |\n",
        "|crows_pairs_french_disability                            |     |βœ“  |    |           66|likelihood_difference, pct_stereotype                                                                                                                                            |\n",
        "|crows_pairs_french_gender                                |     |βœ“  |    |          321|likelihood_difference, pct_stereotype                                                                                                                                            |\n",
        "|crows_pairs_french_nationality                           |     |βœ“  |    |          253|likelihood_difference, pct_stereotype                                                                                                                                            |\n",
        "|crows_pairs_french_physical_appearance                   |     |βœ“  |    |           72|likelihood_difference, pct_stereotype                                                                                                                                            |\n",
        "|crows_pairs_french_race_color                            |     |βœ“  |    |          460|likelihood_difference, pct_stereotype                                                                                                                                            |\n",
        "|crows_pairs_french_religion                              |     |βœ“  |    |          115|likelihood_difference, pct_stereotype                                                                                                                                            |\n",
        "|crows_pairs_french_sexual_orientation                    |     |βœ“  |    |           91|likelihood_difference, pct_stereotype                                                                                                                                            |\n",
        "|crows_pairs_french_socioeconomic                         |     |βœ“  |    |          196|likelihood_difference, pct_stereotype                                                                                                                                            |\n",
        "|cycle_letters                                            |     |βœ“  |    |        10000|acc                                                                                                                                                                              |\n",
        "|drop                                                     |βœ“    |βœ“  |    |         9536|em, f1                                                                                                                                                                           |\n",
        "|ethics_cm                                                |βœ“    |   |βœ“   |         3885|acc                                                                                                                                                                              |\n",
        "|ethics_deontology                                        |βœ“    |   |βœ“   |         3596|acc, em                                                                                                                                                                          |\n",
        "|ethics_justice                                           |βœ“    |   |βœ“   |         2704|acc, em                                                                                                                                                                          |\n",
        "|ethics_utilitarianism                                    |βœ“    |   |βœ“   |         4808|acc                                                                                                                                                                              |\n",
        "|ethics_utilitarianism_original                           |     |   |βœ“   |         4808|acc                                                                                                                                                                              |\n",
        "|ethics_virtue                                            |βœ“    |   |βœ“   |         4975|acc, em                                                                                                                                                                          |\n",
        "|gsm8k                                                    |βœ“    |   |βœ“   |         1319|acc                                                                                                                                                                              |\n",
        "|headqa                                                   |βœ“    |βœ“  |βœ“   |         2742|acc, acc_norm                                                                                                                                                                    |\n",
        "|headqa_en                                                |βœ“    |βœ“  |βœ“   |         2742|acc, acc_norm                                                                                                                                                                    |\n",
        "|headqa_es                                                |βœ“    |βœ“  |βœ“   |         2742|acc, acc_norm                                                                                                                                                                    |\n",
        "|hellaswag                                                |βœ“    |βœ“  |    |        10042|acc, acc_norm                                                                                                                                                                    |\n",
        "|hendrycksTest-abstract_algebra                           |     |βœ“  |βœ“   |          100|acc, acc_norm                                                                                                                                                                    |\n",
        "|hendrycksTest-anatomy                                    |     |βœ“  |βœ“   |          135|acc, acc_norm                                                                                                                                                                    |\n",
        "|hendrycksTest-astronomy                                  |     |βœ“  |βœ“   |          152|acc, acc_norm                                                                                                                                                                    |\n",
        "|hendrycksTest-business_ethics                            |     |βœ“  |βœ“   |          100|acc, acc_norm                                                                                                                                                                    |\n",
        "|hendrycksTest-clinical_knowledge                         |     |βœ“  |βœ“   |          265|acc, acc_norm                                                                                                                                                                    |\n",
        "|hendrycksTest-college_biology                            |     |βœ“  |βœ“   |          144|acc, acc_norm                                                                                                                                                                    |\n",
        "|hendrycksTest-college_chemistry                          |     |βœ“  |βœ“   |          100|acc, acc_norm                                                                                                                                                                    |\n",
        "|hendrycksTest-college_computer_science                   |     |βœ“  |βœ“   |          100|acc, acc_norm                                                                                                                                                                    |\n",
        "|hendrycksTest-college_mathematics                        |     |βœ“  |βœ“   |          100|acc, acc_norm                                                                                                                                                                    |\n",
        "|hendrycksTest-college_medicine                           |     |βœ“  |βœ“   |          173|acc, acc_norm                                                                                                                                                                    |\n",
        "|hendrycksTest-college_physics                            |     |βœ“  |βœ“   |          102|acc, acc_norm                                                                                                                                                                    |\n",
        "|hendrycksTest-computer_security                          |     |βœ“  |βœ“   |          100|acc, acc_norm                                                                                                                                                                    |\n",
        "|hendrycksTest-conceptual_physics                         |     |βœ“  |βœ“   |          235|acc, acc_norm                                                                                                                                                                    |\n",
        "|hendrycksTest-econometrics                               |     |βœ“  |βœ“   |          114|acc, acc_norm                                                                                                                                                                    |\n",
        "|hendrycksTest-electrical_engineering                     |     |βœ“  |βœ“   |          145|acc, acc_norm                                                                                                                                                                    |\n",
        "|hendrycksTest-elementary_mathematics                     |     |βœ“  |βœ“   |          378|acc, acc_norm                                                                                                                                                                    |\n",
        "|hendrycksTest-formal_logic                               |     |βœ“  |βœ“   |          126|acc, acc_norm                                                                                                                                                                    |\n",
        "|hendrycksTest-global_facts                               |     |βœ“  |βœ“   |          100|acc, acc_norm                                                                                                                                                                    |\n",
        "|hendrycksTest-high_school_biology                        |     |βœ“  |βœ“   |          310|acc, acc_norm                                                                                                                                                                    |\n",
        "|hendrycksTest-high_school_chemistry                      |     |βœ“  |βœ“   |          203|acc, acc_norm                                                                                                                                                                    |\n",
        "|hendrycksTest-high_school_computer_science               |     |βœ“  |βœ“   |          100|acc, acc_norm                                                                                                                                                                    |\n",
        "|hendrycksTest-high_school_european_history               |     |βœ“  |βœ“   |          165|acc, acc_norm                                                                                                                                                                    |\n",
        "|hendrycksTest-high_school_geography                      |     |βœ“  |βœ“   |          198|acc, acc_norm                                                                                                                                                                    |\n",
        "|hendrycksTest-high_school_government_and_politics        |     |βœ“  |βœ“   |          193|acc, acc_norm                                                                                                                                                                    |\n",
        "|hendrycksTest-high_school_macroeconomics                 |     |βœ“  |βœ“   |          390|acc, acc_norm                                                                                                                                                                    |\n",
        "|hendrycksTest-high_school_mathematics                    |     |βœ“  |βœ“   |          270|acc, acc_norm                                                                                                                                                                    |\n",
        "|hendrycksTest-high_school_microeconomics                 |     |βœ“  |βœ“   |          238|acc, acc_norm                                                                                                                                                                    |\n",
        "|hendrycksTest-high_school_physics                        |     |βœ“  |βœ“   |          151|acc, acc_norm                                                                                                                                                                    |\n",
        "|hendrycksTest-high_school_psychology                     |     |βœ“  |βœ“   |          545|acc, acc_norm                                                                                                                                                                    |\n",
        "|hendrycksTest-high_school_statistics                     |     |βœ“  |βœ“   |          216|acc, acc_norm                                                                                                                                                                    |\n",
        "|hendrycksTest-high_school_us_history                     |     |βœ“  |βœ“   |          204|acc, acc_norm                                                                                                                                                                    |\n",
        "|hendrycksTest-high_school_world_history                  |     |βœ“  |βœ“   |          237|acc, acc_norm                                                                                                                                                                    |\n",
        "|hendrycksTest-human_aging                                |     |βœ“  |βœ“   |          223|acc, acc_norm                                                                                                                                                                    |\n",
        "|hendrycksTest-human_sexuality                            |     |βœ“  |βœ“   |          131|acc, acc_norm                                                                                                                                                                    |\n",
        "|hendrycksTest-international_law                          |     |βœ“  |βœ“   |          121|acc, acc_norm                                                                                                                                                                    |\n",
        "|hendrycksTest-jurisprudence                              |     |βœ“  |βœ“   |          108|acc, acc_norm                                                                                                                                                                    |\n",
        "|hendrycksTest-logical_fallacies                          |     |βœ“  |βœ“   |          163|acc, acc_norm                                                                                                                                                                    |\n",
        "|hendrycksTest-machine_learning                           |     |βœ“  |βœ“   |          112|acc, acc_norm                                                                                                                                                                    |\n",
        "|hendrycksTest-management                                 |     |βœ“  |βœ“   |          103|acc, acc_norm                                                                                                                                                                    |\n",
        "|hendrycksTest-marketing                                  |     |βœ“  |βœ“   |          234|acc, acc_norm                                                                                                                                                                    |\n",
        "|hendrycksTest-medical_genetics                           |     |βœ“  |βœ“   |          100|acc, acc_norm                                                                                                                                                                    |\n",
        "|hendrycksTest-miscellaneous                              |     |βœ“  |βœ“   |          783|acc, acc_norm                                                                                                                                                                    |\n",
        "|hendrycksTest-moral_disputes                             |     |βœ“  |βœ“   |          346|acc, acc_norm                                                                                                                                                                    |\n",
        "|hendrycksTest-moral_scenarios                            |     |βœ“  |βœ“   |          895|acc, acc_norm                                                                                                                                                                    |\n",
        "|hendrycksTest-nutrition                                  |     |βœ“  |βœ“   |          306|acc, acc_norm                                                                                                                                                                    |\n",
        "|hendrycksTest-philosophy                                 |     |βœ“  |βœ“   |          311|acc, acc_norm                                                                                                                                                                    |\n",
        "|hendrycksTest-prehistory                                 |     |βœ“  |βœ“   |          324|acc, acc_norm                                                                                                                                                                    |\n",
        "|hendrycksTest-professional_accounting                    |     |βœ“  |βœ“   |          282|acc, acc_norm                                                                                                                                                                    |\n",
        "|hendrycksTest-professional_law                           |     |βœ“  |βœ“   |         1534|acc, acc_norm                                                                                                                                                                    |\n",
        "|hendrycksTest-professional_medicine                      |     |βœ“  |βœ“   |          272|acc, acc_norm                                                                                                                                                                    |\n",
        "|hendrycksTest-professional_psychology                    |     |βœ“  |βœ“   |          612|acc, acc_norm                                                                                                                                                                    |\n",
        "|hendrycksTest-public_relations                           |     |βœ“  |βœ“   |          110|acc, acc_norm                                                                                                                                                                    |\n",
        "|hendrycksTest-security_studies                           |     |βœ“  |βœ“   |          245|acc, acc_norm                                                                                                                                                                    |\n",
        "|hendrycksTest-sociology                                  |     |βœ“  |βœ“   |          201|acc, acc_norm                                                                                                                                                                    |\n",
        "|hendrycksTest-us_foreign_policy                          |     |βœ“  |βœ“   |          100|acc, acc_norm                                                                                                                                                                    |\n",
        "|hendrycksTest-virology                                   |     |βœ“  |βœ“   |          166|acc, acc_norm                                                                                                                                                                    |\n",
        "|hendrycksTest-world_religions                            |     |βœ“  |βœ“   |          171|acc, acc_norm                                                                                                                                                                    |\n",
        "|iwslt17-ar-en                                            |     |   |βœ“   |         1460|bleu, chrf, ter                                                                                                                                                                  |\n",
        "|iwslt17-en-ar                                            |     |   |βœ“   |         1460|bleu, chrf, ter                                                                                                                                                                  |\n",
        "|lambada_openai                                           |     |   |βœ“   |         5153|ppl, acc                                                                                                                                                                         |\n",
        "|lambada_openai_cloze                                     |     |   |βœ“   |         5153|ppl, acc                                                                                                                                                                         |\n",
        "|lambada_openai_mt_de                                     |     |   |βœ“   |         5153|ppl, acc                                                                                                                                                                         |\n",
        "|lambada_openai_mt_en                                     |     |   |βœ“   |         5153|ppl, acc                                                                                                                                                                         |\n",
        "|lambada_openai_mt_es                                     |     |   |βœ“   |         5153|ppl, acc                                                                                                                                                                         |\n",
        "|lambada_openai_mt_fr                                     |     |   |βœ“   |         5153|ppl, acc                                                                                                                                                                         |\n",
        "|lambada_openai_mt_it                                     |     |   |βœ“   |         5153|ppl, acc                                                                                                                                                                         |\n",
        "|lambada_standard                                         |     |βœ“  |βœ“   |         5153|ppl, acc                                                                                                                                                                         |\n",
        "|lambada_standard_cloze                                   |     |βœ“  |βœ“   |         5153|ppl, acc                                                                                                                                                                         |\n",
        "|logiqa                                                   |βœ“    |βœ“  |βœ“   |          651|acc, acc_norm                                                                                                                                                                    |\n",
        "|math_algebra                                             |βœ“    |   |βœ“   |         1187|acc                                                                                                                                                                              |\n",
        "|math_asdiv                                               |     |βœ“  |    |         2305|acc                                                                                                                                                                              |\n",
        "|math_counting_and_prob                                   |βœ“    |   |βœ“   |          474|acc                                                                                                                                                                              |\n",
        "|math_geometry                                            |βœ“    |   |βœ“   |          479|acc                                                                                                                                                                              |\n",
        "|math_intermediate_algebra                                |βœ“    |   |βœ“   |          903|acc                                                                                                                                                                              |\n",
        "|math_num_theory                                          |βœ“    |   |βœ“   |          540|acc                                                                                                                                                                              |\n",
        "|math_prealgebra                                          |βœ“    |   |βœ“   |          871|acc                                                                                                                                                                              |\n",
        "|math_precalc                                             |βœ“    |   |βœ“   |          546|acc                                                                                                                                                                              |\n",
        "|mathqa                                                   |βœ“    |βœ“  |βœ“   |         2985|acc, acc_norm                                                                                                                                                                    |\n",
        "|mc_taco                                                  |     |βœ“  |βœ“   |         9442|f1, em                                                                                                                                                                           |\n",
        "|mgsm_bn                                                  |βœ“    |   |βœ“   |          250|acc                                                                                                                                                                              |\n",
        "|mgsm_de                                                  |βœ“    |   |βœ“   |          250|acc                                                                                                                                                                              |\n",
        "|mgsm_en                                                  |βœ“    |   |βœ“   |          250|acc                                                                                                                                                                              |\n",
        "|mgsm_es                                                  |βœ“    |   |βœ“   |          250|acc                                                                                                                                                                              |\n",
        "|mgsm_fr                                                  |βœ“    |   |βœ“   |          250|acc                                                                                                                                                                              |\n",
        "|mgsm_ja                                                  |βœ“    |   |βœ“   |          250|acc                                                                                                                                                                              |\n",
        "|mgsm_ru                                                  |βœ“    |   |βœ“   |          250|acc                                                                                                                                                                              |\n",
        "|mgsm_sw                                                  |βœ“    |   |βœ“   |          250|acc                                                                                                                                                                              |\n",
        "|mgsm_te                                                  |βœ“    |   |βœ“   |          250|acc                                                                                                                                                                              |\n",
        "|mgsm_th                                                  |βœ“    |   |βœ“   |          250|acc                                                                                                                                                                              |\n",
        "|mgsm_zh                                                  |βœ“    |   |βœ“   |          250|acc                                                                                                                                                                              |\n",
        "|mnli                                                     |βœ“    |βœ“  |    |         9815|acc                                                                                                                                                                              |\n",
        "|mnli_mismatched                                          |βœ“    |βœ“  |    |         9832|acc                                                                                                                                                                              |\n",
        "|mrpc                                                     |βœ“    |βœ“  |    |          408|acc, f1                                                                                                                                                                          |\n",
        "|multirc                                                  |βœ“    |βœ“  |    |         4848|acc                                                                                                                                                                              |\n",
        "|mutual                                                   |βœ“    |βœ“  |    |          886|r@1, r@2, mrr                                                                                                                                                                    |\n",
        "|mutual_plus                                              |βœ“    |βœ“  |    |          886|r@1, r@2, mrr                                                                                                                                                                    |\n",
        "|openbookqa                                               |βœ“    |βœ“  |βœ“   |          500|acc, acc_norm                                                                                                                                                                    |\n",
        "|pawsx_de                                                 |βœ“    |βœ“  |βœ“   |         2000|acc                                                                                                                                                                              |\n",
        "|pawsx_en                                                 |βœ“    |βœ“  |βœ“   |         2000|acc                                                                                                                                                                              |\n",
        "|pawsx_es                                                 |βœ“    |βœ“  |βœ“   |         2000|acc                                                                                                                                                                              |\n",
        "|pawsx_fr                                                 |βœ“    |βœ“  |βœ“   |         2000|acc                                                                                                                                                                              |\n",
        "|pawsx_ja                                                 |βœ“    |βœ“  |βœ“   |         2000|acc                                                                                                                                                                              |\n",
        "|pawsx_ko                                                 |βœ“    |βœ“  |βœ“   |         2000|acc                                                                                                                                                                              |\n",
        "|pawsx_zh                                                 |βœ“    |βœ“  |βœ“   |         2000|acc                                                                                                                                                                              |\n",
        "|pile_arxiv                                               |     |βœ“  |βœ“   |         2407|word_perplexity, byte_perplexity, bits_per_byte                                                                                                                                  |\n",
        "|pile_bookcorpus2                                         |     |βœ“  |βœ“   |           28|word_perplexity, byte_perplexity, bits_per_byte                                                                                                                                  |\n",
        "|pile_books3                                              |     |βœ“  |βœ“   |          269|word_perplexity, byte_perplexity, bits_per_byte                                                                                                                                  |\n",
        "|pile_dm-mathematics                                      |     |βœ“  |βœ“   |         1922|word_perplexity, byte_perplexity, bits_per_byte                                                                                                                                  |\n",
        "|pile_enron                                               |     |βœ“  |βœ“   |         1010|word_perplexity, byte_perplexity, bits_per_byte                                                                                                                                  |\n",
        "|pile_europarl                                            |     |βœ“  |βœ“   |          157|word_perplexity, byte_perplexity, bits_per_byte                                                                                                                                  |\n",
        "|pile_freelaw                                             |     |βœ“  |βœ“   |         5101|word_perplexity, byte_perplexity, bits_per_byte                                                                                                                                  |\n",
        "|pile_github                                              |     |βœ“  |βœ“   |        18195|word_perplexity, byte_perplexity, bits_per_byte                                                                                                                                  |\n",
        "|pile_gutenberg                                           |     |βœ“  |βœ“   |           80|word_perplexity, byte_perplexity, bits_per_byte                                                                                                                                  |\n",
        "|pile_hackernews                                          |     |βœ“  |βœ“   |         1632|word_perplexity, byte_perplexity, bits_per_byte                                                                                                                                  |\n",
        "|pile_nih-exporter                                        |     |βœ“  |βœ“   |         1884|word_perplexity, byte_perplexity, bits_per_byte                                                                                                                                  |\n",
        "|pile_opensubtitles                                       |     |βœ“  |βœ“   |          642|word_perplexity, byte_perplexity, bits_per_byte                                                                                                                                  |\n",
        "|pile_openwebtext2                                        |     |βœ“  |βœ“   |        32925|word_perplexity, byte_perplexity, bits_per_byte                                                                                                                                  |\n",
        "|pile_philpapers                                          |     |βœ“  |βœ“   |           68|word_perplexity, byte_perplexity, bits_per_byte                                                                                                                                  |\n",
        "|pile_pile-cc                                             |     |βœ“  |βœ“   |        52790|word_perplexity, byte_perplexity, bits_per_byte                                                                                                                                  |\n",
        "|pile_pubmed-abstracts                                    |     |βœ“  |βœ“   |        29895|word_perplexity, byte_perplexity, bits_per_byte                                                                                                                                  |\n",
        "|pile_pubmed-central                                      |     |βœ“  |βœ“   |         5911|word_perplexity, byte_perplexity, bits_per_byte                                                                                                                                  |\n",
        "|pile_stackexchange                                       |     |βœ“  |βœ“   |        30378|word_perplexity, byte_perplexity, bits_per_byte                                                                                                                                  |\n",
        "|pile_ubuntu-irc                                          |     |βœ“  |βœ“   |           22|word_perplexity, byte_perplexity, bits_per_byte                                                                                                                                  |\n",
        "|pile_uspto                                               |     |βœ“  |βœ“   |        11415|word_perplexity, byte_perplexity, bits_per_byte                                                                                                                                  |\n",
        "|pile_wikipedia                                           |     |βœ“  |βœ“   |        17511|word_perplexity, byte_perplexity, bits_per_byte                                                                                                                                  |\n",
        "|pile_youtubesubtitles                                    |     |βœ“  |βœ“   |          342|word_perplexity, byte_perplexity, bits_per_byte                                                                                                                                  |\n",
        "|piqa                                                     |βœ“    |βœ“  |    |         1838|acc, acc_norm                                                                                                                                                                    |\n",
        "|prost                                                    |     |   |βœ“   |        18736|acc, acc_norm                                                                                                                                                                    |\n",
        "|pubmedqa                                                 |     |   |βœ“   |         1000|acc                                                                                                                                                                              |\n",
        "|qa4mre_2011                                              |     |   |βœ“   |          120|acc, acc_norm                                                                                                                                                                    |\n",
        "|qa4mre_2012                                              |     |   |βœ“   |          160|acc, acc_norm                                                                                                                                                                    |\n",
        "|qa4mre_2013                                              |     |   |βœ“   |          284|acc, acc_norm                                                                                                                                                                    |\n",
        "|qasper                                                   |βœ“    |βœ“  |    |         1764|f1_yesno, f1_abstractive                                                                                                                                                         |\n",
        "|qnli                                                     |βœ“    |βœ“  |    |         5463|acc                                                                                                                                                                              |\n",
        "|qqp                                                      |βœ“    |βœ“  |    |        40430|acc, f1                                                                                                                                                                          |\n",
        "|race                                                     |βœ“    |βœ“  |βœ“   |         1045|acc                                                                                                                                                                              |\n",
        "|random_insertion                                         |     |βœ“  |    |        10000|acc                                                                                                                                                                              |\n",
        "|record                                                   |βœ“    |βœ“  |    |        10000|f1, em                                                                                                                                                                           |\n",
        "|reversed_words                                           |     |βœ“  |    |        10000|acc                                                                                                                                                                              |\n",
        "|rte                                                      |βœ“    |βœ“  |    |          277|acc                                                                                                                                                                              |\n",
        "|sciq                                                     |βœ“    |βœ“  |βœ“   |         1000|acc, acc_norm                                                                                                                                                                    |\n",
        "|scrolls_contractnli                                      |βœ“    |βœ“  |    |         1037|em, acc, acc_norm                                                                                                                                                                |\n",
        "|scrolls_govreport                                        |βœ“    |βœ“  |    |          972|rouge1, rouge2, rougeL                                                                                                                                                           |\n",
        "|scrolls_narrativeqa                                      |βœ“    |βœ“  |    |         3425|f1                                                                                                                                                                               |\n",
        "|scrolls_qasper                                           |βœ“    |βœ“  |    |          984|f1                                                                                                                                                                               |\n",
        "|scrolls_qmsum                                            |βœ“    |βœ“  |    |          272|rouge1, rouge2, rougeL                                                                                                                                                           |\n",
        "|scrolls_quality                                          |βœ“    |βœ“  |    |         2086|em, acc, acc_norm                                                                                                                                                                |\n",
        "|scrolls_summscreenfd                                     |βœ“    |βœ“  |    |          338|rouge1, rouge2, rougeL                                                                                                                                                           |\n",
        "|squad2                                                   |βœ“    |βœ“  |    |        11873|exact, f1, HasAns_exact, HasAns_f1, NoAns_exact, NoAns_f1, best_exact, best_f1                                                                                                   |\n",
        "|sst                                                      |βœ“    |βœ“  |    |          872|acc                                                                                                                                                                              |\n",
        "|swag                                                     |βœ“    |βœ“  |    |        20006|acc, acc_norm                                                                                                                                                                    |\n",
        "|toxigen                                                  |βœ“    |   |βœ“   |          940|acc, acc_norm                                                                                                                                                                    |\n",
        "|triviaqa                                                 |βœ“    |βœ“  |    |        11313|acc                                                                                                                                                                              |\n",
        "|truthfulqa_gen                                           |     |βœ“  |    |          817|bleurt_max, bleurt_acc, bleurt_diff, bleu_max, bleu_acc, bleu_diff, rouge1_max, rouge1_acc, rouge1_diff, rouge2_max, rouge2_acc, rouge2_diff, rougeL_max, rougeL_acc, rougeL_diff|\n",
        "|truthfulqa_mc                                            |     |βœ“  |    |          817|mc1, mc2                                                                                                                                                                         |\n",
        "|webqs                                                    |βœ“    |   |βœ“   |         2032|acc                                                                                                                                                                              |\n",
        "|wic                                                      |βœ“    |βœ“  |    |          638|acc                                                                                                                                                                              |\n",
        "|wikitext                                                 |βœ“    |βœ“  |βœ“   |           62|word_perplexity, byte_perplexity, bits_per_byte                                                                                                                                  |\n",
        "|winogrande                                               |βœ“    |βœ“  |    |         1267|acc                                                                                                                                                                              |\n",
        "|wmt14-en-fr                                              |     |   |βœ“   |         3003|bleu, chrf, ter                                                                                                                                                                  |\n",
        "|wmt14-fr-en                                              |     |   |βœ“   |         3003|bleu, chrf, ter                                                                                                                                                                  |\n",
        "|wmt16-de-en                                              |     |   |βœ“   |         2999|bleu, chrf, ter                                                                                                                                                                  |\n",
        "|wmt16-en-de                                              |     |   |βœ“   |         2999|bleu, chrf, ter                                                                                                                                                                  |\n",
        "|wmt16-en-ro                                              |     |   |βœ“   |         1999|bleu, chrf, ter                                                                                                                                                                  |\n",
        "|wmt16-ro-en                                              |     |   |βœ“   |         1999|bleu, chrf, ter                                                                                                                                                                  |\n",
        "|wmt20-cs-en                                              |     |   |βœ“   |          664|bleu, chrf, ter                                                                                                                                                                  |\n",
        "|wmt20-de-en                                              |     |   |βœ“   |          785|bleu, chrf, ter                                                                                                                                                                  |\n",
        "|wmt20-de-fr                                              |     |   |βœ“   |         1619|bleu, chrf, ter                                                                                                                                                                  |\n",
        "|wmt20-en-cs                                              |     |   |βœ“   |         1418|bleu, chrf, ter                                                                                                                                                                  |\n",
        "|wmt20-en-de                                              |     |   |βœ“   |         1418|bleu, chrf, ter                                                                                                                                                                  |\n",
        "|wmt20-en-iu                                              |     |   |βœ“   |         2971|bleu, chrf, ter                                                                                                                                                                  |\n",
        "|wmt20-en-ja                                              |     |   |βœ“   |         1000|bleu, chrf, ter                                                                                                                                                                  |\n",
        "|wmt20-en-km                                              |     |   |βœ“   |         2320|bleu, chrf, ter                                                                                                                                                                  |\n",
        "|wmt20-en-pl                                              |     |   |βœ“   |         1000|bleu, chrf, ter                                                                                                                                                                  |\n",
        "|wmt20-en-ps                                              |     |   |βœ“   |         2719|bleu, chrf, ter                                                                                                                                                                  |\n",
        "|wmt20-en-ru                                              |     |   |βœ“   |         2002|bleu, chrf, ter                                                                                                                                                                  |\n",
        "|wmt20-en-ta                                              |     |   |βœ“   |         1000|bleu, chrf, ter                                                                                                                                                                  |\n",
        "|wmt20-en-zh                                              |     |   |βœ“   |         1418|bleu, chrf, ter                                                                                                                                                                  |\n",
        "|wmt20-fr-de                                              |     |   |βœ“   |         1619|bleu, chrf, ter                                                                                                                                                                  |\n",
        "|wmt20-iu-en                                              |     |   |βœ“   |         2971|bleu, chrf, ter                                                                                                                                                                  |\n",
        "|wmt20-ja-en                                              |     |   |βœ“   |          993|bleu, chrf, ter                                                                                                                                                                  |\n",
        "|wmt20-km-en                                              |     |   |βœ“   |         2320|bleu, chrf, ter                                                                                                                                                                  |\n",
        "|wmt20-pl-en                                              |     |   |βœ“   |         1001|bleu, chrf, ter                                                                                                                                                                  |\n",
        "|wmt20-ps-en                                              |     |   |βœ“   |         2719|bleu, chrf, ter                                                                                                                                                                  |\n",
        "|wmt20-ru-en                                              |     |   |βœ“   |          991|bleu, chrf, ter                                                                                                                                                                  |\n",
        "|wmt20-ta-en                                              |     |   |βœ“   |          997|bleu, chrf, ter                                                                                                                                                                  |\n",
        "|wmt20-zh-en                                              |     |   |βœ“   |         2000|bleu, chrf, ter                                                                                                                                                                  |\n",
        "|wnli                                                     |βœ“    |βœ“  |    |           71|acc                                                                                                                                                                              |\n",
        "|wsc                                                      |βœ“    |βœ“  |    |          104|acc                                                                                                                                                                              |\n",
        "|wsc273                                                   |     |   |βœ“   |          273|acc                                                                                                                                                                              |\n",
        "|xcopa_et                                                 |     |✓  |✓   |          500|acc                                                                                                                                                                              |\n",
        "|xcopa_ht                                                 |     |✓  |✓   |          500|acc                                                                                                                                                                              |\n",
        "|xcopa_id                                                 |     |✓  |✓   |          500|acc                                                                                                                                                                              |\n",
        "|xcopa_it                                                 |     |✓  |✓   |          500|acc                                                                                                                                                                              |\n",
        "|xcopa_qu                                                 |     |✓  |✓   |          500|acc                                                                                                                                                                              |\n",
        "|xcopa_sw                                                 |     |✓  |✓   |          500|acc                                                                                                                                                                              |\n",
        "|xcopa_ta                                                 |     |✓  |✓   |          500|acc                                                                                                                                                                              |\n",
        "|xcopa_th                                                 |     |✓  |✓   |          500|acc                                                                                                                                                                              |\n",
        "|xcopa_tr                                                 |     |✓  |✓   |          500|acc                                                                                                                                                                              |\n",
        "|xcopa_vi                                                 |     |✓  |✓   |          500|acc                                                                                                                                                                              |\n",
        "|xcopa_zh                                                 |     |✓  |✓   |          500|acc                                                                                                                                                                              |\n",
        "|xnli_ar                                                  |✓    |✓  |✓   |         5010|acc                                                                                                                                                                              |\n",
        "|xnli_bg                                                  |✓    |✓  |✓   |         5010|acc                                                                                                                                                                              |\n",
        "|xnli_de                                                  |✓    |✓  |✓   |         5010|acc                                                                                                                                                                              |\n",
        "|xnli_el                                                  |✓    |✓  |✓   |         5010|acc                                                                                                                                                                              |\n",
        "|xnli_en                                                  |✓    |✓  |✓   |         5010|acc                                                                                                                                                                              |\n",
        "|xnli_es                                                  |✓    |✓  |✓   |         5010|acc                                                                                                                                                                              |\n",
        "|xnli_fr                                                  |✓    |✓  |✓   |         5010|acc                                                                                                                                                                              |\n",
        "|xnli_hi                                                  |✓    |✓  |✓   |         5010|acc                                                                                                                                                                              |\n",
        "|xnli_ru                                                  |✓    |✓  |✓   |         5010|acc                                                                                                                                                                              |\n",
        "|xnli_sw                                                  |✓    |✓  |✓   |         5010|acc                                                                                                                                                                              |\n",
        "|xnli_th                                                  |✓    |✓  |✓   |         5010|acc                                                                                                                                                                              |\n",
        "|xnli_tr                                                  |✓    |✓  |✓   |         5010|acc                                                                                                                                                                              |\n",
        "|xnli_ur                                                  |✓    |✓  |✓   |         5010|acc                                                                                                                                                                              |\n",
        "|xnli_vi                                                  |✓    |✓  |✓   |         5010|acc                                                                                                                                                                              |\n",
        "|xnli_zh                                                  |✓    |✓  |✓   |         5010|acc                                                                                                                                                                              |\n",
        "|xstory_cloze_ar                                          |✓    |✓  |    |         1511|acc                                                                                                                                                                              |\n",
        "|xstory_cloze_en                                          |✓    |✓  |    |         1511|acc                                                                                                                                                                              |\n",
        "|xstory_cloze_es                                          |✓    |✓  |    |         1511|acc                                                                                                                                                                              |\n",
        "|xstory_cloze_eu                                          |✓    |✓  |    |         1511|acc                                                                                                                                                                              |\n",
        "|xstory_cloze_hi                                          |✓    |✓  |    |         1511|acc                                                                                                                                                                              |\n",
        "|xstory_cloze_id                                          |✓    |✓  |    |         1511|acc                                                                                                                                                                              |\n",
        "|xstory_cloze_my                                          |✓    |✓  |    |         1511|acc                                                                                                                                                                              |\n",
        "|xstory_cloze_ru                                          |✓    |✓  |    |         1511|acc                                                                                                                                                                              |\n",
        "|xstory_cloze_sw                                          |✓    |✓  |    |         1511|acc                                                                                                                                                                              |\n",
        "|xstory_cloze_te                                          |✓    |✓  |    |         1511|acc                                                                                                                                                                              |\n",
        "|xstory_cloze_zh                                          |✓    |✓  |    |         1511|acc                                                                                                                                                                              |\n",
        "|xwinograd_en                                             |     |   |✓   |         2325|acc                                                                                                                                                                              |\n",
        "|xwinograd_fr                                             |     |   |✓   |           83|acc                                                                                                                                                                              |\n",
        "|xwinograd_jp                                             |     |   |✓   |          959|acc                                                                                                                                                                              |\n",
        "|xwinograd_pt                                             |     |   |✓   |          263|acc                                                                                                                                                                              |\n",
        "|xwinograd_ru                                             |     |   |✓   |          315|acc                                                                                                                                                                              |\n",
        "|xwinograd_zh                                             |     |   |✓   |          504|acc                                                                                                                                                                              |\n",
        "| Ceval-valid-computer_network                         |   | ✓ |   | 19 | acc |\n",
        "| Ceval-valid-operating_system                         |   | ✓ |   | 19 | acc |\n",
        "| Ceval-valid-computer_architecture                    |   | ✓ |   | 21 | acc |\n",
        "| Ceval-valid-college_programming                      |   | ✓ |   | 37 | acc |\n",
        "| Ceval-valid-college_physics                          |   | ✓ |   | 19 | acc |\n",
        "| Ceval-valid-college_chemistry                        |   | ✓ |   | 24 | acc |\n",
        "| Ceval-valid-advanced_mathematics                     |   | ✓ |   | 19 | acc |\n",
        "| Ceval-valid-probability_and_statistics               |   | ✓ |   | 18 | acc |\n",
        "| Ceval-valid-discrete_mathematics                     |   | ✓ |   | 16 | acc |\n",
        "| Ceval-valid-electrical_engineer                      |   | ✓ |   | 37 | acc |\n",
        "| Ceval-valid-metrology_engineer                       |   | ✓ |   | 24 | acc |\n",
        "| Ceval-valid-high_school_mathematics                  |   | ✓ |   | 18 | acc |\n",
        "| Ceval-valid-high_school_physics                      |   | ✓ |   | 19 | acc |\n",
        "| Ceval-valid-high_school_chemistry                    |   | ✓ |   | 19 | acc |\n",
        "| Ceval-valid-high_school_biology                      |   | ✓ |   | 19 | acc |\n",
        "| Ceval-valid-middle_school_mathematics                |   | ✓ |   | 19 | acc |\n",
        "| Ceval-valid-middle_school_biology                    |   | ✓ |   | 21 | acc |\n",
        "| Ceval-valid-middle_school_physics                    |   | ✓ |   | 19 | acc |\n",
        "| Ceval-valid-middle_school_chemistry                  |   | ✓ |   | 20 | acc |\n",
        "| Ceval-valid-veterinary_medicine                      |   | ✓ |   | 23 | acc |\n",
        "| Ceval-valid-college_economics                        |   | ✓ |   | 55 | acc |\n",
        "| Ceval-valid-business_administration                  |   | ✓ |   | 33 | acc |\n",
        "| Ceval-valid-marxism                                  |   | ✓ |   | 19 | acc |\n",
        "| Ceval-valid-mao_zedong_thought                       |   | ✓ |   | 24 | acc |\n",
        "| Ceval-valid-education_science                        |   | ✓ |   | 29 | acc |\n",
        "| Ceval-valid-teacher_qualification                    |   | ✓ |   | 44 | acc |\n",
        "| Ceval-valid-high_school_politics                     |   | ✓ |   | 19 | acc |\n",
        "| Ceval-valid-high_school_geography                    |   | ✓ |   | 19 | acc |\n",
        "| Ceval-valid-middle_school_politics                   |   | ✓ |   | 21 | acc |\n",
        "| Ceval-valid-middle_school_geography                  |   | ✓ |   | 12 | acc |\n",
        "| Ceval-valid-modern_chinese_history                   |   | ✓ |   | 23 | acc |\n",
        "| Ceval-valid-ideological_and_moral_cultivation        |   | ✓ |   | 19 | acc |\n",
        "| Ceval-valid-logic                                    |   | ✓ |   | 22 | acc |\n",
        "| Ceval-valid-law                                      |   | ✓ |   | 24 | acc |\n",
        "| Ceval-valid-chinese_language_and_literature          |   | ✓ |   | 23 | acc |\n",
        "| Ceval-valid-art_studies                              |   | ✓ |   | 33 | acc |\n",
        "| Ceval-valid-professional_tour_guide                  |   | ✓ |   | 29 | acc |\n",
        "| Ceval-valid-legal_professional                       |   | ✓ |   | 23 | acc |\n",
        "| Ceval-valid-high_school_chinese                      |   | ✓ |   | 19 | acc |\n",
        "| Ceval-valid-high_school_history                      |   | ✓ |   | 20 | acc |\n",
        "| Ceval-valid-middle_school_history                    |   | ✓ |   | 22 | acc |\n",
        "| Ceval-valid-civil_servant                            |   | ✓ |   | 47 | acc |\n",
        "| Ceval-valid-sports_science                           |   | ✓ |   | 19 | acc |\n",
        "| Ceval-valid-plant_protection                         |   | ✓ |   | 22 | acc |\n",
        "| Ceval-valid-basic_medicine                           |   | ✓ |   | 19 | acc |\n",
        "| Ceval-valid-clinical_medicine                        |   | ✓ |   | 22 | acc |\n",
        "| Ceval-valid-urban_and_rural_planner                  |   | ✓ |   | 46 | acc |\n",
        "| Ceval-valid-accountant                               |   | ✓ |   | 49 | acc |\n",
        "| Ceval-valid-fire_engineer                            |   | ✓ |   | 31 | acc |\n",
        "| Ceval-valid-environmental_impact_assessment_engineer |   | ✓ |   | 31 | acc |\n",
        "| Ceval-valid-tax_accountant                           |   | ✓ |   | 49 | acc |\n",
        "| Ceval-valid-physician                                |   | ✓ |   | 49 | acc |"
      ]
    }
  ],
  "metadata": {
    "accelerator": "GPU",
    "colab": {
      "gpuType": "T4",
      "provenance": []
    },
    "kernelspec": {
      "display_name": "Python 3",
      "name": "python3"
    },
    "language_info": {
      "name": "python"
    }
  },
  "nbformat": 4,
  "nbformat_minor": 0
}