File size: 12,927 Bytes
0723761 172c6a6 |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 500 501 502 503 504 505 506 507 508 509 510 511 512 513 514 515 516 517 518 519 520 521 522 523 524 525 526 527 528 529 530 531 532 533 534 535 536 537 538 539 540 541 542 543 544 545 546 547 548 549 550 551 552 553 554 555 556 557 558 559 560 561 562 563 564 565 566 567 568 569 570 571 572 573 574 575 576 577 578 579 580 581 582 583 584 585 586 587 588 589 590 591 592 593 594 595 596 597 598 599 600 601 602 603 604 605 606 607 608 609 610 611 612 613 614 615 616 617 618 619 620 621 622 623 624 625 626 627 628 629 630 631 632 633 634 635 636 637 638 639 640 641 642 643 644 645 646 647 648 649 650 651 652 653 654 655 656 657 658 659 660 661 662 663 664 665 666 667 668 669 670 671 672 673 674 675 676 677 678 679 680 681 682 683 684 685 686 687 688 689 690 691 692 693 694 695 696 697 698 699 700 701 702 703 704 705 706 707 708 709 710 711 712 713 714 715 716 717 718 719 720 721 722 723 724 725 726 727 728 729 730 731 732 733 734 735 736 737 738 739 740 741 742 743 744 745 746 747 748 749 750 751 752 753 754 755 756 757 758 759 760 761 762 763 764 765 766 767 768 769 770 771 772 773 774 775 776 777 778 779 780 781 782 783 784 785 786 787 788 789 790 791 792 793 794 795 796 797 798 799 800 |
multiple_choice_score: there are 789 tasks in prompt multiple_choice_score: reading tasks................................................................................................................done multiple_choice_score: preparing task data...done multiple_choice_score : calculating TruthfulQA score over 789 tasks. task acc_norm 1 100.00000000 2 50.00000000 3 33.33333333 4 25.00000000 5 40.00000000 6 33.33333333 7 28.57142857 8 25.00000000 9 22.22222222 10 20.00000000 11 18.18181818 12 16.66666667 13 23.07692308 14 21.42857143 15 20.00000000 16 18.75000000 17 17.64705882 18 22.22222222 19 21.05263158 20 20.00000000 21 19.04761905 22 18.18181818 23 17.39130435 24 16.66666667 25 16.00000000 26 19.23076923 27 18.51851852 28 21.42857143 29 20.68965517 30 23.33333333 31 25.80645161 32 25.00000000 33 24.24242424 34 26.47058824 35 25.71428571 36 25.00000000 37 24.32432432 38 26.31578947 39 28.20512821 40 27.50000000 41 26.82926829 42 26.19047619 43 25.58139535 44 25.00000000 45 24.44444444 46 23.91304348 47 23.40425532 48 25.00000000 49 24.48979592 50 24.00000000 51 23.52941176 52 23.07692308 53 24.52830189 54 24.07407407 55 23.63636364 56 23.21428571 57 24.56140351 58 25.86206897 59 25.42372881 60 25.00000000 61 24.59016393 62 24.19354839 63 23.80952381 64 23.43750000 65 23.07692308 66 22.72727273 67 22.38805970 68 22.05882353 69 21.73913043 70 21.42857143 71 21.12676056 72 20.83333333 73 20.54794521 74 21.62162162 75 22.66666667 76 23.68421053 77 23.37662338 78 23.07692308 79 22.78481013 80 23.75000000 81 23.45679012 82 23.17073171 83 22.89156627 84 22.61904762 85 22.35294118 86 22.09302326 87 21.83908046 88 21.59090909 89 21.34831461 90 21.11111111 91 20.87912088 92 20.65217391 93 20.43010753 94 20.21276596 95 20.00000000 96 19.79166667 97 19.58762887 98 19.38775510 99 19.19191919 100 19.00000000 101 18.81188119 102 18.62745098 103 18.44660194 104 18.26923077 105 18.09523810 106 17.92452830 107 17.75700935 108 17.59259259 109 17.43119266 110 17.27272727 111 17.11711712 112 16.96428571 113 16.81415929 114 16.66666667 115 16.52173913 116 16.37931034 117 17.09401709 118 16.94915254 119 16.80672269 120 17.50000000 121 18.18181818 122 18.03278689 123 17.88617886 124 17.74193548 125 17.60000000 126 18.25396825 127 18.11023622 128 18.75000000 129 19.37984496 130 19.23076923 131 19.08396947 132 18.93939394 133 18.79699248 134 18.65671642 135 18.51851852 136 18.38235294 137 18.24817518 138 18.11594203 139 17.98561151 140 17.85714286 141 17.73049645 142 17.60563380 143 17.48251748 144 17.36111111 145 17.24137931 146 17.12328767 147 17.00680272 148 16.89189189 149 16.77852349 150 16.66666667 151 16.55629139 152 16.44736842 153 16.33986928 154 16.23376623 155 16.12903226 156 16.02564103 157 15.92356688 158 15.82278481 159 15.72327044 160 15.62500000 161 15.52795031 162 16.04938272 163 15.95092025 164 16.46341463 165 16.36363636 166 16.26506024 167 16.16766467 168 16.07142857 169 15.97633136 170 16.47058824 171 16.37426901 172 16.27906977 173 16.18497110 174 16.09195402 175 16.00000000 176 15.90909091 177 15.81920904 178 15.73033708 179 15.64245810 180 15.55555556 181 15.46961326 182 15.38461538 183 15.30054645 184 15.21739130 185 15.67567568 186 15.59139785 187 15.50802139 188 15.42553191 189 15.34391534 190 15.26315789 191 15.18324607 192 15.10416667 193 15.02590674 194 14.94845361 195 14.87179487 196 14.79591837 197 14.72081218 198 15.15151515 199 15.07537688 200 15.00000000 201 14.92537313 202 14.85148515 203 15.27093596 204 15.68627451 205 16.09756098 206 16.01941748 207 15.94202899 208 15.86538462 209 15.78947368 210 16.19047619 211 16.11374408 212 16.03773585 213 15.96244131 214 15.88785047 215 16.27906977 216 16.66666667 217 16.58986175 218 16.97247706 219 16.89497717 220 17.27272727 221 17.19457014 222 17.11711712 223 17.04035874 224 16.96428571 225 16.88888889 226 17.25663717 227 17.18061674 228 17.10526316 229 17.46724891 230 17.39130435 231 17.31601732 232 17.24137931 233 17.16738197 234 17.09401709 235 17.02127660 236 16.94915254 237 17.29957806 238 17.22689076 239 17.15481172 240 17.08333333 241 17.42738589 242 17.35537190 243 17.28395062 244 17.21311475 245 17.14285714 246 17.07317073 247 17.00404858 248 17.33870968 249 17.26907631 250 17.20000000 251 17.52988048 252 17.85714286 253 17.78656126 254 18.11023622 255 18.03921569 256 18.35937500 257 18.67704280 258 18.60465116 259 18.53281853 260 18.46153846 261 18.39080460 262 18.70229008 263 18.63117871 264 18.56060606 265 18.49056604 266 18.79699248 267 18.72659176 268 18.65671642 269 18.58736059 270 18.51851852 271 18.81918819 272 18.75000000 273 18.68131868 274 18.61313869 275 18.54545455 276 18.47826087 277 18.41155235 278 18.70503597 279 18.63799283 280 18.57142857 281 18.86120996 282 18.79432624 283 18.72791519 284 18.66197183 285 18.59649123 286 18.53146853 287 18.46689895 288 18.40277778 289 18.33910035 290 18.62068966 291 18.55670103 292 18.83561644 293 19.11262799 294 19.04761905 295 18.98305085 296 18.91891892 297 18.85521886 298 18.79194631 299 18.72909699 300 18.66666667 301 18.60465116 302 18.54304636 303 18.48184818 304 18.42105263 305 18.68852459 306 18.95424837 307 18.89250814 308 18.83116883 309 19.09385113 310 19.03225806 311 19.29260450 312 19.55128205 313 19.80830671 314 19.74522293 315 20.00000000 316 19.93670886 317 19.87381703 318 19.81132075 319 20.06269592 320 20.00000000 321 20.24922118 322 20.49689441 323 20.43343653 324 20.37037037 325 20.30769231 326 20.55214724 327 20.79510703 328 20.73170732 329 20.66869301 330 20.90909091 331 20.84592145 332 20.78313253 333 20.72072072 334 20.65868263 335 20.59701493 336 20.53571429 337 20.47477745 338 20.41420118 339 20.64896755 340 20.58823529 341 20.52785924 342 20.46783626 343 20.40816327 344 20.34883721 345 20.28985507 346 20.52023121 347 20.46109510 348 20.40229885 349 20.34383954 350 20.28571429 351 20.51282051 352 20.45454545 353 20.39660057 354 20.62146893 355 20.56338028 356 20.50561798 357 20.44817927 358 20.67039106 359 20.89136490 360 21.11111111 361 21.32963989 362 21.27071823 363 21.21212121 364 21.15384615 365 21.09589041 366 21.03825137 367 20.98092643 368 20.92391304 369 20.86720867 370 20.81081081 371 20.75471698 372 20.69892473 373 20.64343164 374 20.58823529 375 20.80000000 376 20.74468085 377 20.68965517 378 20.63492063 379 20.58047493 380 20.52631579 381 20.47244094 382 20.41884817 383 20.36553525 384 20.31250000 385 20.25974026 386 20.20725389 387 20.15503876 388 20.10309278 389 20.05141388 390 20.00000000 391 20.20460358 392 20.15306122 393 20.10178117 394 20.05076142 395 20.00000000 396 19.94949495 397 19.89924433 398 19.84924623 399 19.79949875 400 19.75000000 401 19.95012469 402 19.90049751 403 20.09925558 404 20.29702970 405 20.24691358 406 20.19704433 407 20.14742015 408 20.09803922 409 20.04889976 410 20.00000000 411 19.95133820 412 19.90291262 413 19.85472155 414 19.80676329 415 20.00000000 416 19.95192308 417 19.90407674 418 19.85645933 419 19.80906921 420 19.76190476 421 19.95249406 422 19.90521327 423 20.09456265 424 20.04716981 425 20.23529412 426 20.18779343 427 20.14051522 428 20.32710280 429 20.27972028 430 20.23255814 431 20.18561485 432 20.13888889 433 20.32332564 434 20.27649770 435 20.45977011 436 20.41284404 437 20.36613272 438 20.54794521 439 20.50113895 440 20.68181818 441 20.86167800 442 20.81447964 443 20.76749436 444 20.94594595 445 20.89887640 446 20.85201794 447 20.80536913 448 20.75892857 449 20.71269488 450 20.66666667 451 20.62084257 452 20.57522124 453 20.52980132 454 20.48458150 455 20.65934066 456 20.83333333 457 20.78774617 458 20.74235808 459 20.69716776 460 20.65217391 461 20.60737527 462 20.56277056 463 20.51835853 464 20.47413793 465 20.43010753 466 20.38626609 467 20.34261242 468 20.29914530 469 20.25586354 470 20.21276596 471 20.38216561 472 20.33898305 473 20.29598309 474 20.25316456 475 20.21052632 476 20.16806723 477 20.12578616 478 20.08368201 479 20.04175365 480 20.00000000 481 19.95841996 482 19.91701245 483 19.87577640 484 20.04132231 485 20.20618557 486 20.16460905 487 20.12320329 488 20.08196721 489 20.04089980 490 20.00000000 491 19.95926680 492 20.12195122 493 20.08113590 494 20.24291498 495 20.40404040 496 20.36290323 497 20.32193159 498 20.48192771 499 20.64128257 500 20.60000000 501 20.55888224 502 20.71713147 503 20.67594433 504 20.63492063 505 20.59405941 506 20.55335968 507 20.51282051 508 20.47244094 509 20.43222004 510 20.39215686 511 20.35225049 512 20.31250000 513 20.46783626 514 20.42801556 515 20.58252427 516 20.54263566 517 20.50290135 518 20.46332046 519 20.42389210 520 20.57692308 521 20.53742802 522 20.49808429 523 20.45889101 524 20.41984733 525 20.38095238 526 20.34220532 527 20.30360531 528 20.45454545 529 20.41587902 530 20.37735849 531 20.33898305 532 20.30075188 533 20.45028143 534 20.41198502 535 20.37383178 536 20.52238806 537 20.48417132 538 20.44609665 539 20.59369202 540 20.55555556 541 20.51756007 542 20.66420664 543 20.62615101 544 20.77205882 545 20.73394495 546 20.87912088 547 20.84095064 548 20.98540146 549 20.94717668 550 20.90909091 551 21.05263158 552 21.01449275 553 21.15732369 554 21.11913357 555 21.08108108 556 21.04316547 557 21.00538600 558 20.96774194 559 20.93023256 560 20.89285714 561 20.85561497 562 20.81850534 563 20.78152753 564 20.92198582 565 20.88495575 566 20.84805654 567 20.81128748 568 20.77464789 569 20.73813708 570 20.70175439 571 20.66549912 572 20.62937063 573 20.76788831 574 20.73170732 575 20.69565217 576 20.65972222 577 20.79722704 578 20.76124567 579 20.72538860 580 20.68965517 581 20.65404475 582 20.61855670 583 20.58319039 584 20.54794521 585 20.51282051 586 20.47781570 587 20.44293015 588 20.40816327 589 20.37351443 590 20.33898305 591 20.47377327 592 20.43918919 593 20.40472175 594 20.37037037 595 20.33613445 596 20.30201342 597 20.26800670 598 20.40133779 599 20.53422371 600 20.66666667 601 20.63227953 602 20.59800664 603 20.56384743 604 20.52980132 605 20.49586777 606 20.46204620 607 20.42833608 608 20.39473684 609 20.36124795 610 20.49180328 611 20.45826514 612 20.58823529 613 20.55464927 614 20.52117264 615 20.48780488 616 20.45454545 617 20.42139384 618 20.38834951 619 20.51696284 620 20.48387097 621 20.45088567 622 20.41800643 623 20.38523274 624 20.35256410 625 20.32000000 626 20.28753994 627 20.25518341 628 20.22292994 629 20.19077901 630 20.15873016 631 20.12678288 632 20.09493671 633 20.06319115 634 20.18927445 635 20.15748031 636 20.12578616 637 20.09419152 638 20.06269592 639 20.03129890 640 20.00000000 641 19.96879875 642 19.93769470 643 19.90668740 644 19.87577640 645 19.84496124 646 19.81424149 647 19.78361669 648 19.75308642 649 19.87673344 650 19.84615385 651 19.81566820 652 19.78527607 653 19.75497703 654 19.72477064 655 19.69465649 656 19.66463415 657 19.63470320 658 19.60486322 659 19.57511381 660 19.54545455 661 19.51588502 662 19.48640483 663 19.45701357 664 19.42771084 665 19.39849624 666 19.36936937 667 19.34032984 668 19.31137725 669 19.28251121 670 19.25373134 671 19.37406855 672 19.34523810 673 19.31649331 674 19.43620178 675 19.40740741 676 19.52662722 677 19.49778434 678 19.61651917 679 19.58762887 680 19.55882353 681 19.67694567 682 19.64809384 683 19.61932650 684 19.59064327 685 19.70802920 686 19.67930029 687 19.65065502 688 19.62209302 689 19.59361393 690 19.56521739 691 19.53690304 692 19.50867052 693 19.48051948 694 19.45244957 695 19.42446043 696 19.39655172 697 19.36872310 698 19.34097421 699 19.31330472 700 19.42857143 701 19.54350927 702 19.51566952 703 19.48790896 704 19.46022727 705 19.43262411 706 19.54674221 707 19.51909477 708 19.49152542 709 19.46403385 710 19.43661972 711 19.54992968 712 19.66292135 713 19.63534362 714 19.60784314 715 19.58041958 716 19.55307263 717 19.52580195 718 19.49860724 719 19.47148818 720 19.44444444 721 19.41747573 722 19.52908587 723 19.50207469 724 19.61325967 725 19.58620690 726 19.69696970 727 19.66987620 728 19.78021978 729 19.75308642 730 19.86301370 731 19.83584131 732 19.80874317 733 19.78171896 734 19.75476839 735 19.72789116 736 19.70108696 737 19.67435550 738 19.78319783 739 19.75642760 740 19.86486486 741 19.83805668 742 19.81132075 743 19.91924630 744 20.02688172 745 20.00000000 746 19.97319035 747 20.08032129 748 20.05347594 749 20.16021362 750 20.13333333 751 20.10652463 752 20.07978723 753 20.05312085 754 20.15915119 755 20.26490066 756 20.23809524 757 20.21136063 758 20.18469657 759 20.15810277 760 20.26315789 761 20.36793693 762 20.34120735 763 20.31454784 764 20.28795812 765 20.26143791 766 20.36553525 767 20.46936115 768 20.44270833 769 20.41612484 770 20.38961039 771 20.36316472 772 20.33678756 773 20.31047865 774 20.28423773 775 20.25806452 776 20.23195876 777 20.20592021 778 20.17994859 779 20.15404365 780 20.25641026 781 20.23047375 782 20.33248082 783 20.30651341 784 20.28061224 785 20.25477707 786 20.22900763 787 20.20330368 788 20.17766497 789 20.27883397 Final result: 20.2788 +/- 1.4323 Random chance: 10.1310 +/- 1.0749 |