task,metric,value,err,version
anli_r1,acc,0.328,0.014853842487270334,0
anli_r2,acc,0.355,0.015139491543780536,0
anli_r3,acc,0.35083333333333333,0.013782212417178199,0
arc_challenge,acc,0.2977815699658703,0.013363080107244489,0
arc_challenge,acc_norm,0.3046075085324232,0.013449522109932487,0
arc_easy,acc,0.6355218855218855,0.00987572928248244,0
arc_easy,acc_norm,0.6111111111111112,0.01000324833531377,0
boolq,acc,0.6159021406727829,0.008506861063860244,1
cb,acc,0.5,0.06741998624632421,1
cb,f1,0.26794871794871794,,1
copa,acc,0.83,0.037752516806863715,0
hellaswag,acc,0.4726150169288986,0.004982291744069915,0
hellaswag,acc_norm,0.633240390360486,0.004809352075008949,0
piqa,acc,0.750272034820457,0.010099232969867486,0
piqa,acc_norm,0.7671381936887922,0.009861236071080751,0
rte,acc,0.4729241877256318,0.030052303463143706,0
sciq,acc,0.915,0.008823426366942324,0
sciq,acc_norm,0.917,0.008728527206074787,0
storycloze_2016,acc,0.7279529663281668,0.010290888060871242,0
winogrande,acc,0.5911602209944752,0.013816954295135696,0