task,metric,value,err,version
arc_challenge,acc,0.2175767918088737,0.012057262020972504,0
arc_challenge,acc_norm,0.2551194539249147,0.012739038695202104,0
arc_easy,acc,0.49242424242424243,0.010258605792153321,0
arc_easy,acc_norm,0.43897306397306396,0.010183076012972067,0
boolq,acc,0.5235474006116208,0.008735351675636605,1
copa,acc,0.72,0.04512608598542127,0
hellaswag,acc,0.3556064528978291,0.0047771835089498215,0
hellaswag,acc_norm,0.4304919338777136,0.004941331215598556,0
piqa,acc,0.70620239390642,0.010627574080514797,0
piqa,acc_norm,0.7013057671381937,0.010678556398149226,0
rte,acc,0.5379061371841155,0.030009848912529117,0
sciq,acc,0.754,0.013626065817750638,0
sciq,acc_norm,0.666,0.01492201952373297,0
winogrande,acc,0.5288082083662194,0.014029141615909615,0