dataset,prompt,metric,value
xnli_ar,GPT-3 style_armt,accuracy,0.3333333333333333
xnli_ar,MNLI crowdsource_armt,accuracy,0.4751004016064257
xnli_ar,can we infer_armt,accuracy,0.3333333333333333
xnli_ar,guaranteed/possible/impossible_armt,accuracy,0.45823293172690766
xnli_ar,justified in saying_armt,accuracy,0.3337349397590361
xnli_ar,median,accuracy,0.3337349397590361
xnli_es,GPT-3 style_esmt,accuracy,0.5895582329317269
xnli_es,MNLI crowdsource_esmt,accuracy,0.5100401606425703
xnli_es,can we infer_esmt,accuracy,0.3333333333333333
xnli_es,guaranteed/possible/impossible_esmt,accuracy,0.3389558232931727
xnli_es,justified in saying_esmt,accuracy,0.3333333333333333
xnli_es,median,accuracy,0.3389558232931727
xnli_fr,GPT-3 style_frmt,accuracy,0.4967871485943775
xnli_fr,MNLI crowdsource_frmt,accuracy,0.3333333333333333
xnli_fr,can we infer_frmt,accuracy,0.5586345381526104
xnli_fr,guaranteed/possible/impossible_frmt,accuracy,0.44096385542168676
xnli_fr,justified in saying_frmt,accuracy,0.4899598393574297
xnli_fr,median,accuracy,0.4899598393574297
xnli_hi,GPT-3 style_himt,accuracy,0.4393574297188755
xnli_hi,MNLI crowdsource_himt,accuracy,0.3333333333333333
xnli_hi,can we infer_himt,accuracy,0.3610441767068273
xnli_hi,guaranteed/possible/impossible_himt,accuracy,0.38072289156626504
xnli_hi,justified in saying_himt,accuracy,0.39759036144578314
xnli_hi,median,accuracy,0.38072289156626504
xnli_sw,GPT-3 style_swmt,accuracy,0.3333333333333333
xnli_sw,MNLI crowdsource_swmt,accuracy,0.3337349397590361
xnli_sw,can we infer_swmt,accuracy,0.334136546184739
xnli_sw,guaranteed/possible/impossible_swmt,accuracy,0.3321285140562249
xnli_sw,justified in saying_swmt,accuracy,0.3357429718875502
xnli_sw,median,accuracy,0.3337349397590361
xnli_ur,GPT-3 style_urmt,accuracy,0.3718875502008032
xnli_ur,MNLI crowdsource_urmt,accuracy,0.3421686746987952
xnli_ur,can we infer_urmt,accuracy,0.36666666666666664
xnli_ur,guaranteed/possible/impossible_urmt,accuracy,0.3333333333333333
xnli_ur,justified in saying_urmt,accuracy,0.378714859437751
xnli_ur,median,accuracy,0.36666666666666664
xnli_vi,GPT-3 style_vimt,accuracy,0.3333333333333333
xnli_vi,MNLI crowdsource_vimt,accuracy,0.3333333333333333
xnli_vi,can we infer_vimt,accuracy,0.3333333333333333
xnli_vi,guaranteed/possible/impossible_vimt,accuracy,0.3389558232931727
xnli_vi,justified in saying_vimt,accuracy,0.3333333333333333
xnli_vi,median,accuracy,0.3333333333333333
xnli_zh,GPT-3 style_zhmt,accuracy,0.3895582329317269
xnli_zh,MNLI crowdsource_zhmt,accuracy,0.3333333333333333
xnli_zh,can we infer_zhmt,accuracy,0.40602409638554215
xnli_zh,guaranteed/possible/impossible_zhmt,accuracy,0.44136546184738956
xnli_zh,justified in saying_zhmt,accuracy,0.3405622489959839
xnli_zh,median,accuracy,0.3895582329317269
multiple,average,multiple,0.37083333333333335