dataset,prompt,metric,value
xcopa_zh,C1 or C2? premise_zhht,accuracy,0.55
xcopa_zh,best_option_zhht,accuracy,0.67
xcopa_zh,cause_effect_zhht,accuracy,0.79
xcopa_zh,i_am_hesitating_zhht,accuracy,0.77
xcopa_zh,plausible_alternatives_zhht,accuracy,0.75
xcopa_zh,median,accuracy,0.75
xstory_cloze_zh,Answer Given options_zhht,accuracy,0.7054930509596293
xstory_cloze_zh,Choose Story Ending_zhht,accuracy,0.7948378557246857
xstory_cloze_zh,Generate Ending_zhht,accuracy,0.6366644606221046
xstory_cloze_zh,Novel Correct Ending_zhht,accuracy,0.7782925215089345
xstory_cloze_zh,Story Continuation and Options_zhht,accuracy,0.771012574454004
xstory_cloze_zh,median,accuracy,0.771012574454004
xwinograd_zh,Replace_zhht,accuracy,0.5178571428571429
xwinograd_zh,True or False_zhht,accuracy,0.5218253968253969
xwinograd_zh,does underscore refer to_zhht,accuracy,0.4662698412698413
xwinograd_zh,stand for_zhht,accuracy,0.49404761904761907
xwinograd_zh,underscore refer to_zhht,accuracy,0.44047619047619047
xwinograd_zh,median,accuracy,0.49404761904761907
multiple,average,multiple,0.6716867311672077