name,Faithfulness,Answer_Relevancy,Answer_Correctness,Answer_Similarity
Yi-6B-Chat,0.7030488252628491,0.8539968939451946,0.8345303291521267,0.8525487106316666
Qwen1.5-7B-Chat,0.7730288486126379,0.800263654571703,0.7701434694510604,0.8096226126029853
Vicuna-13B-V1.5,0.7252900133287701,0.8001237927827157,0.7513908712676198,0.8129508434301099
Gpt-3.5-Turbo,0.7601832993890021,0.9508419461436827,0.5945237008802382,0.8766794953769677
Qwen1.5-14B-Chat,0.8108575380359613,0.9186608285564491,0.5796170633787928,0.8627921465440817
Internlm2-Chat-20B,0.8823450852329868,0.8990122482812408,0.5663255561571012,0.817104818105292
Internlm2-Chat-7B,0.8716367920317769,0.9049173556355747,0.5566486218868514,0.8194293421446569
Vicuna-7B-V1.5,0.6687478686911705,0.8847336678547908,0.5491987169778965,0.8538950235036584
Qwen1.5-4B-Chat,0.7161414565826331,0.916949622281115,0.5415164042157119,0.8588077047327288
Qwen1.5-1.8B-Chat,0.7559747023809523,0.9469277644039529,0.5355121893517637,0.8511550798429494
Baichuan2-13B-Chat,0.724778459441036,0.9033782254193811,0.5324917996259314,0.8430175816264579
Baichuan2-7B-Chat,0.663319530710835,0.8543448236955469,0.5222686618152338,0.8364213008907668
Gemma-7B,0.5647578582126265,0.6814204309035338,0.5202336438594105,0.7806024397207423
Qwen1.5-0.5B-Chat,0.5679874805086168,0.8611226406276706,0.513748281764636,0.812332476681601
Mistral-7B,0.6586367313915859,0.7039079054469578,0.5078017923324171,0.7902698697096028
Gemma-2B,0.5049161881111284,0.6528267517862424,0.5059908632023802,0.7736166726699579
Yi-6B,0.5063160585604476,0.6749962990823568,0.49929516708962135,0.7789524853407436