OpsEval / data /zjyd_zh_qa.csv
name,Faithfulness,Answer_Relevancy,Answer_Correctness,Answer_Similarity
Qwen1.5-7B-Chat,0.9001343784903166,0.8310295209988445,0.9078620109210616,0.8729269504884732
Qwen1.5-4B-Chat,0.8871057360749349,0.8993513260808733,0.8307124664634027,0.820558540099676
Gpt-3.5-Turbo,0.844,0.9381622494626971,0.7014364361923093,0.9646698567096922
Qwen1.5-14B-Chat,0.9143835616438356,0.9096799403053909,0.6924722613000067,0.9576526234420509
Vicuna-13B-V1.5,0.8201754385964912,0.8811313951272425,0.68730238555884,0.9481117257306625
Baichuan2-13B-Chat,0.902,0.9045473946630944,0.6857149093288882,0.9578530098977669
Baichuan2-7B-Chat,0.8784722222222222,0.896849978755001,0.6751955501292016,0.9500810985536641
Qwen1.5-1.8B-Chat,0.9148888888888888,0.8586071776868396,0.6748854449858851,0.947046701753897
Yi-6B-Chat,0.9511929511929511,0.7986143744572479,0.6694793902546,0.9285801614165997
Qwen1.5-0.5B-Chat,0.8277777777777777,0.8901546106376419,0.6588250813657541,0.9469939028778743
Vicuna-7B-V1.5,0.7171052631578947,0.8301247992194959,0.6521358982668551,0.9382112592746454
Internlm2-Chat-20B,0.8146430093452255,0.6294665932615476,0.5592223065723815,0.9031372380769384
Internlm2-Chat-7B,0.7936354405828091,0.6059388264548148,0.5497547973508542,0.9071347182079667
Gemma-7B,0.5690370087428911,0.294443307376398,0.5182431858082619,0.8437500469275063
Yi-6B,0.4679211960033877,0.29049526322106994,0.4910372529026469,0.8409424204038982
Mistral-7B,0.7985507246376812,0.40909012863946165,0.4698180894318443,0.85180274226269
Gemma-2B,0.5461295296041059,0.32955654240675497,0.4138425436194475,0.8085919354670669