Q-Bench-Leaderboard / qbench_a1_pair_dev.csv
Model (variant),Yes-or-No,What,How,Distortion,Other,Compare,Joint,Overall
InfiMM (Zephyr-7B),48.11,39.04,40.06,42.56,43.78,41.77,48.33,42.95
Emu2-Chat (LLaMA-33B),56.64,41.15,49.62,49.12,51.91,47.86,60,50.05
Fuyu-8B (Persimmon-8B),68.76,33.56,38.78,46.83,54.03,47.86,55,49.15
BakLLava (Mistral-7B),56.92,43.83,50,49.33,54.34,50.66,52.22,50.94
mPLUG-Owl2 (Q-Instruct),59.19,42.12,47.43,49.63,52.48,49.81,53.88,50.54
mPLUG-Owl2 (LLaMA-7B),58.43,39.72,48.39,49.04,51.55,47.5,60.55,49.85
LLaVA-v1.5 (Vicuna-v1.5-7B),60.46,42.85,41.53,47.88,51.89,46.55,59.57,49.32
LLaVA-v1.5 (Vicuna-v1.5-13B),56.42,42.46,48.38,48.15,53.41,48.84,54.44,49.85
Qwen-VL-Plus (Closed-Source),63.63,55.55,55.71,61.61,56.52,65.81,58.45,60.7
Qwen-VL-Max (Closed-Source),71.96,62.87,65.53,69.21,62.69,67.54,66.01,67.27
Gemini-Pro (Closed-Source),64.98,51.36,54.16,58.17,56.52,57.73,57.22,57.64
GPT-4V (Closed-Source),79.34,70.54,78.52,75.84,77.95,78.8,66.11,76.52