Q-Bench-Leaderboard / qbench_a1_pair_dev.csv

Last commit: 83739d2 (verified), "Upload 2 files", by zhangzicheng, 10 months ago. File size: 933 Bytes.

	Model (variant),Yes-or-No,What,How,Distortion,Other,Compare,Joint,Overall
	InfiMM (Zephyr-7B),48.11,39.04,40.06,42.56,43.78,41.77,48.33,42.95
	Emu2-Chat (LLaMA-33B),56.64,41.15,49.62,49.12,51.91,47.86,60,50.05
	Fuyu-8B (Persimmon-8B),68.76,33.56,38.78,46.83,54.03,47.86,55,49.15
	BakLLava (Mistral-7B),56.92,43.83,50,49.33,54.34,50.66,52.22,50.94
	mPLUG-Owl2 (Q-Instruct),59.19,42.12,47.43,49.63,52.48,49.81,53.88,50.54
	mPLUG-Owl2 (LLaMA-7B),58.43,39.72,48.39,49.04,51.55,47.5,60.55,49.85
	LLaVA-v1.5 (Vicuna-v1.5-7B),60.46,42.85,41.53,47.88,51.89,46.55,59.57,49.32
	LLaVA-v1.5 (Vicuna-v1.5-13B),56.42,42.46,48.38,48.15,53.41,48.84,54.44,49.85
	Qwen-VL-Plus (Close-Source),63.63,55.55,55.71,61.61,56.52,65.81,58.45,60.7
	Qwen-VL-Max (Close-Source),71.96,62.87,65.53,69.21,62.69,67.54,66.01,67.27
	Gemini-Pro (Close-Source),64.98,51.36,54.16,58.17,56.52,57.73,57.22,57.64
	GPT-4V (Close-Source),79.34,70.54,78.52,75.84,77.95,78.8,66.11,76.52
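
The table above is plain CSV, so it can be read with Python's standard `csv` module. Below is a minimal sketch that parses the data and ranks models by the `Overall` column. The helper names `load_scores` and `rank_by` are hypothetical and are not part of the leaderboard app. Only three rows are copied from the file to keep the example short.

```python
import csv
import io

# Three rows copied from the table above; the full file has 12 model rows.
SAMPLE = """\
Model (variant),Yes-or-No,What,How,Distortion,Other,Compare,Joint,Overall
InfiMM (Zephyr-7B),48.11,39.04,40.06,42.56,43.78,41.77,48.33,42.95
Qwen-VL-Max (Close-Source),71.96,62.87,65.53,69.21,62.69,67.54,66.01,67.27
GPT-4V (Close-Source),79.34,70.54,78.52,75.84,77.95,78.8,66.11,76.52
"""


def load_scores(text):
    """Parse leaderboard CSV text into a list of (model, {column: float}) pairs."""
    reader = csv.DictReader(io.StringIO(text))
    rows = []
    for record in reader:
        # The first column holds the model name; every other column is a score.
        model = record.pop("Model (variant)").strip()
        rows.append((model, {name: float(value) for name, value in record.items()}))
    return rows


def rank_by(rows, column="Overall"):
    """Return rows sorted from highest to lowest score in the given column."""
    return sorted(rows, key=lambda row: row[1][column], reverse=True)


ranked = rank_by(load_scores(SAMPLE))
for model, scores in ranked:
    print(f"{model}: {scores['Overall']:.2f}")
```

Passing a different column name, for example `rank_by(rows, "Compare")`, ranks the same rows by that sub-category instead.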