Category	Model	Elo	# Matches	Win Rate vs. Reference (n = # ratings)
Single Image	human_verified_reference	1402	5597	---
Single Image	llava13b_output	1128	5399	18.35% (n=474)
Single Image	mPLUG-Owl prediction	1117	5390	15.87% (n=479)
Single Image	LlamaAdapter-v2 prediction	1084	5416	14.17% (n=487)
Single Image	Lynx(8B) predictions	1046	758	11.76% (n=136)
Single Image	instruct_blip_output	1021	5396	14.14% (n=502)
Single Image	otter	962	5397	7.03% (n=498)
Single Image	visual_gpt_davinci003_output	953	5414	1.57% (n=509)
Single Image	Octopus V2 prediction	952	994	5.29% (n=170)
Single Image	MiniGPT-4 prediction	938	5393	3.37% (n=505)
Single Image	openflamingo	851	5397	2.95% (n=508)
Single Image	panda_gpt_13b_output	801	5397	2.70% (n=518)
Single Image	mmgpt_output	747	5402	0.19% (n=526)
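The table does not say how the Elo ratings were fitted. One rough sanity check compares each model's Elo-implied probability of beating human_verified_reference with its observed win rate. The sketch below assumes the standard logistic Elo model (base 10, scale 400). The scale and the helper name `elo_expected` are assumptions, not the leaderboard's documented method. Observed rates may also treat ties differently from the Elo fit.

```python
# Hypothetical sanity check, assuming the standard logistic Elo model
# (base 10, scale 400). This is not the leaderboard's documented method.

def elo_expected(r_a: float, r_b: float, scale: float = 400.0) -> float:
    """Expected score (win probability) of A against B under standard Elo."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / scale))

REFERENCE_ELO = 1402  # human_verified_reference

# (model, Elo, observed win rate vs. reference) copied from the table
ROWS = [
    ("llava13b_output", 1128, 0.1835),
    ("mPLUG-Owl prediction", 1117, 0.1587),
    ("LlamaAdapter-v2 prediction", 1084, 0.1417),
    ("Lynx(8B) predictions", 1046, 0.1176),
    ("instruct_blip_output", 1021, 0.1414),
    ("otter", 962, 0.0703),
    ("visual_gpt_davinci003_output", 953, 0.0157),
    ("Octopus V2 prediction", 952, 0.0529),
    ("MiniGPT-4 prediction", 938, 0.0337),
    ("openflamingo", 851, 0.0295),
    ("panda_gpt_13b_output", 801, 0.0270),
    ("mmgpt_output", 747, 0.0019),
]

if __name__ == "__main__":
    for name, elo, observed in ROWS:
        implied = elo_expected(elo, REFERENCE_ELO)
        print(f"{name:30s} Elo-implied {implied:7.2%}  observed {observed:7.2%}")
```

Under this assumption, llava13b_output's 274-point deficit implies a win probability of about 17%, close to its observed 18.35%. Large gaps between the two columns could reflect rows with few ratings or a different Elo fitting procedure.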