Category	Model	Elo	# Matches	Win Rate vs. Reference (n = # ratings)
Single Image	gpt4v_response	1349	677	65.44% (n=136)
Single Image	human_verified_reference	1338	6480	---
Single Image	llava-a1-predictions	1187	812	30.15% (n=136)
Single Image	llava13b_output	1091	5574	18.53% (n=475)
Single Image	LlamaAdapter-v2 prediction	1066	5573	14.14% (n=488)
Single Image	lynx(7B)_v2 prediction	1051	796	15.15% (n=132)
Single Image	mPLUG-Owl prediction	1025	5561	15.83% (n=480)
Single Image	lynx_test_2	1002	731	11.02% (n=127)
Single Image	idefics9b_prediction	997	940	9.72% (n=144)
Single Image	Lynx(8B) predictions	990	929	11.43% (n=140)
Single Image	instruct_blip_output	964	5612	14.12% (n=503)
Single Image	otter	947	5597	7.01% (n=499)
Single Image	Octopus V2 prediction	920	913	8.90% (n=146)
Single Image	lynx_test_1	913	738	2.82% (n=142)
Single Image	visual_gpt_davinci003_output	911	5585	1.57% (n=510)
Single Image	MiniGPT-4 prediction	900	5560	3.36% (n=506)
Single Image	openflamingo	845	5591	2.95% (n=509)
Single Image	panda_gpt_13b_output	786	5573	2.70% (n=519)
Single Image	mmgpt_output	718	5604	0.19% (n=527)