| Category | Model | Elo | # Matches | Win vs. Reference (w/ # ratings) |
|---|---|---|---|---|
| Single Image | human_verified_reference | 1361 | 6030 | --- |
| Single Image | LLaVA-Plus | 1206 | 724 | 30.15% (n=136) |
| Single Image | LLaVA 13B | 1091 | 5474 | 18.53% (n=475) |
| Single Image | Lynx 7B V2 | 1078 | 708 | 15.15% (n=132) |
| Single Image | mPLUG-Owl | 1076 | 5465 | 16.04% (n=480) |
| Single Image | LlamaAdapter-v2 | 1055 | 5485 | 14.14% (n=488) |
| Single Image | idefics9b | 1030 | 842 | 9.72% (n=144) |
| Single Image | Lynx(8B) | 1012 | 827 | 11.43% (n=140) |
| Single Image | InstructBLIP | 995 | 5505 | 14.12% (n=503) |
| Single Image | otter | 970 | 5495 | 7.01% (n=499) |
| Single Image | visual_gpt_davinci003 | 937 | 5486 | 1.57% (n=510) |
| Single Image | Octopus V2 | 936 | 820 | 8.90% (n=146) |
| Single Image | MiniGPT-4 | 899 | 5473 | 3.36% (n=506) |
| Single Image | openflamingo | 831 | 5490 | 2.95% (n=509) |
| Single Image | panda_gpt_13b | 767 | 5480 | 2.70% (n=519) |
| Single Image | MMGPT | 757 | 5504 | 0.19% (n=527) |
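
To relate the Elo column to the win-rate column, the sketch below converts an Elo gap into an implied head-to-head win probability. It assumes the conventional logistic Elo model (base 10, scale 400), which the table itself does not state. The function name `elo_expected_score` and the `scale` parameter are hypothetical and used only for illustration. The ratings are copied from the table.

```python
def elo_expected_score(rating_a: float, rating_b: float, scale: float = 400.0) -> float:
    """Expected score of A against B under the standard logistic Elo model.

    Assumption: base-10 logistic curve with a 400-point scale. The source
    table does not say which Elo variant was used.
    """
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / scale))


# Elo ratings copied from the table above.
reference_elo = 1361  # human_verified_reference
model_elos = {
    "LLaVA-Plus": 1206,
    "LLaVA 13B": 1091,
    "InstructBLIP": 995,
    "MMGPT": 757,
}

for model, elo in model_elos.items():
    implied = elo_expected_score(elo, reference_elo)
    print(f"{model}: implied win rate vs. reference = {implied:.1%}")
```

The implied values need not match the "Win vs. Reference" column exactly. That column is computed from the subset of ratings that compare a model directly with the reference (the `n` in parentheses), while Elo is fit over all matches.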