| Category | Model | Elo | Matches | Win vs. Reference (w/ # ratings) |
| --- | --- | --- | --- | --- |
| Single Image | Human Verified GPT-4 Reference | 1370 | 5442 | - |
| Single Image | LLaVA (13B) | 1106 | 5446 | 17.81% (n=494) |
| Single Image | LlamaAdapter-v2 (7B) | 1082 | 5445 | 13.75% (n=502) |
| Single Image | mPLUG-Owl (7B) | 1081 | 5452 | 15.29% (n=497) |
| Single Image | InstructBLIP (13B) | 1011 | 5444 | 13.73% (n=517) |
| Single Image | Otter (9B) | 991 | 5450 | 6.84% (n=512) |
| Single Image | VisualGPT (Da Vinci 003) | 972 | 5445 | 1.52% (n=527) |
| Single Image | MiniGPT-4 (7B) | 921 | 5442 | 3.26% (n=522) |
| Single Image | OpenFlamingo (9B) | 877 | 5449 | 2.86% (n=524) |
| Single Image | PandaGPT (13B) | 826 | 5441 | 2.63% (n=533) |
| Single Image | Multimodal GPT | 763 | 5450 | 0.18% (n=544) |
| Multiple Images | Human Verified GPT-4 Reference | 1192 | 180 | - |
| Multiple Images | mPLUG-Owl | 995 | 180 | 6.67% (n=60) |
| Multiple Images | Otter | 911 | 180 | 1.69% (n=59) |
| Multiple Images | OpenFlamingo | 902 | 180 | 1.67% (n=60) |
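For readers unfamiliar with how the Elo column is derived, a minimal sketch of the standard Elo update rule (the usual scheme for aggregating pairwise match outcomes like these into a single rating; the function name, `k` factor, and starting ratings here are illustrative assumptions, not the exact parameters used for this table):

```python
def elo_update(r_a: float, r_b: float, score_a: float, k: float = 32.0) -> float:
    """Return player A's new Elo rating after one match against player B.

    score_a is 1.0 for a win by A, 0.5 for a tie, 0.0 for a loss.
    k (an assumed value here) controls how strongly one match moves the rating.
    """
    # Expected score for A under the standard logistic Elo model.
    expected_a = 1.0 / (1.0 + 10.0 ** ((r_b - r_a) / 400.0))
    return r_a + k * (score_a - expected_a)

# Two evenly matched models (equal ratings): a win moves A up by k/2.
print(elo_update(1000.0, 1000.0, 1.0))  # 1016.0

# A higher-rated model losing to a lower-rated one loses more points
# than it would gain from a win, which is how upsets reshuffle rankings.
print(elo_update(1370.0, 1106.0, 0.0))
```

Ratings in the table are then just the result of applying such updates over all annotated pairwise matches per category.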