| Category | Model | Elo | # Matches | Win vs. Reference (w/ # ratings) |
|---|---|---|---|---|
| Single Image | gpt4v_response | 1349 | 677 | 65.44% (n=136) |
| Single Image | human_verified_reference | 1338 | 6480 | --- |
| Single Image | llava-a1-predictions | 1187 | 812 | 30.15% (n=136) |
| Single Image | llava13b_output | 1091 | 5574 | 18.53% (n=475) |
| Single Image | LlamaAdapter-v2 prediction | 1066 | 5573 | 14.14% (n=488) |
| Single Image | lynx(7B)_v2 prediction | 1051 | 796 | 15.15% (n=132) |
| Single Image | mPLUG-Owl prediction | 1025 | 5561 | 15.83% (n=480) |
| Single Image | lynx_test_2 | 1002 | 731 | 11.02% (n=127) |
| Single Image | idefics9b_prediction | 997 | 940 | 9.72% (n=144) |
| Single Image | Lynx(8B) predictions | 990 | 929 | 11.43% (n=140) |
| Single Image | instruct_blip_output | 964 | 5612 | 14.12% (n=503) |
| Single Image | otter | 947 | 5597 | 7.01% (n=499) |
| Single Image | Octopus V2 prediction | 920 | 913 | 8.90% (n=146) |
| Single Image | lynx_test_1 | 913 | 738 | 2.82% (n=142) |
| Single Image | visual_gpt_davinci003_output | 911 | 5585 | 1.57% (n=510) |
| Single Image | MiniGPT-4 prediction | 900 | 5560 | 3.36% (n=506) |
| Single Image | openflamingo | 845 | 5591 | 2.95% (n=509) |
| Single Image | panda_gpt_13b_output | 786 | 5573 | 2.70% (n=519) |
| Single Image | mmgpt_output | 718 | 5604 | 0.19% (n=527) |