yonatanbitton committed on
Commit
2604091
1 Parent(s): c6c05f8

Upload visit_bench_leaderboard.tsv

Files changed (1)
  1. visit_bench_leaderboard.tsv +16 -12
visit_bench_leaderboard.tsv CHANGED
@@ -1,12 +1,16 @@
- Model	RFE	Battles	Win vs. Verified GPT-4
- Human Verified GPT-4 Reference	1363	3274	-
- LLaVA (13B)	1099	3274	5.03%
- mPLUG-Owl (7B)	1053	3284	4.55%
- LlamaAdapter-v2 (7B)	1037	3281	3.8%
- Otter (9B)	998	154	2.50%
- InstructBLIP (13B)	992	3274	2.37%
- VisualGPT (Da Vinci 003)	967	251	1.92%
- MiniGPT-4 (7B)	925	3291	2.09%
- OpenFlamingo (9B)	892	441	0.0%
- Multimodal GPT	854	267	0.0%
- PandaGPT (13B)	820	3275	0.85%
+ Category	Model	Elo	matches	Win vs. Reference (w/ # ratings)
+ Single Image	Human Verified GPT-4 Reference	1370	5442	-
+ Single Image	LLaVA (13B)	1106	5446	17.81% (n=494)
+ Single Image	LlamaAdapter-v2 (7B)	1082	5445	13.75% (n=502)
+ Single Image	mPLUG-Owl (7B)	1081	5452	15.29% (n=497)
+ Single Image	InstructBLIP (13B)	1011	5444	13.73% (n=517)
+ Single Image	Otter (9B)	991	5450	6.84% (n=512)
+ Single Image	VisualGPT (Da Vinci 003)	972	5445	1.52% (n=527)
+ Single Image	MiniGPT-4 (7B)	921	5442	3.26% (n=522)
+ Single Image	OpenFlamingo (9B)	877	5449	2.86% (n=524)
+ Single Image	PandaGPT (13B)	826	5441	2.63% (n=533)
+ Single Image	Multimodal GPT	763	5450	0.18% (n=544)
+ Multiple Images	Human Verified GPT-4 Reference	1192	180	-
+ Multiple Images	mPLUG-Owl	995	180	6.67% (n=60)
+ Multiple Images	Otter	911	180	1.69% (n=59)
+ Multiple Images	OpenFlamingo	902	180	1.67% (n=60)
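
The new last column packs two values into one cell (a win rate and the number of ratings behind it), so a consumer of the file has to split it. A minimal sketch of one way to do that, assuming the file is tab-separated with the five columns shown in the added header row; `parse_win_cell` and `parse_row` are hypothetical helper names and are not part of this repository:

```python
import re

# Hypothetical helpers for illustration only; not part of the repository.
# Matches cells such as "17.81% (n=494)".
_WIN_CELL = re.compile(r"^\s*(\d+(?:\.\d+)?)%\s*\(n=(\d+)\)\s*$")


def parse_win_cell(cell):
    """Return (win_rate, n_ratings), or None when the cell is not a
    percentage with a rating count (e.g. the reference row's "-")."""
    match = _WIN_CELL.match(cell)
    if match is None:
        return None
    return float(match.group(1)) / 100.0, int(match.group(2))


def parse_row(line):
    """Split one tab-separated data row (not the header) into a dict.

    Assumes exactly five columns: Category, Model, Elo, matches, and
    the combined win-rate cell.
    """
    category, model, elo, matches, win = line.rstrip("\n").split("\t")
    return {
        "category": category,
        "model": model,
        "elo": int(elo),
        "matches": int(matches),
        "win": parse_win_cell(win),
    }


row = parse_row("Single Image\tLLaVA (13B)\t1106\t5446\t17.81% (n=494)")
reference = parse_row("Single Image\tHuman Verified GPT-4 Reference\t1370\t5442\t-")
```

Returning `None` for the reference row keeps the "-" placeholder distinct from a genuine 0% win rate.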