Update file/result.csv
Model Type,Model,Language Model,Scene Understanding,Instance Identity,Instance Attributes,Instance Localization,Instance Counting,Spatial Relation,Instance Interaction,Visual Reasoning,Text Recognition,Avg. Img,Action Recognition,Action Prediction,Procedure Understanding,Avg. Video,Avg. All
LLM,[Flan-T5](https://huggingface.co/google/flan-t5-xl),Flan-T5-XL,23,29,32.8,31.8,20.5,31.8,33,18.2,19.4,27.32,23.2,34.9,25.4,28.57,27.65
LLM,[Vicuna](https://huggingface.co/lmsys/vicuna-7b-v1.3),Vicuna-7B,23.4,30.7,29.7,30.9,30.8,28.6,29.8,18.5,13.4,28.16,27.3,34.5,23.8,29.47,28.5
LLM,[LLaMA](https://research.facebook.com/publications/llama-open-and-efficient-foundation-language-models/),LLaMA-7B,26.3,27.4,26.2,28.3,25.1,28.8,19.2,37,9,26.56,33,23.1,26.2,27.27,26.75
ImageLLM,[BLIP-2](https://github.com/salesforce/LAVIS),Flan-T5-XL,59.1,53.9,49.2,42.3,43.2,36.7,55.7,45.6,25.9,49.74,32.6,47.5,24,36.71,46.35
ImageLLM,[InstructBLIP](https://github.com/salesforce/LAVIS),Flan-T5-XL,60.3,58.5,63.4,40.6,58.4,38.7,51.6,45.9,25.9,57.8,33.1,49.1,27.1,38.31,52.73
ImageLLM,[InstructBLIP-Vicuna](https://github.com/salesforce/LAVIS),Vicuna-7B,60.2,58.9,65.6,43.6,57.2,40.3,52.6,47.7,43.5,58.76,34.5,49.6,23.1,38.05,53.37
ImageLLM,[LLaVA](https://github.com/haotian-liu/LLaVA),LLaMA-7B,42.7,34.9,33.5,28.4,41.9,30.8,27.8,46.8,27.7,36.96,29.7,21.4,19.1,23.76,33.52
ImageLLM,[MiniGPT-4](https://github.com/Vision-CAIR/MiniGPT-4),Flan-T5-XL,56.3,49.2,45.8,37.9,45.3,32.6,47.4,57.1,11.8,47.4,38.2,24.5,27.1,29.89,42.84
ImageLLM,[VPGTrans](https://github.com/VPGTrans/VPGTrans),LLaMA-7B,51.9,44.1,39.9,36.1,33.7,36.4,32,53.2,30.6,41.81,39.5,24.3,31.9,31.4,39.1
ImageLLM,[MultiModal-GPT](https://github.com/open-mmlab/Multimodal-GPT),LLaMA-7B,43.6,37.9,31.5,30.8,27.3,30.1,29.9,51.4,18.8,34.54,36.9,25.8,24,29.21,33.15
ImageLLM,[Otter](https://github.com/Luodian/Otter),LLaMA-7B,44.9,38.6,32.2,30.9,26.3,31.8,32,51.4,31.8,35.16,37.9,27.2,24.8,30.35,33.91
ImageLLM,[OpenFlamingo](https://github.com/mlfoundations/open_flamingo),LLaMA-7B,43.9,38.1,31.3,30.1,27.3,30.6,29.9,50.2,20,34.51,37.2,25.4,24.2,29.25,33.14
ImageLLM,[LLaMA-AdapterV2](https://github.com/OpenGVLab/LLaMA-Adapter),LLaMA-7B,45.2,38.5,29.3,33,29.7,35.5,39.2,52,24.7,35.19,38.6,18.5,19.6,25.75,32.73
ImageLLM,[GVT](https://github.com/TencentARC/GVT),Vicuna-7B,41.7,35.5,31.8,29.5,36.2,32,32,51.1,27.1,35.49,33.9,25.4,23,27.77,33.48
ImageLLM,[mPLUG-Owl](https://github.com/X-PLUG/mPLUG-Owl),LLaMA-7B,49.7,45.3,32.5,36.7,27.3,32.7,44.3,54.7,28.8,37.88,26.7,17.9,26.5,23.02,34.01
ImageLLM,[Kosmos-2](https://github.com/microsoft/unilm/tree/master/kosmos-2),Decoder Only 1.3B,63.36,57.07,58.53,43.97,41.40,37.90,55.67,60.73,25.89,54.35,41.32,40.40,27.01,37.53,49.97
VideoLLM,[VideoChat](https://github.com/OpenGVLab/Ask-Anything),Vicuna-7B,47.1,43.8,34.9,40,32.8,34.6,42.3,50.5,17.7,39.02,34.9,36.4,27.3,33.68,37.63
VideoLLM,[Video-ChatGPT](https://github.com/mbzuai-oryx/Video-ChatGPT),LLaMA-7B,37.2,31.4,33.2,28.4,35.5,29.5,23.7,42.3,25.9,33.88,27.6,21.3,21.1,23.46,31.17
VideoLLM,[Valley](https://github.com/RupertLuo/Valley),LLaMA-13B,39.3,32.9,31.6,27.9,24.2,30.1,27.8,43.8,11.8,32.04,31.3,23.2,20.7,25.41,30.32
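The leaderboard above can be queried with Python's standard `csv` module. A minimal sketch (the helper name `top_models` is illustrative, not part of this repo) that ranks rows by a numeric column such as "Avg. All":

```python
import csv
import io

def top_models(csv_text, metric="Avg. All", n=3):
    """Return the top-n (model, score) pairs sorted by `metric`, descending.

    `csv_text` is the raw CSV content; the "Model" column may contain
    markdown links, which are kept as-is here.
    """
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    rows.sort(key=lambda r: float(r[metric]), reverse=True)
    return [(r["Model"], float(r[metric])) for r in rows[:n]]

# Usage, e.g. after reading the file shown in this commit:
#   with open("file/result.csv", newline="") as f:
#       print(top_models(f.read(), n=5))
```

On this data the InstructBLIP variants lead the "Avg. All" column, so they would appear first in the returned list.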