BreakLee committed
Commit 22f1cbb
Parent: 98703e8

Upload 2 files

Files changed (2):
  1. file/result.csv +39 -38
  2. file/result_task.csv +37 -36
file/result.csv CHANGED
@@ -1,50 +1,51 @@
  Model Type,Model,Language Model,Model Size,Evaluation Method,Avg. All,Avg. Img,Avg. Video,Scene Understanding,Instance Identity,Instance Attribute,Instance Location,Instance Counting,Spatial Relation,Instance Interaction,Visual Reasoning,Text Recognition,Action Recognition,Action Prediction,Procedure Understanding
- LLM,[Flan-T5](https://huggingface.co/google/flan-t5-xl),Flan-T5-XL,3B,PPL,27.7,27.3,28.6,23.0,29.0,32.8,31.8,20.5,31.8,33.0,18.2,19.4,23.2,34.9,25.4
  LLM,[Vicuna](https://huggingface.co/lmsys/vicuna-7b-v1.3),Vicuna-7B,7B,PPL,28.5,28.2,29.5,23.4,30.7,29.7,30.9,30.8,28.6,29.8,18.5,13.4,27.3,34.5,23.8
- LLM,[LLaMA](https://research.facebook.com/publications/llama-open-and-efficient-foundation-language-models/),LLaMA-7B,7B,PPL,26.8,26.6,27.3,26.3,27.4,26.2,28.3,25.1,28.8,19.2,37.0,9.0,33.0,23.1,26.2
- ImageLLM,[BLIP-2](https://github.com/salesforce/LAVIS),Flan-T5-XL,3B,PPL,46.4,49.7,36.7,59.1,53.9,49.2,42.3,43.2,36.7,55.7,45.6,25.9,32.6,47.5,24.0
  ImageLLM,[InstructBLIP](https://github.com/salesforce/LAVIS),Flan-T5-XL,3B,PPL,52.7,57.8,38.3,60.3,58.5,63.4,40.6,58.4,38.7,51.6,45.9,25.9,33.1,49.1,27.1
  ImageLLM,[InstructBLIP-Vicuna](https://github.com/salesforce/LAVIS),Vicuna-7B,7B,PPL,53.4,58.8,38.1,60.2,58.9,65.6,43.6,57.2,40.3,52.6,47.7,43.5,34.5,49.6,23.1
- ImageLLM,[LLaVA-1.5](https://github.com/haotian-liu/LLaVA),Vicuna-13B,13B,Generate,61.6,68.2,42.7,74.9,71.3,68.9,63.5,61.3,51.4,73.2,77.0,60.5,48.9,41.1,36.6
- ImageLLM,[LLaVA-v1.5-13B-LoRA](https://llava-vl.github.io),Vicuna-13B-v1.5,13B,PPL,62.4,68.2,40.5,74.9,70.9,70.1,62.5,60.6,52.4,74.2,77.3,26.7,47.5,36.0,35.7
- ImageLLM,[LLaVA-v1.5-LoRA](https://llava-vl.github.io),Vicuna-13B-v1.5,13B,PPL for A/B/C/D,62.8,68.9,39.5,75.2,71.4,72.0,62.7,59.8,51.1,71.1,80.7,39.5,46.1,37.1,32.6
  ImageLLM,[MiniGPT-4](https://github.com/Vision-CAIR/MiniGPT-4),Vicuna-7B,7B,PPL,42.8,47.4,29.9,56.3,49.2,45.8,37.9,45.3,32.6,47.4,57.1,11.8,38.2,24.5,27.1
- ImageLLM,[VPGTrans](https://github.com/VPGTrans/VPGTrans),LLaMA-7B,7B,PPL,39.1,41.8,31.4,51.9,44.1,39.9,36.1,33.7,36.4,32.0,53.2,30.6,39.5,24.3,31.9
- ImageLLM,[MultiModal-GPT](https://github.com/open-mmlab/Multimodal-GPT),LLaMA-7B,7B,PPL,33.2,34.5,29.2,43.6,37.9,31.5,30.8,27.3,30.1,29.9,51.4,18.8,36.9,25.8,24.0
- ImageLLM,[Otter](https://github.com/Luodian/Otter),LLaMA-7B,7B,PPL,33.9,35.2,30.4,44.9,38.6,32.2,30.9,26.3,31.8,32.0,51.4,31.8,37.9,27.2,24.8
  ImageLLM,[Otter](https://github.com/Luodian/Otter),MPT-7B,7B,PPL,39.7,42.9,30.6,51.3,43.5,42.3,34.2,38.4,30.9,40.2,55.3,24.7,36.8,29.2,23.8
- ImageLLM,[OpenFlamingo](https://github.com/mlfoundations/open_flamingo),LLaMA-7B,7B,PPL,33.1,34.5,29.3,43.9,38.1,31.3,30.1,27.3,30.6,29.9,50.2,20.0,37.2,25.4,24.2
- ImageLLM,[OpenFlamingo](https://github.com/mlfoundations/open_flamingo),MPT-7B,7B,PPL,40.9,42.7,35.7,53.2,45.3,40.0,31.2,39.3,32.6,36.1,51.4,25.9,42.9,34.7,26.9
- ImageLLM,[LLaMA-AdapterV2](https://github.com/OpenGVLab/LLaMA-Adapter),LLaMA-7B,7B,PPL,32.7,35.2,25.8,45.2,38.5,29.3,33.0,29.7,35.5,39.2,52.0,24.7,38.6,18.5,19.6
- ImageLLM,[GVT](https://github.com/TencentARC/GVT),Vicuna-7B,7B,PPL,33.5,35.5,27.8,41.7,35.5,31.8,29.5,36.2,32.0,32.0,51.1,27.1,33.9,25.4,23.0
- ImageLLM,[mPLUG-Owl](https://github.com/X-PLUG/mPLUG-Owl),LLaMA-7B,7B,PPL,34.0,37.9,23.0,49.7,45.3,32.5,36.7,27.3,32.7,44.3,54.7,28.8,26.7,17.9,26.5
- ImageLLM,[Kosmos-2](https://github.com/microsoft/unilm/tree/master/kosmos-2),Decoder Only 1.3B,1.3B,PPL,50.0,54.4,37.5,63.4,57.1,58.5,44.0,41.4,37.9,55.7,60.7,25.9,41.3,40.4,27.0
  ImageLLM,[Qwen-VL-Chat](https://huggingface.co/Qwen/Qwen-VL-Chat),Qwen-7B,7B,PPL for A/B/C/D,58.2,65.4,37.8,73.3,67.3,69.6,57.7,52.9,48.2,59.8,74.6,53.5,43.9,39.2,26.7
- ImageLLM,[Qwen-VL](https://huggingface.co/Qwen/Qwen-VL),Qwen-7B,7B,PPL for A/B/C/D,56.3,62.3,39.1,71.2,66.4,67.7,53.5,44.8,43.8,62.9,74.9,51.2,44.7,38.5,32.0
- ImageLLM,[Qwen-VL-plus](https://github.com/QwenLM/Qwen-VL/tree/master?tab=readme-ov-file#qwen-vl-plus),Qwen-LM,-,PPL for A/B/C/D,66.8,72.7,44.7,76.5,77.6,75.3,64.9,66.3,56.8,69.1,78.2,54.7,51.6,39.2,41.0
- ImageLLM,[IDEFICS-9b-instruct](https://huggingface.co/HuggingFaceM4/idefics-9b-instruct),LLaMA-7B,7B,NG,0.0,44.5,0.0,55.8,45.3,42.3,40.2,36.8,34.9,37.1,55.9,38.8,0.0,0.0,0.0
- ImageLLM,[IDEFICS-80b-instruct](https://huggingface.co/HuggingFaceM4/idefics-9b-instruct),LLaMA-65B,65B,NG,0.0,53.2,0.0,64.0,52.6,50.8,48.3,46.1,45.5,62.9,68.0,51.8,0.0,0.0,0.0
- ImageLLM,[InternLM-XComposer-VL](https://github.com/InternLM/InternLM-XComposer),InternLM-7B,7B,PPL,0.0,66.9,0.0,75.0,71.7,67.6,60.8,56.2,55.3,74.4,77.0,48.5,0.0,0.0,0.0
- ImageLLM,[InternLM-XComposer2-VL-7B](https://github.com/InternLM/InternLM-XComposer),InternLM2,7B,Generate,0.0,75.9,0.0,79.3,78.4,77.8,72.7,70.0,63.5,79.4,80.1,68.6,0.0,0.0,0.0
- ImageLLM,[InternVL-Chat-V1.2-Plus](https://github.com/OpenGVLab/InternVL),Nous-Hermes-2-Yi-34B,34B,Generate,70.4,76.4,47.7,80.2,80.0,77.8,71.3,72.3,63.3,77.3,79.8,50.0,49.4,41.8,52.2
- ImageLLM,[SEED-LLaMA](https://github.com/AILab-CVC/SEED),LLaMA2-Chat-13B,13B,PPL,48.9,53.7,35.4,64.1,54.2,54.1,46.5,45.3,38.2,51.6,60.7,44.7,37.8,45.3,20.0
- ImageLLM,[mPLUG-Owl2](https://github.com/X-PLUG/mPLUG-Owl),LLaMA-7B,7B,NG,57.8,64.1,39.8,72.7,67.6,63.6,53.6,58.5,50.8,70.1,76.4,30.2,46.0,38.7,32.9
  ImageLLM,[LLaMA-VID-7B](https://github.com/dvlab-research/LLaMA-VID),LLaMA-7B,7B,Generate,59.9,67.6,37.9,75.4,71.2,68.9,62.9,58.4,50.7,70.1,76.1,54.7,42.8,35.2,35.6
- ImageLLM,[Pink-LLaMA2](https://github.com/SY-Xuan/Pink/stargazers),LLaMA2-7B,7B,NG,0.0,67.0,0.0,75.2,70.1,70.1,63.3,53.8,50.2,69.1,74.3,50.0,0.0,0.0,0.0
- ImageLLM,[InfMLLM-13B](https://github.com/mightyzau/InfMLLM),Vicuna-13B,13B,Generate,62.3,69.6,41.5,75.5,73.0,70.4,66.2,63.3,54.2,72.2,77.9,37.2,49.5,39.0,33.9
- ImageLLM,[ShareGPT4V-7B](https://github.com/InternLM/InternLM-XComposer/tree/main/projects/ShareGPT4V),Vicuna-7B,7B,Generate,0.0,69.7,0.0,75.3,71.4,72.3,63.1,62.0,53.9,70.1,79.8,54.7,0.0,0.0,0.0
- ImageLLM,[ShareGPT4V-13B](https://github.com/InternLM/InternLM-XComposer/tree/main/projects/ShareGPT4V),Vicuna-13B,13B,Generate,0.0,70.8,0.0,75.9,74.1,73.5,66.8,62.4,54.8,75.3,77.3,46.5,0.0,0.0,0.0
- ImageLLM,[Honeybee-13B](https://github.com/kakaobrain/honeybee),Vicuna-13B,13B,Generate,0.0,68.6,0.0,75.4,72.8,69.0,64.5,60.6,55.1,72.2,77.9,41.9,0.0,0.0,0.0
- ImageLLM,[SPHINXv1-1k](https://github.com/Alpha-VLLM/LLaMA2-Accessory/tree/main/SPHINX),LLaMA-2-13B,13B,Generate,63.9,71.6,34.7,75.4,72.2,75.1,64.2,68.2,49.3,66.0,78.6,62.4,41.2,33.9,26.1
  ImageLLM,[SPHINXv2-1k](https://github.com/Alpha-VLLM/LLaMA2-Accessory/tree/main/SPHINX),LLaMA-2-13B,13B,Generate,67.5,74.8,39.8,77.7,77.4,76.8,69.4,71.2,59.4,70.1,78.3,74.1,48.1,37.9,29.9
  ImageLLM,[GPT-4V](https://openai.com/research/gpt-4v-system-card),\-,-,Generate,67.3,69.1,60.5,77.5,73.9,70.6,61.8,56.8,56.9,74.2,78.5,57.6,65.7,51.7,63.4
- VideoLLM,[VideoChat](https://github.com/OpenGVLab/Ask-Anything),Vicuna-7B,7B,PPL,37.6,39.0,33.7,47.1,43.8,34.9,40.0,32.8,34.6,42.3,50.5,17.7,34.9,36.4,27.3
  VideoLLM,[Video-ChatGPT](https://github.com/mbzuai-oryx/Video-ChatGPT),LLaMA-7B,7B,PPL,31.2,33.9,23.5,37.2,31.4,33.2,28.4,35.5,29.5,23.7,42.3,25.9,27.6,21.3,21.1
- VideoLLM,[Valley](https://github.com/RupertLuo/Valley),LLaMA-13B,13B,PPL,30.3,32.0,25.4,39.3,32.9,31.6,27.9,24.2,30.1,27.8,43.8,11.8,31.3,23.2,20.7
- Other,[Unified-IO-2 7B (2.5M)](https://unified-io-2.allenai.org),from scratch,7B,PPL,60.5,65.6,46.0,70.7,69.0,67.4,55.4,62.6,45.5,60.8,67.1,58.1,57.5,43.2,34.0
- Other,[Unified-IO-2 7B](https://unified-io-2.allenai.org),from scratch,7B,PPL,60.4,65.5,46.0,71.3,68.8,67.5,55.5,61.2,45.4,62.9,66.5,59.3,58.0,42.7,34.0
- Other,[Unified-IO-2 3B (3M)](https://unified-io-2.allenai.org),from scratch,3B,PPL,60.2,64.1,45.6,69.0,66.6,66.5,54.3,62.0,42.3,50.5,65.3,44.2,57.5,36.2,39.4
- Other,[Unified-IO-2 3B](https://unified-io-2.allenai.org),from scratch,3B,PPL,58.7,63.8,44.2,68.8,65.8,67.2,52.9,60.4,43.1,55.7,64.0,41.9,57.5,36.0,39.0
- Other,[Unified-IO-2 1B](https://unified-io-2.allenai.org),from scratch,1B,PPL,49.6,55.1,34.0,63.8,57.7,54.6,41.9,53.7,33.3,51.5,58.3,47.7,39.8,34.5,24.6
  ImageLLM,[Weitu-VL-1.0](https://weitu.ai/),Weitu-13B,15B,Generate,69.2,74.2,50.5,78.2,77.2,76.9,67.6,68.7,57.8,74.2,78.2,53.5,55.7,43.4,51.2
- ImageLLM,[LLaVA-7B + detection and grounding trained](https://llava-vl.github.io),Vicuna-7B,7B,Generate,0,67.3,0,74.5,70.9,69,62.3,57.4,51.9,69.1,77,41.9,0,0,0
 
 
  Model Type,Model,Language Model,Model Size,Evaluation Method,Avg. All,Avg. Img,Avg. Video,Scene Understanding,Instance Identity,Instance Attribute,Instance Location,Instance Counting,Spatial Relation,Instance Interaction,Visual Reasoning,Text Recognition,Action Recognition,Action Prediction,Procedure Understanding
+ LLM,[Flan-T5](https://huggingface.co/google/flan-t5-xl),Flan-T5-XL,3B,PPL,27.7,27.3,28.6,23,29,32.8,31.8,20.5,31.8,33,18.2,19.4,23.2,34.9,25.4
  LLM,[Vicuna](https://huggingface.co/lmsys/vicuna-7b-v1.3),Vicuna-7B,7B,PPL,28.5,28.2,29.5,23.4,30.7,29.7,30.9,30.8,28.6,29.8,18.5,13.4,27.3,34.5,23.8
+ LLM,[LLaMA](https://research.facebook.com/publications/llama-open-and-efficient-foundation-language-models/),LLaMA-7B,7B,PPL,26.8,26.6,27.3,26.3,27.4,26.2,28.3,25.1,28.8,19.2,37,9,33,23.1,26.2
+ ImageLLM,[BLIP-2](https://github.com/salesforce/LAVIS),Flan-T5-XL,3B,PPL,46.4,49.7,36.7,59.1,53.9,49.2,42.3,43.2,36.7,55.7,45.6,25.9,32.6,47.5,24
  ImageLLM,[InstructBLIP](https://github.com/salesforce/LAVIS),Flan-T5-XL,3B,PPL,52.7,57.8,38.3,60.3,58.5,63.4,40.6,58.4,38.7,51.6,45.9,25.9,33.1,49.1,27.1
  ImageLLM,[InstructBLIP-Vicuna](https://github.com/salesforce/LAVIS),Vicuna-7B,7B,PPL,53.4,58.8,38.1,60.2,58.9,65.6,43.6,57.2,40.3,52.6,47.7,43.5,34.5,49.6,23.1
+ ImageLLM,[LLaVA-1.5](https://github.com/haotian-liu/LLaVA),Vicuna-13B,13B,Generate,61.6,68.2,42.7,74.9,71.3,68.9,63.5,61.3,51.4,73.2,77,60.5,48.9,41.1,36.6
+ ImageLLM,[LLaVA-v1.5-13B-LoRA](https://llava-vl.github.io),Vicuna-13B-v1.5,13B,PPL,62.4,68.2,40.5,74.9,70.9,70.1,62.5,60.6,52.4,74.2,77.3,26.7,47.5,36,35.7
+ ImageLLM,[LLaVA-v1.5-LoRA](https://llava-vl.github.io),Vicuna-13B-v1.5,13B,PPL for A/B/C/D,62.8,68.9,39.5,75.2,71.4,72,62.7,59.8,51.1,71.1,80.7,39.5,46.1,37.1,32.6
  ImageLLM,[MiniGPT-4](https://github.com/Vision-CAIR/MiniGPT-4),Vicuna-7B,7B,PPL,42.8,47.4,29.9,56.3,49.2,45.8,37.9,45.3,32.6,47.4,57.1,11.8,38.2,24.5,27.1
+ ImageLLM,[VPGTrans](https://github.com/VPGTrans/VPGTrans),LLaMA-7B,7B,PPL,39.1,41.8,31.4,51.9,44.1,39.9,36.1,33.7,36.4,32,53.2,30.6,39.5,24.3,31.9
+ ImageLLM,[MultiModal-GPT](https://github.com/open-mmlab/Multimodal-GPT),LLaMA-7B,7B,PPL,33.2,34.5,29.2,43.6,37.9,31.5,30.8,27.3,30.1,29.9,51.4,18.8,36.9,25.8,24
+ ImageLLM,[Otter](https://github.com/Luodian/Otter),LLaMA-7B,7B,PPL,33.9,35.2,30.4,44.9,38.6,32.2,30.9,26.3,31.8,32,51.4,31.8,37.9,27.2,24.8
  ImageLLM,[Otter](https://github.com/Luodian/Otter),MPT-7B,7B,PPL,39.7,42.9,30.6,51.3,43.5,42.3,34.2,38.4,30.9,40.2,55.3,24.7,36.8,29.2,23.8
+ ImageLLM,[OpenFlamingo](https://github.com/mlfoundations/open_flamingo),LLaMA-7B,7B,PPL,33.1,34.5,29.3,43.9,38.1,31.3,30.1,27.3,30.6,29.9,50.2,20,37.2,25.4,24.2
+ ImageLLM,[OpenFlamingo](https://github.com/mlfoundations/open_flamingo),MPT-7B,7B,PPL,40.9,42.7,35.7,53.2,45.3,40,31.2,39.3,32.6,36.1,51.4,25.9,42.9,34.7,26.9
+ ImageLLM,[LLaMA-AdapterV2](https://github.com/OpenGVLab/LLaMA-Adapter),LLaMA-7B,7B,PPL,32.7,35.2,25.8,45.2,38.5,29.3,33,29.7,35.5,39.2,52,24.7,38.6,18.5,19.6
+ ImageLLM,[GVT](https://github.com/TencentARC/GVT),Vicuna-7B,7B,PPL,33.5,35.5,27.8,41.7,35.5,31.8,29.5,36.2,32,32,51.1,27.1,33.9,25.4,23
+ ImageLLM,[mPLUG-Owl](https://github.com/X-PLUG/mPLUG-Owl),LLaMA-7B,7B,PPL,34,37.9,23,49.7,45.3,32.5,36.7,27.3,32.7,44.3,54.7,28.8,26.7,17.9,26.5
+ ImageLLM,[Kosmos-2](https://github.com/microsoft/unilm/tree/master/kosmos-2),Decoder Only 1.3B,1.3B,PPL,50,54.4,37.5,63.4,57.1,58.5,44,41.4,37.9,55.7,60.7,25.9,41.3,40.4,27
  ImageLLM,[Qwen-VL-Chat](https://huggingface.co/Qwen/Qwen-VL-Chat),Qwen-7B,7B,PPL for A/B/C/D,58.2,65.4,37.8,73.3,67.3,69.6,57.7,52.9,48.2,59.8,74.6,53.5,43.9,39.2,26.7
+ ImageLLM,[Qwen-VL](https://huggingface.co/Qwen/Qwen-VL),Qwen-7B,7B,PPL for A/B/C/D,56.3,62.3,39.1,71.2,66.4,67.7,53.5,44.8,43.8,62.9,74.9,51.2,44.7,38.5,32
+ ImageLLM,[Qwen-VL-plus](https://github.com/QwenLM/Qwen-VL/tree/master?tab=readme-ov-file#qwen-vl-plus),Qwen-LM,-,PPL for A/B/C/D,66.8,72.7,44.7,76.5,77.6,75.3,64.9,66.3,56.8,69.1,78.2,54.7,51.6,39.2,41
+ ImageLLM,[IDEFICS-9b-instruct](https://huggingface.co/HuggingFaceM4/idefics-9b-instruct),LLaMA-7B,7B,NG,0,44.5,0,55.8,45.3,42.3,40.2,36.8,34.9,37.1,55.9,38.8,0,0,0
+ ImageLLM,[IDEFICS-80b-instruct](https://huggingface.co/HuggingFaceM4/idefics-9b-instruct),LLaMA-65B,65B,NG,0,53.2,0,64,52.6,50.8,48.3,46.1,45.5,62.9,68,51.8,0,0,0
+ ImageLLM,[InternLM-XComposer-VL](https://github.com/InternLM/InternLM-XComposer),InternLM-7B,7B,PPL,0,66.9,0,75,71.7,67.6,60.8,56.2,55.3,74.4,77,48.5,0,0,0
+ ImageLLM,[InternLM-XComposer2-VL-7B](https://github.com/InternLM/InternLM-XComposer),InternLM2,7B,Generate,0,75.9,0,79.3,78.4,77.8,72.7,70,63.5,79.4,80.1,68.6,0,0,0
+ ImageLLM,[InternVL-Chat-V1.2-Plus](https://github.com/OpenGVLab/InternVL),Nous-Hermes-2-Yi-34B,34B,Generate,70.4,76.4,47.7,80.2,80,77.8,71.3,72.3,63.3,77.3,79.8,50,49.4,41.8,52.2
+ ImageLLM,[SEED-LLaMA](https://github.com/AILab-CVC/SEED),LLaMA2-Chat-13B,13B,PPL,48.9,53.7,35.4,64.1,54.2,54.1,46.5,45.3,38.2,51.6,60.7,44.7,37.8,45.3,20
+ ImageLLM,[mPLUG-Owl2](https://github.com/X-PLUG/mPLUG-Owl),LLaMA-7B,7B,NG,57.8,64.1,39.8,72.7,67.6,63.6,53.6,58.5,50.8,70.1,76.4,30.2,46,38.7,32.9
  ImageLLM,[LLaMA-VID-7B](https://github.com/dvlab-research/LLaMA-VID),LLaMA-7B,7B,Generate,59.9,67.6,37.9,75.4,71.2,68.9,62.9,58.4,50.7,70.1,76.1,54.7,42.8,35.2,35.6
+ ImageLLM,[Pink-LLaMA2](https://github.com/SY-Xuan/Pink/stargazers),LLaMA2-7B,7B,NG,0,67,0,75.2,70.1,70.1,63.3,53.8,50.2,69.1,74.3,50,0,0,0
+ ImageLLM,[InfMLLM-13B](https://github.com/mightyzau/InfMLLM),Vicuna-13B,13B,Generate,62.3,69.6,41.5,75.5,73,70.4,66.2,63.3,54.2,72.2,77.9,37.2,49.5,39,33.9
+ ImageLLM,[ShareGPT4V-7B](https://github.com/InternLM/InternLM-XComposer/tree/main/projects/ShareGPT4V),Vicuna-7B,7B,Generate,0,69.7,0,75.3,71.4,72.3,63.1,62,53.9,70.1,79.8,54.7,0,0,0
+ ImageLLM,[ShareGPT4V-13B](https://github.com/InternLM/InternLM-XComposer/tree/main/projects/ShareGPT4V),Vicuna-13B,13B,Generate,0,70.8,0,75.9,74.1,73.5,66.8,62.4,54.8,75.3,77.3,46.5,0,0,0
+ ImageLLM,[Honeybee-13B](https://github.com/kakaobrain/honeybee),Vicuna-13B,13B,Generate,0,68.6,0,75.4,72.8,69,64.5,60.6,55.1,72.2,77.9,41.9,0,0,0
+ ImageLLM,[SPHINXv1-1k](https://github.com/Alpha-VLLM/LLaMA2-Accessory/tree/main/SPHINX),LLaMA-2-13B,13B,Generate,63.9,71.6,34.7,75.4,72.2,75.1,64.2,68.2,49.3,66,78.6,62.4,41.2,33.9,26.1
  ImageLLM,[SPHINXv2-1k](https://github.com/Alpha-VLLM/LLaMA2-Accessory/tree/main/SPHINX),LLaMA-2-13B,13B,Generate,67.5,74.8,39.8,77.7,77.4,76.8,69.4,71.2,59.4,70.1,78.3,74.1,48.1,37.9,29.9
  ImageLLM,[GPT-4V](https://openai.com/research/gpt-4v-system-card),\-,-,Generate,67.3,69.1,60.5,77.5,73.9,70.6,61.8,56.8,56.9,74.2,78.5,57.6,65.7,51.7,63.4
+ VideoLLM,[VideoChat](https://github.com/OpenGVLab/Ask-Anything),Vicuna-7B,7B,PPL,37.6,39,33.7,47.1,43.8,34.9,40,32.8,34.6,42.3,50.5,17.7,34.9,36.4,27.3
  VideoLLM,[Video-ChatGPT](https://github.com/mbzuai-oryx/Video-ChatGPT),LLaMA-7B,7B,PPL,31.2,33.9,23.5,37.2,31.4,33.2,28.4,35.5,29.5,23.7,42.3,25.9,27.6,21.3,21.1
+ VideoLLM,[Valley](https://github.com/RupertLuo/Valley),LLaMA-13B,13B,PPL,30.3,32,25.4,39.3,32.9,31.6,27.9,24.2,30.1,27.8,43.8,11.8,31.3,23.2,20.7
+ Other,[Unified-IO-2 7B (2.5M)](https://unified-io-2.allenai.org),from scratch,7B,PPL,60.5,65.6,46,70.7,69,67.4,55.4,62.6,45.5,60.8,67.1,58.1,57.5,43.2,34
+ Other,[Unified-IO-2 7B](https://unified-io-2.allenai.org),from scratch,7B,PPL,60.4,65.5,46,71.3,68.8,67.5,55.5,61.2,45.4,62.9,66.5,59.3,58,42.7,34
+ Other,[Unified-IO-2 3B (3M)](https://unified-io-2.allenai.org),from scratch,3B,PPL,60.2,64.1,45.6,69,66.6,66.5,54.3,62,42.3,50.5,65.3,44.2,57.5,36.2,39.4
+ Other,[Unified-IO-2 3B](https://unified-io-2.allenai.org),from scratch,3B,PPL,58.7,63.8,44.2,68.8,65.8,67.2,52.9,60.4,43.1,55.7,64,41.9,57.5,36,39
+ Other,[Unified-IO-2 1B](https://unified-io-2.allenai.org),from scratch,1B,PPL,49.6,55.1,34,63.8,57.7,54.6,41.9,53.7,33.3,51.5,58.3,47.7,39.8,34.5,24.6
  ImageLLM,[Weitu-VL-1.0](https://weitu.ai/),Weitu-13B,15B,Generate,69.2,74.2,50.5,78.2,77.2,76.9,67.6,68.7,57.8,74.2,78.2,53.5,55.7,43.4,51.2
+ ImageLLM,[LLaVA-7B + detection and grounding trained](https://llava-vl.github.io),Vicuna-7B,7B,Generate,0,67.3,0,74.5,70.9,69,62.3,57.4,51.9,69.1,77,41.9,0,0,0
+ ImageLLM,[MiniCPM-V-2](https://huggingface.co/openbmb/MiniCPM-V-2),MiniCPM-2B,2.8B,Generate,0,67.1,0,75.4,69,68.6,57.2,61.3,45.2,70.1,78.2,42.4,0,0,0
file/result_task.csv CHANGED
@@ -1,50 +1,51 @@
  Model Type,Model,Language Model,Model Size,Evaluation Method,Avg. All,Avg. Img,Avg. Video,Scene Understanding,Instance Identity,Instance Attribute,Instance Location,Instance Counting,Spatial Relation,Instance Interaction,Visual Reasoning,Text Recognition,Action Recognition,Action Prediction,Procedure Understanding
- LLM,[Flan-T5](https://huggingface.co/google/flan-t5-xl),Flan-T5-XL,3B,PPL,26.9,26.6,27.8,23.0,29.0,32.8,31.8,20.5,31.8,33.0,18.2,19.4,23.2,34.9,25.4
  LLM,[Vicuna](https://huggingface.co/lmsys/vicuna-7b-v1.3),Vicuna-7B,7B,PPL,26.8,26.2,28.5,23.4,30.7,29.7,30.9,30.8,28.6,29.8,18.5,13.4,27.3,34.5,23.8
- LLM,[LLaMA](https://research.facebook.com/publications/llama-open-and-efficient-foundation-language-models/),LLaMA-7B,7B,PPL,25.8,25.3,27.4,26.3,27.4,26.2,28.3,25.1,28.8,19.2,37.0,9.0,33.0,23.1,26.2
- ImageLLM,[BLIP-2](https://github.com/salesforce/LAVIS),Flan-T5-XL,3B,PPL,43.0,45.7,34.7,59.1,53.9,49.2,42.3,43.2,36.7,55.7,45.6,25.9,32.6,47.5,24.0
  ImageLLM,[InstructBLIP](https://github.com/salesforce/LAVIS),Flan-T5-XL,3B,PPL,46.1,49.3,36.4,60.3,58.5,63.4,40.6,58.4,38.7,51.6,45.9,25.9,33.1,49.1,27.1
  ImageLLM,[InstructBLIP-Vicuna](https://github.com/salesforce/LAVIS),Vicuna-7B,7B,PPL,48.1,52.2,35.7,60.2,58.9,65.6,43.6,57.2,40.3,52.6,47.7,43.5,34.5,49.6,23.1
- ImageLLM,[LLaVA-1.5](https://github.com/haotian-liu/LLaVA),Vicuna-13B,13B,Generate,60.7,66.9,42.2,74.9,71.3,68.9,63.5,61.3,51.4,73.2,77.0,60.5,48.9,41.1,36.6
- ImageLLM,[LLaVA-v1.5-13B-LoRA](https://llava-vl.github.io),Vicuna-13B-v1.5,13B,PPL,57.4,63.3,39.7,74.9,70.9,70.1,62.5,60.6,52.4,74.2,77.3,26.7,47.5,36.0,35.7
- ImageLLM,[LLaVA-v1.5-LoRA](https://llava-vl.github.io),Vicuna-13B-v1.5,13B,PPL for A/B/C/D,58.3,64.8,38.6,75.2,71.4,72.0,62.7,59.8,51.1,71.1,80.7,39.5,46.1,37.1,32.6
  ImageLLM,[MiniGPT-4](https://github.com/Vision-CAIR/MiniGPT-4),Vicuna-7B,7B,PPL,39.4,42.6,29.9,56.3,49.2,45.8,37.9,45.3,32.6,47.4,57.1,11.8,38.2,24.5,27.1
- ImageLLM,[VPGTrans](https://github.com/VPGTrans/VPGTrans),LLaMA-7B,7B,PPL,37.8,39.8,31.9,51.9,44.1,39.9,36.1,33.7,36.4,32.0,53.2,30.6,39.5,24.3,31.9
- ImageLLM,[MultiModal-GPT](https://github.com/open-mmlab/Multimodal-GPT),LLaMA-7B,7B,PPL,32.3,33.5,28.9,43.6,37.9,31.5,30.8,27.3,30.1,29.9,51.4,18.8,36.9,25.8,24.0
- ImageLLM,[Otter](https://github.com/Luodian/Otter),LLaMA-7B,7B,PPL,34.2,35.5,30.0,44.9,38.6,32.2,30.9,26.3,31.8,32.0,51.4,31.8,37.9,27.2,24.8
  ImageLLM,[Otter](https://github.com/Luodian/Otter),MPT-7B,7B,PPL,37.6,40.1,29.9,51.3,43.5,42.3,34.2,38.4,30.9,40.2,55.3,24.7,36.8,29.2,23.8
- ImageLLM,[OpenFlamingo](https://github.com/mlfoundations/open_flamingo),LLaMA-7B,7B,PPL,32.4,33.5,28.9,43.9,38.1,31.3,30.1,27.3,30.6,29.9,50.2,20.0,37.2,25.4,24.2
- ImageLLM,[OpenFlamingo](https://github.com/mlfoundations/open_flamingo),MPT-7B,7B,PPL,38.3,39.4,34.8,53.2,45.3,40.0,31.2,39.3,32.6,36.1,51.4,25.9,42.9,34.7,26.9
- ImageLLM,[LLaMA-AdapterV2](https://github.com/OpenGVLab/LLaMA-Adapter),LLaMA-7B,7B,PPL,33.7,36.3,25.6,45.2,38.5,29.3,33.0,29.7,35.5,39.2,52.0,24.7,38.6,18.5,19.6
- ImageLLM,[GVT](https://github.com/TencentARC/GVT),Vicuna-7B,7B,PPL,33.3,35.2,27.4,41.7,35.5,31.8,29.5,36.2,32.0,32.0,51.1,27.1,33.9,25.4,23.0
  ImageLLM,[mPLUG-Owl](https://github.com/X-PLUG/mPLUG-Owl),LLaMA-7B,7B,PPL,35.3,39.1,23.7,49.7,45.3,32.5,36.7,27.3,32.7,44.3,54.7,28.8,26.7,17.9,26.5
- ImageLLM,[Kosmos-2](https://github.com/microsoft/unilm/tree/master/kosmos-2),Decoder Only 1.3B,1.3B,PPL,46.1,49.4,36.2,63.4,57.1,58.5,44.0,41.4,37.9,55.7,60.7,25.9,41.3,40.4,27.0
  ImageLLM,[Qwen-VL-Chat](https://huggingface.co/Qwen/Qwen-VL-Chat),Qwen-7B,7B,PPL for A/B/C/D,55.6,61.9,36.6,73.3,67.3,69.6,57.7,52.9,48.2,59.8,74.6,53.5,43.9,39.2,26.7
- ImageLLM,[Qwen-VL](https://huggingface.co/Qwen/Qwen-VL),Qwen-7B,7B,PPL for A/B/C/D,54.3,59.6,38.4,71.2,66.4,67.7,53.5,44.8,43.8,62.9,74.9,51.2,44.7,38.5,32.0
- ImageLLM,[Qwen-VL-plus](https://github.com/QwenLM/Qwen-VL/tree/master?tab=readme-ov-file#qwen-vl-plus),Qwen-LM,-,PPL for A/B/C/D,62.6,68.8,43.9,76.5,77.6,75.3,64.9,66.3,56.8,69.1,78.2,54.7,51.6,39.2,41.0
- ImageLLM,[IDEFICS-9b-instruct](https://huggingface.co/HuggingFaceM4/idefics-9b-instruct),LLaMA-7B,7B,NG,32.3,43.0,0.0,55.8,45.3,42.3,40.2,36.8,34.9,37.1,55.9,38.8,0.0,0.0,0.0
- ImageLLM,[IDEFICS-80b-instruct](https://huggingface.co/HuggingFaceM4/idefics-9b-instruct),LLaMA-65B,65B,NG,40.8,54.4,0.0,64.0,52.6,50.8,48.3,46.1,45.5,62.9,68.0,51.8,0.0,0.0,0.0
- ImageLLM,[InternLM-XComposer-VL](https://github.com/InternLM/InternLM-XComposer),InternLM-7B,7B,PPL,48.9,65.2,0.0,75.0,71.7,67.6,60.8,56.2,55.3,74.4,77.0,48.5,0.0,0.0,0.0
- ImageLLM,[InternLM-XComposer2-VL-7B](https://github.com/InternLM/InternLM-XComposer),InternLM2,7B,Generate,0.0,74.4,0.0,79.3,78.4,77.8,72.7,70.0,63.5,79.4,80.1,68.6,0.0,0.0,0.0
- ImageLLM,[InternVL-Chat-V1.2-Plus](https://github.com/OpenGVLab/InternVL),Nous-Hermes-2-Yi-34B,34B,Generate,66.3,72.4,47.8,80.2,80.0,77.8,71.3,72.3,63.3,77.3,79.8,50.0,49.4,41.8,52.2
- ImageLLM,[SEED-LLaMA](https://github.com/AILab-CVC/SEED),LLaMA2-Chat-13B,13B,PPL,46.9,51.0,34.4,64.1,54.2,54.1,46.5,45.3,38.2,51.6,60.7,44.7,37.8,45.3,20.0
- ImageLLM,[mPLUG-Owl2](https://github.com/X-PLUG/mPLUG-Owl),LLaMA-7B,7B,NG,55.1,60.4,39.2,72.7,67.6,63.6,53.6,58.5,50.8,70.1,76.4,30.2,46.0,38.7,32.9
  ImageLLM,[LLaMA-VID-7B](https://github.com/dvlab-research/LLaMA-VID),LLaMA-7B,7B,Generate,58.5,65.4,37.9,75.4,71.2,68.9,62.9,58.4,50.7,70.1,76.1,54.7,42.8,35.2,35.6
- ImageLLM,[Pink-LLaMA2](https://github.com/SY-Xuan/Pink/stargazers),LLaMA2-7B,7B,NG,48.0,64.0,0.0,75.2,70.1,70.1,63.3,53.8,50.2,69.1,74.3,50.0,0.0,0.0,0.0
- ImageLLM,[InfMLLM-13B](https://github.com/mightyzau/InfMLLM),Vicuna-13B,13B,Generate,59.4,65.5,40.8,75.5,73.0,70.4,66.2,63.3,54.2,72.2,77.9,37.2,49.5,39.0,33.9
- ImageLLM,[ShareGPT4V-7B](https://github.com/InternLM/InternLM-XComposer/tree/main/projects/ShareGPT4V),Vicuna-7B,7B,Generate,50.2,67.0,0.0,75.3,71.4,72.3,63.1,62.0,53.9,70.1,79.8,54.7,0.0,0.0,0.0
- ImageLLM,[ShareGPT4V-13B](https://github.com/InternLM/InternLM-XComposer/tree/main/projects/ShareGPT4V),Vicuna-13B,13B,Generate,50.6,67.4,0.0,75.9,74.1,73.5,66.8,62.4,54.8,75.3,77.3,46.5,0.0,0.0,0.0
- ImageLLM,[Honeybee-13B](https://github.com/kakaobrain/honeybee),Vicuna-13B,13B,Generate,49.1,65.5,0.0,75.4,72.8,69.0,64.5,60.6,55.1,72.2,77.9,41.9,0.0,0.0,0.0
- ImageLLM,[SPHINXv1-1k](https://github.com/Alpha-VLLM/LLaMA2-Accessory/tree/main/SPHINX),LLaMA-2-13B,13B,Generate,59.4,67.9,33.7,75.4,72.2,75.1,64.2,68.2,49.3,66.0,78.6,62.4,41.2,33.9,26.1
  ImageLLM,[SPHINXv2-1k](https://github.com/Alpha-VLLM/LLaMA2-Accessory/tree/main/SPHINX),LLaMA-2-13B,13B,Generate,64.2,72.7,38.6,77.7,77.4,76.8,69.4,71.2,59.4,70.1,78.3,74.1,48.1,37.9,29.9
  ImageLLM,[GPT-4V](https://openai.com/research/gpt-4v-system-card),\-,-,Generate,65.7,67.5,60.3,77.5,73.9,70.6,61.8,56.8,56.9,74.2,78.5,57.6,65.7,51.7,63.4
- VideoLLM,[VideoChat](https://github.com/OpenGVLab/Ask-Anything),Vicuna-7B,7B,PPL,36.9,38.2,32.9,47.1,43.8,34.9,40.0,32.8,34.6,42.3,50.5,17.7,34.9,36.4,27.3
  VideoLLM,[Video-ChatGPT](https://github.com/mbzuai-oryx/Video-ChatGPT),LLaMA-7B,7B,PPL,29.8,31.9,23.3,37.2,31.4,33.2,28.4,35.5,29.5,23.7,42.3,25.9,27.6,21.3,21.1
  VideoLLM,[Valley](https://github.com/RupertLuo/Valley),LLaMA-13B,13B,PPL,28.7,29.9,25.1,39.3,32.9,31.6,27.9,24.2,30.1,27.8,43.8,11.8,31.3,23.2,20.7
- Other,[Unified-IO-2 7B (2.5M)](https://unified-io-2.allenai.org),from scratch,7B,PPL,57.6,61.8,44.9,70.7,69.0,67.4,55.4,62.6,45.5,60.8,67.1,58.1,57.5,43.2,34.0
- Other,[Unified-IO-2 7B](https://unified-io-2.allenai.org),from scratch,7B,PPL,57.8,62.0,44.9,71.3,68.8,67.5,55.5,61.2,45.4,62.9,66.5,59.3,58.0,42.7,34.0
- Other,[Unified-IO-2 3B (3M)](https://unified-io-2.allenai.org),from scratch,3B,PPL,54.5,57.9,44.4,69.0,66.6,66.5,54.3,62.0,42.3,50.5,65.3,44.2,57.5,36.2,39.4
- Other,[Unified-IO-2 3B](https://unified-io-2.allenai.org),from scratch,3B,PPL,54.4,57.8,44.2,68.8,65.8,67.2,52.9,60.4,43.1,55.7,64.0,41.9,57.5,36.0,39.0
- Other,[Unified-IO-2 1B](https://unified-io-2.allenai.org),from scratch,1B,PPL,46.8,51.4,33.0,63.8,57.7,54.6,41.9,53.7,33.3,51.5,58.3,47.7,39.8,34.5,24.6
  ImageLLM,[Weitu-VL-1.0](https://weitu.ai/),Weitu-13B,15B,Generate,65.2,70.3,50.1,78.2,77.2,76.9,67.6,68.7,57.8,74.2,78.2,53.5,55.7,43.4,51.2
- ImageLLM,[LLaVA-7B + detection and grounding trained](https://llava-vl.github.io),Vicuna-7B,7B,Generate,0,63.8,0,74.5,70.9,69,62.3,57.4,51.9,69.1,77,41.9,0,0,0
 
 
  Model Type,Model,Language Model,Model Size,Evaluation Method,Avg. All,Avg. Img,Avg. Video,Scene Understanding,Instance Identity,Instance Attribute,Instance Location,Instance Counting,Spatial Relation,Instance Interaction,Visual Reasoning,Text Recognition,Action Recognition,Action Prediction,Procedure Understanding
+ LLM,[Flan-T5](https://huggingface.co/google/flan-t5-xl),Flan-T5-XL,3B,PPL,26.9,26.6,27.8,23,29,32.8,31.8,20.5,31.8,33,18.2,19.4,23.2,34.9,25.4
  LLM,[Vicuna](https://huggingface.co/lmsys/vicuna-7b-v1.3),Vicuna-7B,7B,PPL,26.8,26.2,28.5,23.4,30.7,29.7,30.9,30.8,28.6,29.8,18.5,13.4,27.3,34.5,23.8
+ LLM,[LLaMA](https://research.facebook.com/publications/llama-open-and-efficient-foundation-language-models/),LLaMA-7B,7B,PPL,25.8,25.3,27.4,26.3,27.4,26.2,28.3,25.1,28.8,19.2,37,9,33,23.1,26.2
+ ImageLLM,[BLIP-2](https://github.com/salesforce/LAVIS),Flan-T5-XL,3B,PPL,43,45.7,34.7,59.1,53.9,49.2,42.3,43.2,36.7,55.7,45.6,25.9,32.6,47.5,24
  ImageLLM,[InstructBLIP](https://github.com/salesforce/LAVIS),Flan-T5-XL,3B,PPL,46.1,49.3,36.4,60.3,58.5,63.4,40.6,58.4,38.7,51.6,45.9,25.9,33.1,49.1,27.1
  ImageLLM,[InstructBLIP-Vicuna](https://github.com/salesforce/LAVIS),Vicuna-7B,7B,PPL,48.1,52.2,35.7,60.2,58.9,65.6,43.6,57.2,40.3,52.6,47.7,43.5,34.5,49.6,23.1
+ ImageLLM,[LLaVA-1.5](https://github.com/haotian-liu/LLaVA),Vicuna-13B,13B,Generate,60.7,66.9,42.2,74.9,71.3,68.9,63.5,61.3,51.4,73.2,77,60.5,48.9,41.1,36.6
+ ImageLLM,[LLaVA-v1.5-13B-LoRA](https://llava-vl.github.io),Vicuna-13B-v1.5,13B,PPL,57.4,63.3,39.7,74.9,70.9,70.1,62.5,60.6,52.4,74.2,77.3,26.7,47.5,36,35.7
+ ImageLLM,[LLaVA-v1.5-LoRA](https://llava-vl.github.io),Vicuna-13B-v1.5,13B,PPL for A/B/C/D,58.3,64.8,38.6,75.2,71.4,72,62.7,59.8,51.1,71.1,80.7,39.5,46.1,37.1,32.6
  ImageLLM,[MiniGPT-4](https://github.com/Vision-CAIR/MiniGPT-4),Vicuna-7B,7B,PPL,39.4,42.6,29.9,56.3,49.2,45.8,37.9,45.3,32.6,47.4,57.1,11.8,38.2,24.5,27.1
+ ImageLLM,[VPGTrans](https://github.com/VPGTrans/VPGTrans),LLaMA-7B,7B,PPL,37.8,39.8,31.9,51.9,44.1,39.9,36.1,33.7,36.4,32,53.2,30.6,39.5,24.3,31.9
+ ImageLLM,[MultiModal-GPT](https://github.com/open-mmlab/Multimodal-GPT),LLaMA-7B,7B,PPL,32.3,33.5,28.9,43.6,37.9,31.5,30.8,27.3,30.1,29.9,51.4,18.8,36.9,25.8,24
+ ImageLLM,[Otter](https://github.com/Luodian/Otter),LLaMA-7B,7B,PPL,34.2,35.5,30,44.9,38.6,32.2,30.9,26.3,31.8,32,51.4,31.8,37.9,27.2,24.8
  ImageLLM,[Otter](https://github.com/Luodian/Otter),MPT-7B,7B,PPL,37.6,40.1,29.9,51.3,43.5,42.3,34.2,38.4,30.9,40.2,55.3,24.7,36.8,29.2,23.8
+ ImageLLM,[OpenFlamingo](https://github.com/mlfoundations/open_flamingo),LLaMA-7B,7B,PPL,32.4,33.5,28.9,43.9,38.1,31.3,30.1,27.3,30.6,29.9,50.2,20,37.2,25.4,24.2
+ ImageLLM,[OpenFlamingo](https://github.com/mlfoundations/open_flamingo),MPT-7B,7B,PPL,38.3,39.4,34.8,53.2,45.3,40,31.2,39.3,32.6,36.1,51.4,25.9,42.9,34.7,26.9
+ ImageLLM,[LLaMA-AdapterV2](https://github.com/OpenGVLab/LLaMA-Adapter),LLaMA-7B,7B,PPL,33.7,36.3,25.6,45.2,38.5,29.3,33,29.7,35.5,39.2,52,24.7,38.6,18.5,19.6
+ ImageLLM,[GVT](https://github.com/TencentARC/GVT),Vicuna-7B,7B,PPL,33.3,35.2,27.4,41.7,35.5,31.8,29.5,36.2,32,32,51.1,27.1,33.9,25.4,23
  ImageLLM,[mPLUG-Owl](https://github.com/X-PLUG/mPLUG-Owl),LLaMA-7B,7B,PPL,35.3,39.1,23.7,49.7,45.3,32.5,36.7,27.3,32.7,44.3,54.7,28.8,26.7,17.9,26.5
+ ImageLLM,[Kosmos-2](https://github.com/microsoft/unilm/tree/master/kosmos-2),Decoder Only 1.3B,1.3B,PPL,46.1,49.4,36.2,63.4,57.1,58.5,44,41.4,37.9,55.7,60.7,25.9,41.3,40.4,27
  ImageLLM,[Qwen-VL-Chat](https://huggingface.co/Qwen/Qwen-VL-Chat),Qwen-7B,7B,PPL for A/B/C/D,55.6,61.9,36.6,73.3,67.3,69.6,57.7,52.9,48.2,59.8,74.6,53.5,43.9,39.2,26.7
+ ImageLLM,[Qwen-VL](https://huggingface.co/Qwen/Qwen-VL),Qwen-7B,7B,PPL for A/B/C/D,54.3,59.6,38.4,71.2,66.4,67.7,53.5,44.8,43.8,62.9,74.9,51.2,44.7,38.5,32
+ ImageLLM,[Qwen-VL-plus](https://github.com/QwenLM/Qwen-VL/tree/master?tab=readme-ov-file#qwen-vl-plus),Qwen-LM,-,PPL for A/B/C/D,62.6,68.8,43.9,76.5,77.6,75.3,64.9,66.3,56.8,69.1,78.2,54.7,51.6,39.2,41
+ ImageLLM,[IDEFICS-9b-instruct](https://huggingface.co/HuggingFaceM4/idefics-9b-instruct),LLaMA-7B,7B,NG,32.3,43,0,55.8,45.3,42.3,40.2,36.8,34.9,37.1,55.9,38.8,0,0,0
+ ImageLLM,[IDEFICS-80b-instruct](https://huggingface.co/HuggingFaceM4/idefics-9b-instruct),LLaMA-65B,65B,NG,40.8,54.4,0,64,52.6,50.8,48.3,46.1,45.5,62.9,68,51.8,0,0,0
+ ImageLLM,[InternLM-XComposer-VL](https://github.com/InternLM/InternLM-XComposer),InternLM-7B,7B,PPL,48.9,65.2,0,75,71.7,67.6,60.8,56.2,55.3,74.4,77,48.5,0,0,0
+ ImageLLM,[InternLM-XComposer2-VL-7B](https://github.com/InternLM/InternLM-XComposer),InternLM2,7B,Generate,0,74.4,0,79.3,78.4,77.8,72.7,70,63.5,79.4,80.1,68.6,0,0,0
+ ImageLLM,[InternVL-Chat-V1.2-Plus](https://github.com/OpenGVLab/InternVL),Nous-Hermes-2-Yi-34B,34B,Generate,66.3,72.4,47.8,80.2,80,77.8,71.3,72.3,63.3,77.3,79.8,50,49.4,41.8,52.2
+ ImageLLM,[SEED-LLaMA](https://github.com/AILab-CVC/SEED),LLaMA2-Chat-13B,13B,PPL,46.9,51,34.4,64.1,54.2,54.1,46.5,45.3,38.2,51.6,60.7,44.7,37.8,45.3,20
+ ImageLLM,[mPLUG-Owl2](https://github.com/X-PLUG/mPLUG-Owl),LLaMA-7B,7B,NG,55.1,60.4,39.2,72.7,67.6,63.6,53.6,58.5,50.8,70.1,76.4,30.2,46,38.7,32.9
  ImageLLM,[LLaMA-VID-7B](https://github.com/dvlab-research/LLaMA-VID),LLaMA-7B,7B,Generate,58.5,65.4,37.9,75.4,71.2,68.9,62.9,58.4,50.7,70.1,76.1,54.7,42.8,35.2,35.6
+ ImageLLM,[Pink-LLaMA2](https://github.com/SY-Xuan/Pink/stargazers),LLaMA2-7B,7B,NG,48,64,0,75.2,70.1,70.1,63.3,53.8,50.2,69.1,74.3,50,0,0,0
+ ImageLLM,[InfMLLM-13B](https://github.com/mightyzau/InfMLLM),Vicuna-13B,13B,Generate,59.4,65.5,40.8,75.5,73,70.4,66.2,63.3,54.2,72.2,77.9,37.2,49.5,39,33.9
+ ImageLLM,[ShareGPT4V-7B](https://github.com/InternLM/InternLM-XComposer/tree/main/projects/ShareGPT4V),Vicuna-7B,7B,Generate,50.2,67,0,75.3,71.4,72.3,63.1,62,53.9,70.1,79.8,54.7,0,0,0
+ ImageLLM,[ShareGPT4V-13B](https://github.com/InternLM/InternLM-XComposer/tree/main/projects/ShareGPT4V),Vicuna-13B,13B,Generate,50.6,67.4,0,75.9,74.1,73.5,66.8,62.4,54.8,75.3,77.3,46.5,0,0,0
+ ImageLLM,[Honeybee-13B](https://github.com/kakaobrain/honeybee),Vicuna-13B,13B,Generate,49.1,65.5,0,75.4,72.8,69,64.5,60.6,55.1,72.2,77.9,41.9,0,0,0
+ ImageLLM,[SPHINXv1-1k](https://github.com/Alpha-VLLM/LLaMA2-Accessory/tree/main/SPHINX),LLaMA-2-13B,13B,Generate,59.4,67.9,33.7,75.4,72.2,75.1,64.2,68.2,49.3,66,78.6,62.4,41.2,33.9,26.1
  ImageLLM,[SPHINXv2-1k](https://github.com/Alpha-VLLM/LLaMA2-Accessory/tree/main/SPHINX),LLaMA-2-13B,13B,Generate,64.2,72.7,38.6,77.7,77.4,76.8,69.4,71.2,59.4,70.1,78.3,74.1,48.1,37.9,29.9
  ImageLLM,[GPT-4V](https://openai.com/research/gpt-4v-system-card),\-,-,Generate,65.7,67.5,60.3,77.5,73.9,70.6,61.8,56.8,56.9,74.2,78.5,57.6,65.7,51.7,63.4
+ VideoLLM,[VideoChat](https://github.com/OpenGVLab/Ask-Anything),Vicuna-7B,7B,PPL,36.9,38.2,32.9,47.1,43.8,34.9,40,32.8,34.6,42.3,50.5,17.7,34.9,36.4,27.3
  VideoLLM,[Video-ChatGPT](https://github.com/mbzuai-oryx/Video-ChatGPT),LLaMA-7B,7B,PPL,29.8,31.9,23.3,37.2,31.4,33.2,28.4,35.5,29.5,23.7,42.3,25.9,27.6,21.3,21.1
  VideoLLM,[Valley](https://github.com/RupertLuo/Valley),LLaMA-13B,13B,PPL,28.7,29.9,25.1,39.3,32.9,31.6,27.9,24.2,30.1,27.8,43.8,11.8,31.3,23.2,20.7
+ Other,[Unified-IO-2 7B (2.5M)](https://unified-io-2.allenai.org),from scratch,7B,PPL,57.6,61.8,44.9,70.7,69,67.4,55.4,62.6,45.5,60.8,67.1,58.1,57.5,43.2,34
+ Other,[Unified-IO-2 7B](https://unified-io-2.allenai.org),from scratch,7B,PPL,57.8,62,44.9,71.3,68.8,67.5,55.5,61.2,45.4,62.9,66.5,59.3,58,42.7,34
+ Other,[Unified-IO-2 3B (3M)](https://unified-io-2.allenai.org),from scratch,3B,PPL,54.5,57.9,44.4,69,66.6,66.5,54.3,62,42.3,50.5,65.3,44.2,57.5,36.2,39.4
+ Other,[Unified-IO-2 3B](https://unified-io-2.allenai.org),from scratch,3B,PPL,54.4,57.8,44.2,68.8,65.8,67.2,52.9,60.4,43.1,55.7,64,41.9,57.5,36,39
+ Other,[Unified-IO-2 1B](https://unified-io-2.allenai.org),from scratch,1B,PPL,46.8,51.4,33,63.8,57.7,54.6,41.9,53.7,33.3,51.5,58.3,47.7,39.8,34.5,24.6
  ImageLLM,[Weitu-VL-1.0](https://weitu.ai/),Weitu-13B,15B,Generate,65.2,70.3,50.1,78.2,77.2,76.9,67.6,68.7,57.8,74.2,78.2,53.5,55.7,43.4,51.2
+ ImageLLM,[LLaVA-7B + detection and grounding trained](https://llava-vl.github.io),Vicuna-7B,7B,Generate,0,63.8,0,74.5,70.9,69,62.3,57.4,51.9,69.1,77,41.9,0,0,0
+ ImageLLM,[MiniCPM-V-2](https://huggingface.co/openbmb/MiniCPM-V-2),MiniCPM-2B,2.8B,Generate,0,63,0,75.4,69,68.6,57.2,61.3,45.2,70.1,78.2,42.4,0,0,0
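
Both CSVs share the same schema: five metadata columns followed by per-dimension accuracies, with model names stored as markdown links. A minimal sketch of loading and ranking them with pandas (the local paths, the link-splitting regex, and the helper name `load_leaderboard` are illustrative assumptions, not part of this repo):

```python
import pandas as pd

def load_leaderboard(path: str) -> pd.DataFrame:
    """Load one leaderboard CSV and split the markdown [name](url) links in "Model"."""
    df = pd.read_csv(path)
    links = df["Model"].str.extract(r"\[(?P<name>[^\]]+)\]\((?P<url>[^)]+)\)")
    df["Model Name"] = links["name"].fillna(df["Model"])  # keep plain names as-is
    df["Model URL"] = links["url"]
    return df

if __name__ == "__main__":
    for path in ("file/result.csv", "file/result_task.csv"):  # assumed local checkout paths
        df = load_leaderboard(path)
        # An "Avg. All" of 0 appears to mark image-only evaluations (the video
        # columns are also 0), so exclude those rows when ranking on the
        # combined average.
        ranked = df[df["Avg. All"] > 0].sort_values("Avg. All", ascending=False)
        print(path)
        print(ranked[["Model Name", "Model Size", "Avg. All", "Avg. Img", "Avg. Video"]]
              .head(5).to_string(index=False))
```

Because the numeric columns parse as floats, a value such as `23` in the new revision compares equal to the old `23.0`; the commit only normalizes the text formatting (and appends the MiniCPM-V-2 row), so downstream consumers reading the files with a CSV parser are unaffected.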