Type,Model,Language Model,Avg,Action Antonym,Action Count,Action Localization,Action Prediction,Action Sequence,Character Count,Counterfactual Inference,Egocentric Navigation,Episodic Reasoning,Fine grained Action,Fine grained Pose,Moving Attribute,Moving Count,Moving Direction,Object Existence,Object Interaction,Object Shuffle,Scene Transition,State Change,Unexpected Action
LLM,Random,NOLLM,28.0,33.3,33.3,25.0,25.0,25.0,33.3,30.9,25.0,20.0,25.0,25.0,33.3,25.0,25.0,33.3,25.0,33.3,25.0,33.3,25.0
ImageLLM,mPLUG-Owl-I,LLaMA-7B,29.4,44.5,34.5,24.0,20.0,25.0,37.0,37.0,25.5,21.0,27.0,24.0,31.5,22.0,23.0,36.0,24.0,34.0,34.5,40.0,23.5
ImageLLM,LLaMA-Adapter,LLaMA-7B,31.7,51.0,29.0,21.5,28.0,23.0,31.5,32.0,22.5,28.0,30.0,25.0,41.5,22.5,25.5,53.5,32.5,33.5,30.5,39.5,33.0
ImageLLM,BLIP2,FlanT5-XL,31.4,33.5,25.5,26.0,29.0,24.5,30.0,31.0,26.0,37.0,17.0,27.0,40.0,30.0,25.5,51.5,26.0,31.0,32.5,42.0,42.0
ImageLLM,Otter-I,MPT-7B,33.5,39.5,20.0,25.5,32.0,34.5,27.0,36.5,32.0,29.0,30.5,28.0,28.5,32.5,19.0,48.5,44.0,29.5,55.0,39.0,38.5
ImageLLM,MiniGPT-4,Vicuna-7B,18.8,26.0,32.5,12.0,18.0,16.0,29.5,3.0,19.0,9.9,21.5,26.0,8.0,15.5,11.5,29.5,25.5,13.0,9.5,34.0,16.0
ImageLLM,InstructBLIP,Vicuna-7B,32.5,46.0,42.5,23.0,16.5,20.0,30.0,38.0,25.5,30.5,24.5,25.5,40.5,26.5,22.0,51.0,26.0,37.5,46.5,32.0,46.0
ImageLLM,LLaVA,Vicuna-7B,36.0,63.0,34.0,20.5,39.5,28.0,36.0,42.0,27.0,26.5,30.5,25.0,38.5,20.5,23.0,53.0,41.0,41.5,45.0,47.0,39.0
VideoLLM,Otter-V,LLaMA-7B,26.8,27.5,26.0,23.5,23.0,23.0,22.0,19.5,23.5,19.0,27.0,22.0,18.0,28.5,24.5,53.0,28.0,33.0,27.5,38.5,29.5
VideoLLM,mPLUG-Owl-V,LLaMA-7B,29.7,34.0,31.5,23.0,28.0,22.0,31.0,29.5,26.0,20.5,29.0,24.0,40.0,27.0,27.0,40.5,27.0,31.5,29.0,44.0,29.0
VideoLLM,VideoChatGPT,Vicuna-7B,32.7,62.0,30.5,20.0,26.0,23.5,33.0,35.5,29.5,26.0,22.5,29.0,39.5,25.5,23.0,54.0,28.0,40.0,31.0,48.5,26.5
VideoLLM,VideoLLaMA,Vicuna-7B,34.1,51.0,34.0,22.5,25.5,27.5,40.0,37.0,30.0,21.0,29.0,32.5,32.5,22.5,22.5,48.0,40.5,38.0,43.0,45.5,39.0
VideoLLM,VideoChat,Vicuna-7B,35.5,56.0,35.0,27.0,26.5,33.5,41.0,36.0,23.5,23.5,33.5,26.5,42.5,20.5,25.5,53.0,40.5,30.0,48.5,46.0,40.5
VideoLLM,VideoChat2_text,Vicuna-7B,34.7,49.5,41.5,27.0,27.0,24.5,36.0,40.0,33.0,32.0,27.0,26.5,32.5,27.5,25.5,53.0,28.0,40.0,38.5,46.5,38.0
VideoLLM,VideoChat2,Vicuna-7B,51.1,83.5,39.0,23.0,47.5,66.0,36.5,65.5,35.0,40.5,49.5,49.0,58.5,42.0,23.0,58.0,71.5,42.5,88.5,44.0,60.0
VideoLLM,GPT-4V,GPT-4,43.7,72.0,39.0,40.5,63.5,55.5,52.0,11.0,31.0,59.0,46.5,47.5,22.5,12.0,12.0,18.5,59.0,29.5,83.5,45.0,73.5
ImageLLM,GeminiPro,Gemini,37.7,43.7,3.9,40.0,41.8,35.4,38.7,33.7,36.4,36.4,36.2,26.5,41.5,18.0,16.5,43.5,37.5,39.8,75.4,42.3,67.1