Model Type,Model,Language Model,Scene Understanding,Instance Identity,Instance Attributes,Instance Localization,Instance Counting,Spatial Relation,Instance Interaction,Visual Reasoning,Text Recognition,Avg. Img,Action Recognition,Action Prediction,Procedure Understanding,Avg. Video,Avg. All
LLM,Flan-T5,Flan-T5-XL,23.0,29.0,32.8,31.8,20.5,31.8,33.0,18.2,19.4,27.32,23.2,34.9,25.4,28.57,27.65
LLM,Vicuna,Vicuna-7B,23.4,30.7,29.7,30.9,30.8,28.6,29.8,18.5,13.4,28.16,27.3,34.5,23.8,29.47,28.5
LLM,LLaMA,LLaMA-7B,26.3,27.4,26.2,28.3,25.1,28.8,19.2,37.0,9.0,26.56,33.0,23.1,26.2,27.27,26.75
ImageLLM,BLIP-2,Flan-T5-XL,59.1,53.9,49.2,42.3,43.2,36.7,55.7,45.6,25.9,49.74,32.6,47.5,24.0,36.71,46.35
ImageLLM,InstructBLIP,Flan-T5-XL,60.3,58.5,63.4,40.6,58.4,38.7,51.6,45.9,25.9,57.8,33.1,49.1,27.1,38.31,52.73
ImageLLM,InstructBLIP-Vicuna,Vicuna-7B,60.2,58.9,65.6,43.6,57.2,40.3,52.6,47.7,43.5,58.76,34.5,49.6,23.1,38.05,53.37
ImageLLM,LLaVA,LLaMA-7B,42.7,34.9,33.5,28.4,41.9,30.8,27.8,46.8,27.7,36.96,29.7,21.4,19.1,23.76,33.52
ImageLLM,MiniGPT-4,Flan-T5-XL,56.3,49.2,45.8,37.9,45.3,32.6,47.4,57.1,11.8,47.4,38.2,24.5,27.1,29.89,42.84
ImageLLM,VPGTrans,LLaMA-7B,51.9,44.1,39.9,36.1,33.7,36.4,32.0,53.2,30.6,41.81,39.5,24.3,31.9,31.4,39.1
ImageLLM,MultiModal-GPT,LLaMA-7B,43.6,37.9,31.5,30.8,27.3,30.1,29.9,51.4,18.8,34.54,36.9,25.8,24.0,29.21,33.15
ImageLLM,Otter,LLaMA-7B,44.9,38.6,32.2,30.9,26.3,31.8,32.0,51.4,31.8,35.16,37.9,27.2,24.8,30.35,33.91
ImageLLM,OpenFlamingo,LLaMA-7B,43.9,38.1,31.3,30.1,27.3,30.6,29.9,50.2,20.0,34.51,37.2,25.4,24.2,29.25,33.14
ImageLLM,LLaMA-AdapterV2,LLaMA-7B,45.2,38.5,29.3,33.0,29.7,35.5,39.2,52.0,24.7,35.19,38.6,18.5,19.6,25.75,32.73
ImageLLM,GVT,Vicuna-7B,41.7,35.5,31.8,29.5,36.2,32.0,32.0,51.1,27.1,35.49,33.9,25.4,23.0,27.77,33.48
ImageLLM,mPLUG-Owl,LLaMA-7B,49.7,45.3,32.5,36.7,27.3,32.7,44.3,54.7,28.8,37.88,26.7,17.9,26.5,23.02,34.01
VideoLLM,VideoChat,Vicuna-7B,47.1,43.8,34.9,40.0,32.8,34.6,42.3,50.5,17.7,39.02,34.9,36.4,27.3,33.68,37.63
VideoLLM,Video-ChatGPT,LLaMA-7B,37.2,31.4,33.2,28.4,35.5,29.5,23.7,42.3,25.9,33.88,27.6,21.3,21.1,23.46,31.17
VideoLLM,Valley,LLaMA-13B,39.3,32.9,31.6,27.9,24.2,30.1,27.8,43.8,11.8,32.04,31.3,23.2,20.7,25.41,30.32
LLaMA-7B,test,LLaMA-7B,53.2,45.3,40.0,31.2,39.3,32.6,36.1,51.4,25.6,42.7,42.9,34.7,26.9,35.7,40.9
LLaMA-7B,test2,LLaMA-7B,53.2,45.3,40.0,31.2,39.3,32.6,36.1,51.4,25.6,42.7,42.9,34.7,26.9,35.7,40.9