---
language:
- zh
- en
pipeline_tag: visual-question-answering
datasets:
- Lin-Chen/ShareGPT4V
- liuhaotian/LLaVA-Pretrain
---

# Model

llava-qwen1.5-4b-chat is a lightweight multimodal model based on the [LLaVA architecture](https://llava-vl.github.io/).

- Language Model: [Qwen/Qwen1.5-4B-Chat](https://huggingface.co/Qwen/Qwen1.5-4B-Chat)
- Vision Encoder: [google/siglip-so400m-patch14-384](https://huggingface.co/google/siglip-so400m-patch14-384)
- Total Parameters: 4,388,102,720

## Evaluation

### MMBench

| Model | MMBench Test (EN) | MMBench Dev (EN) | MMBench Test (CN) | MMBench Dev (CN) | CCBench Dev |
| ------------- | ------------- | ------------- | ------------- | ------------- | ------------- |
| LLaVA-v1.5-7B | 67.7 | 69.2 | 61.0 | 59.7 | 28.4 |
| LLaVA-InternLM-7B | 69.0 | 68.5 | 66.7 | 63.8 | 37.3 |
| LLaVA-InternLM2-7B | 73.3 | 74.6 | 71.7 | 72.0 | 42.5 |
| Bunny-3B | 69.2 | 68.6 | - | - | - |
| MiniCPM-V | 64.1 | 67.9 | 62.6 | 65.3 | 41.4 |
| llava-qwen1.5-4b-chat | 69.6 | 69.2 | 68.6 | 68.3 | 41.0 |

## Uses

TBD

## Training Details

TBD
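Since the model pairs a Qwen1.5 chat language model with a LLaVA-style vision encoder, a minimal sketch of a plausible prompt layout may be useful. The sketch below assumes the model follows Qwen1.5's ChatML chat template with a LLaVA-style `<image>` placeholder in the user turn; the exact template the released checkpoint expects is an assumption, so verify it against the model's tokenizer/chat template before use.

```python
# Hedged sketch: build a single-turn multimodal prompt in ChatML format
# (the template Qwen1.5-Chat uses), with a LLaVA-style <image> placeholder.
# The exact placeholder token and template are assumptions, not confirmed
# by this model card.

def build_prompt(question: str, system: str = "You are a helpful assistant.") -> str:
    """Construct a ChatML prompt ending at the assistant turn, ready for generation."""
    return (
        f"<|im_start|>system\n{system}<|im_end|>\n"
        f"<|im_start|>user\n<image>\n{question}<|im_end|>\n"
        f"<|im_start|>assistant\n"
    )

print(build_prompt("What is in this picture?"))
```

The prompt deliberately ends after `<|im_start|>assistant\n` so that the model's generated tokens continue the assistant turn.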