---
language:
- zh
- en
pipeline_tag: visual-question-answering
datasets:
- Lin-Chen/ShareGPT4V
- liuhaotian/LLaVA-Pretrain
---

# Model

llava-qwen1.5-4b-chat is a lightweight multimodal model based on the [LLaVA architecture](https://llava-vl.github.io/).

- Language Model: [Qwen/Qwen1.5-4B-Chat](https://huggingface.co/Qwen/Qwen1.5-4B-Chat)
- Vision Encoder: [google/siglip-so400m-patch14-384](https://huggingface.co/google/siglip-so400m-patch14-384)
- Total Parameters: 4,388,102,720

## Evaluation

### MMBench

| Model | MMBench Test (EN) | MMBench Dev (EN) | MMBench Test (CN) | MMBench Dev (CN) | CCBench Dev |
| ------------- | ------------- | ------------- | ------------- | ------------- | ------------- |
| LLaVA-v1.5-7B | 67.7 | 69.2 | 61.0 | 59.7 | 28.4 |
| LLaVA-InternLM-7B | 69.0 | 68.5 | 66.7 | 63.8 | 37.3 |
| LLaVA-InternLM2-7B | 73.3 | 74.6 | 71.7 | 72.0 | 42.5 |
| Bunny-3B | 69.2 | 68.6 | - | - | - |
| MiniCPM-V | 64.1 | 67.9 | 62.6 | 65.3 | 41.4 |
| llava-qwen1.5-4b-chat | 69.6 | 69.2 | 68.6 | 68.3 | 41.0 |

## Uses

TBD

## Training Details

TBD
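Since the model pairs a Qwen1.5 chat language model with a LLaVA-style vision encoder, a minimal sketch of a plausible prompt layout may be useful. The sketch below assumes the model follows Qwen1.5's ChatML chat template with a LLaVA-style `<image>` placeholder in the user turn; the exact template the released checkpoint expects is an assumption, so verify it against the model's tokenizer/chat template before use.

```python
# Hedged sketch: build a single-turn multimodal prompt in ChatML format
# (the template Qwen1.5-Chat uses), with a LLaVA-style <image> placeholder.
# The exact placeholder token and template are assumptions, not confirmed
# by this model card.

def build_prompt(question: str, system: str = "You are a helpful assistant.") -> str:
    """Construct a ChatML prompt ending at the assistant turn, ready for generation."""
    return (
        f"<|im_start|>system\n{system}<|im_end|>\n"
        f"<|im_start|>user\n<image>\n{question}<|im_end|>\n"
        f"<|im_start|>assistant\n"
    )

print(build_prompt("What is in this picture?"))
```

The prompt deliberately ends after `<|im_start|>assistant\n` so that the model's generated tokens continue the assistant turn.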