Update README.md
README.md CHANGED
@@ -10,7 +10,7 @@ library_name: transformers
 
 # Introduction
 
-The
+The Aquila-VL-2B-llava-qwen model is a vision-language model (VLM) trained with the [LLava-one-vision](https://llava-vl.github.io/blog/2024-08-05-llava-onevision/) framework. The [Qwen2.5-1.5B-instruct](https://huggingface.co/Qwen/Qwen2.5-1.5B-Instruct) model serves as the LLM, while [siglip-so400m-patch14-384](https://huggingface.co/google/siglip-so400m-patch14-384) is used as the vision tower.
 
 The model was trained on our self-built Infinity-MM dataset, which contains approximately 40 million image-text pairs. This dataset is a combination of open-source data collected from the internet and synthetic instruction data generated using open-source VLM models.
 
@@ -20,7 +20,7 @@ We plan to open-source the Infinity-MM dataset, training scripts, and related re
 
 We evaluated the model using the [VLMEvalKit](https://github.com/open-compass/VLMEvalKit) tool. Whenever possible, we prioritized using the GPT-4 API for test sets that support API-based evaluation.
 
-| Test sets | MiniCPM-V-2 | InternVL2-2B | XinYuan-VL-2B | Qwen2-VL-2B-Instruct |
+| Test sets | MiniCPM-V-2 | InternVL2-2B | XinYuan-VL-2B | Qwen2-VL-2B-Instruct | Aquila-VL-2B |
 |:----------------:|:--------------:|:---------------:|:----------------:|:-----------------------:|:-----------------:|
 | MMMU\_DEV\_VAL | 39.56 | 34.89 | 43.56 | 41.67 | **45.89** |
 | MMStar | 41.6 | 50.2 | 51.87 | 47.8 | **54.4** |
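The updated introduction names the concrete components (a Qwen2.5-1.5B-instruct LLM with a siglip-so400m-patch14-384 vision tower, packaged for `transformers`). Below is a minimal inference sketch, assuming the checkpoint loads through transformers' LLaVA-OneVision integration (`LlavaOnevisionForConditionalGeneration`); the repository id and image path are illustrative placeholders and may need adjusting to the actual release.

```python
# Hedged sketch: assumes the checkpoint is compatible with transformers'
# LLaVA-OneVision classes; repo id and image path are placeholders.
import torch
from PIL import Image
from transformers import AutoProcessor, LlavaOnevisionForConditionalGeneration

model_id = "BAAI/Aquila-VL-2B-llava-qwen"  # adjust to the actual Hub repository

model = LlavaOnevisionForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)
processor = AutoProcessor.from_pretrained(model_id)

# Build a single-image chat prompt via the processor's chat template.
conversation = [
    {
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text", "text": "Describe this image."},
        ],
    }
]
prompt = processor.apply_chat_template(conversation, add_generation_prompt=True)

image = Image.open("example.jpg")  # any local test image
inputs = processor(images=image, text=prompt, return_tensors="pt").to(
    model.device, torch.float16
)

output = model.generate(**inputs, max_new_tokens=128)
print(processor.decode(output[0], skip_special_tokens=True))
```

Half precision and `device_map="auto"` are chosen only to keep the 2B-parameter sketch light on GPU memory; full precision on CPU works as well for a quick functional check.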