ldwang committed
Commit 3051d9b
1 Parent(s): 4012699

Update README.md

Files changed (1)
  1. README.md +2 -2
README.md CHANGED
@@ -10,7 +10,7 @@ library_name: transformers
 
 # Introduction
 
-The Infinity-VL-2B model is a vision-language model (VLM) trained based on the [LLava-one-vision](https://llava-vl.github.io/blog/2024-08-05-llava-onevision/) framework. The [Qwen2.5-1.5B-instruct](https://huggingface.co/Qwen/Qwen2.5-1.5B-Instruct) model is chosen as the LLM, while [siglip-so400m-patch14-384](https://huggingface.co/google/siglip-so400m-patch14-384) is utilized as the vision tower.
+The Aquila-VL-2B-llava-qwen model is a vision-language model (VLM) trained based on the [LLava-one-vision](https://llava-vl.github.io/blog/2024-08-05-llava-onevision/) framework. The [Qwen2.5-1.5B-instruct](https://huggingface.co/Qwen/Qwen2.5-1.5B-Instruct) model is chosen as the LLM, while [siglip-so400m-patch14-384](https://huggingface.co/google/siglip-so400m-patch14-384) is utilized as the vision tower.
 
 The model was trained on our self-built Infinity-MM dataset, which contains approximately 40 million image-text pairs. This dataset is a combination of open-source data collected from the internet and synthetic instruction data generated using open-source VLM models.
 
@@ -20,7 +20,7 @@ We plan to open-source the Infinity-MM dataset, training scripts, and related re
 
 We evaluated the model using the [VLMEvalKit](https://github.com/open-compass/VLMEvalKit) tool. Whenever possible, we prioritized using the GPT-4 API for test sets that support API-based evaluation.
 
-| Test sets | MiniCPM-V-2 | InternVL2-2B | XinYuan-VL-2B | Qwen2-VL-2B-Instruct | Infinity-VL-2B |
+| Test sets | MiniCPM-V-2 | InternVL2-2B | XinYuan-VL-2B | Qwen2-VL-2B-Instruct | Aquila-VL-2B |
 |:----------------:|:--------------:|:---------------:|:----------------:|:-----------------------:|:-----------------:|
 | MMMU\_DEV\_VAL | 39.56 | 34.89 | 43.56 | 41.67 | **45.89** |
 | MMStar | 41.6 | 50.2 | 51.87 | 47.8 | **54.4** |
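
As context for the introduction changed above: the model pairs Qwen2.5-1.5B-Instruct with the siglip-so400m-patch14-384 vision tower under the LLaVA-OneVision framework. A minimal inference sketch using the `transformers` LLaVA-OneVision classes follows; the repo id `BAAI/Aquila-VL-2B-llava-qwen`, the sample image URL, and direct `transformers` compatibility are assumptions (the upstream LLaVA-OneVision codebase may be required instead), not something this commit specifies.

```python
# Hedged sketch: assumes the checkpoint loads with transformers' LLaVA-OneVision
# classes; the original LLaVA-OneVision codebase may be required instead.
import requests
from PIL import Image
from transformers import AutoProcessor, LlavaOnevisionForConditionalGeneration

model_id = "BAAI/Aquila-VL-2B-llava-qwen"  # assumed repo id
model = LlavaOnevisionForConditionalGeneration.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)
processor = AutoProcessor.from_pretrained(model_id)

# Build a single-image chat prompt in the processor's chat format.
conversation = [
    {
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text", "text": "Describe this image."},
        ],
    }
]
prompt = processor.apply_chat_template(conversation, add_generation_prompt=True)

# Placeholder image URL; replace with a real image.
image = Image.open(
    requests.get("https://example.com/sample.jpg", stream=True).raw
)
inputs = processor(images=image, text=prompt, return_tensors="pt").to(model.device)

output = model.generate(**inputs, max_new_tokens=128)
print(processor.decode(output[0], skip_special_tokens=True))
```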
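The evaluation paragraph reports scores produced with VLMEvalKit, using GPT-4 as the judge where a benchmark supports API-based scoring. Below is a hedged sketch of launching such a run from Python via VLMEvalKit's `run.py`; the `--model` name `Aquila-VL-2B` is hypothetical and must match however this checkpoint is registered in VLMEvalKit's model config.

```python
# Hedged sketch: drive VLMEvalKit's run.py from Python. MMMU_DEV_VAL and MMStar
# are VLMEvalKit dataset keys; the --model entry "Aquila-VL-2B" is a hypothetical
# registration name for this checkpoint.
import os
import subprocess

# API-scored benchmarks read the GPT-4 judge key from the environment.
os.environ["OPENAI_API_KEY"] = "sk-..."  # placeholder

subprocess.run(
    [
        "python", "run.py",
        "--data", "MMMU_DEV_VAL", "MMStar",
        "--model", "Aquila-VL-2B",
        "--verbose",
    ],
    check=True,  # raise if the evaluation run fails
)
```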