Zhang Hui committed on
Commit
83bb89e
1 Parent(s): 3fd04ab

update readme

Files changed (1)
  1. README.md +5 -5
README.md CHANGED
@@ -9,7 +9,7 @@ license: apache-2.0
 
  A multimodal large-scale model, characterized by its open-source nature, closely emulates the functionality of the GPT4V/Qwen-VL-Plus model. Built on Qwen-72b-Chat, CatVision can handle inputs that combine images and text. The model is designed to follow output-format instructions effectively, benefiting from the strengths of Qwen72b.
 
- A multimodal large-scale model, characterized by its open-source nature, closely emulates the functionality of the GPT4V/Qwen-VL-Plus model. Built on Qwen-72b-Chat, CatVision can handle interleaved image/text inputs. The model is designed to follow output-format instructions effectively, benefiting from the strengths of Qwen72b.
 
  - Our training approach consisted of two stages, inspired by LLaVA-1.5. In the first stage, we trained the visual encoder and perceptual resampler; in the second stage, we trained the large language model and perceptual resampler on instruction data. To work within limited computational resources (32xA100-80G), we used LoRA for training in both stages.
 
@@ -68,7 +68,7 @@ Our model achieved favorable results on the many leaderboards.
  | Gemini Pro | 47.9 | ---- |
  | Yi-VL-34B | 45.9 | 41.6 |
  | Qwen-VL-PLUS | 45.2 | 40.8 |
- | *CatVision* | 45.9 | 40.1 |
  | Macro-VL | 41.2 | 40.4 |
  | InfiMM-Zephyr-7B | 39.4 | 35.5 |
  | Yi-VL-6B | 39.1 | 37.8 |
@@ -83,7 +83,7 @@ Our model achieved favorable results on the many leaderboards.
  |--------------------------------|:---------:|:------------:|
  | GPT-4V(ision) (Playground) | 42.5 | 43.7 |
  | Qwen-VL-PLUS* | 39.5 | 36.8 |
- | *CatVision* | 39.6 | ---- |
  | Yi-VL-34B | 36.2 | 36.5 |
  | Yi-VL-6B | 35.8 | 35.0 |
  | Qwen-VL-7B-Chat | 30.7 | 31.3 |
@@ -104,7 +104,7 @@ Our model achieved favorable results on the many leaderboards.
  | Qwen-VL-PLUS(BASE) | 83.3 | 83.2 | 82.7 | 81.5 | 77.6 |
  | GPT4v | 77.0 | 75.1 | 74.4 | 75.0 | 46.5 |
  | Qwen-VL-PLUS | 67.0 | 66.2 | 70.7 | 69.6 | 55.1 |
- | *CatVision* | 70.9 | 71.8 | 70.2 | 71.6 | 49.8 |
  | Qwen-VL-Chat | 61.8 | 60.6 | 56.3 | 56.7 | 41.2 |
 
  - **[MME](https://github.com/BradyFU/Awesome-Multimodal-Large-Language-Models)**
@@ -113,7 +113,7 @@ Our model achieved favorable results on the many leaderboards.
  |---------------|:----------:|:---------:|
  | GPT4v | 1409.43 | 517.14 |
  | Qwen-VL-PLUS | 1681.25 | 502.14 |
- | *CatVision* | 1560.90 | 366.43 |
  | Qwen-VL-Chat | 1487.57 | 360.71 |
 
  - **Open Compress**
 
 
  A multimodal large-scale model, characterized by its open-source nature, closely emulates the functionality of the GPT4V/Qwen-VL-Plus model. Built on Qwen-72b-Chat, CatVision can handle inputs that combine images and text. The model is designed to follow output-format instructions effectively, benefiting from the strengths of Qwen72b.
 
+ An open-source multimodal large model that closely emulates the functionality of the GPT4V/Qwen-VL series of models. Built on Qwen-72b-Chat, CatVision can handle interleaved image/text inputs. The model is designed to follow output-format instructions effectively, benefiting from the strengths of Qwen72b.
 
  - Our training approach consisted of two stages, inspired by LLaVA-1.5. In the first stage, we trained the visual encoder and perceptual resampler; in the second stage, we trained the large language model and perceptual resampler on instruction data. To work within limited computational resources (32xA100-80G), we used LoRA for training in both stages.
 
 
  | Gemini Pro | 47.9 | ---- |
  | Yi-VL-34B | 45.9 | 41.6 |
  | Qwen-VL-PLUS | 45.2 | 40.8 |
+ | **CatVision** | 45.9 | 40.1 |
  | Macro-VL | 41.2 | 40.4 |
  | InfiMM-Zephyr-7B | 39.4 | 35.5 |
  | Yi-VL-6B | 39.1 | 37.8 |
 
  |--------------------------------|:---------:|:------------:|
  | GPT-4V(ision) (Playground) | 42.5 | 43.7 |
  | Qwen-VL-PLUS* | 39.5 | 36.8 |
+ | **CatVision** | 39.6 | ---- |
  | Yi-VL-34B | 36.2 | 36.5 |
  | Yi-VL-6B | 35.8 | 35.0 |
  | Qwen-VL-7B-Chat | 30.7 | 31.3 |
 
  | Qwen-VL-PLUS(BASE) | 83.3 | 83.2 | 82.7 | 81.5 | 77.6 |
  | GPT4v | 77.0 | 75.1 | 74.4 | 75.0 | 46.5 |
  | Qwen-VL-PLUS | 67.0 | 66.2 | 70.7 | 69.6 | 55.1 |
+ | **CatVision** | 70.9 | 71.8 | 70.2 | 71.6 | 49.8 |
  | Qwen-VL-Chat | 61.8 | 60.6 | 56.3 | 56.7 | 41.2 |
 
  - **[MME](https://github.com/BradyFU/Awesome-Multimodal-Large-Language-Models)**
 
  |---------------|:----------:|:---------:|
  | GPT4v | 1409.43 | 517.14 |
  | Qwen-VL-PLUS | 1681.25 | 502.14 |
+ | **CatVision** | 1560.90 | 366.43 |
  | Qwen-VL-Chat | 1487.57 | 360.71 |
 
  - **Open Compress**