Zhang Hui committed on
Commit 9a424a3
1 Parent(s): 609b6ac

add more results

Files changed (3)
  1. .gitignore +1 -0
  2. MMMU.png +0 -3
  3. README.md +45 -6
.gitignore ADDED
@@ -0,0 +1 @@
+results/
MMMU.png DELETED

Git LFS Details

  • SHA256: 6992e642aaad7cb1a3aad6993760dac6984905b99b1f35478ddf46d4d89d2a3f
  • Pointer size: 131 Bytes
  • Size of remote file: 147 kB
README.md CHANGED
@@ -24,10 +24,13 @@ A multimodal large-scale model, characterized by its open-source nature, closely
 
 The vision encoder is inherited from Qwen-VL-Chat, i.e., OpenCLIP ViT-bigG.
 
+- We are continuously collecting instruction data, optimizing the model, and looking forward to supporting more tasks.
+
+We are continuously collecting instruction data and optimizing the model, and we look forward to supporting more features.
+
 ## Quick Start
 
 ```
-
 from transformers import AutoTokenizer, AutoModelForCausalLM, AutoConfig
 tokenizer = AutoTokenizer.from_pretrained(
     pretrained_model_name_or_path="huizhang0110/CatVision",
@@ -55,19 +58,33 @@ response, history = model.chat(
 
 ## Benchmark
 
-Our model achieved favorable results on the [MMMU](https://eval.ai/web/challenges/challenge-page/2179/leaderboard/5377) and [CMMMU]() leaderboards.
+Our model achieved favorable results on many leaderboards.
 
 - **[MMMU](https://eval.ai/web/challenges/challenge-page/2179/leaderboard/5377)**
 
-![MMMU](./MMMU.png)
+| Model            | Val (900) | Test (11K) |
+|------------------|:---------:|:----------:|
+| Gemini Ultra     | 59.4      | ----       |
+| GPT-4V           | 56.8      | 55.7       |
+| Gemini Pro       | 47.9      | ----       |
+| Yi-VL-34B        | 45.9      | 41.6       |
+| Qwen-VL-PLUS     | 45.2      | 40.8       |
+| *CatVision*      | 45.9      | 40.1       |
+| Marco-VL         | 41.2      | 40.4       |
+| InfiMM-Zephyr-7B | 39.4      | 35.5       |
+| Yi-VL-6B         | 39.1      | 37.8       |
+| SVIT             | 38.0      | 34.1       |
+| LLaVA-1.5-13B    | 36.4      | 33.6       |
+| Emu2-Chat        | 36.3      | 34.1       |
+| Qwen-VL-7B-Chat  | 35.9      | 32.9       |
 
 - **[CMMMU](https://github.com/CMMMU-Benchmark/CMMMU/blob/main/README.md)**
 
 | Model                      | Val (900) | Test (11K) |
 |----------------------------|:---------:|:----------:|
-| GPT-4V(ision) (Playground) | **42.5**  | **43.7**   |
+| GPT-4V(ision) (Playground) | 42.5      | 43.7       |
 | Qwen-VL-PLUS*              | 39.5      | 36.8       |
-| CatVision                  | 39.6      | ----       |
+| *CatVision*                | 39.6      | ----       |
 | Yi-VL-34B                  | 36.2      | 36.5       |
 | Yi-VL-6B                   | 35.8      | 35.0       |
 | Qwen-VL-7B-Chat            | 30.7      | 31.3       |
@@ -81,6 +98,28 @@ Our model achieved favorable results on the [MMMU](https://eval.ai/web/challenge
 | Frequent Choice | 24.1 | 26.0 |
 | Random Choice   | 21.6 | 21.6 |
 
+- **[MMBench](https://mmbench.opencompass.org.cn/leaderboard)**
+
+| Model              | mmbench_cn (test) | mmbench_cn (dev) | mmbench_en (test) | mmbench_zh (dev) | ccbench |
+|--------------------|:-----------------:|:----------------:|:-----------------:|:----------------:|:-------:|
+| Qwen-VL-PLUS(BASE) | 83.3              | 83.2             | 82.7              | 81.5             | 77.6    |
+| GPT-4V             | 77.0              | 75.1             | 74.4              | 75.0             | 46.5    |
+| Qwen-VL-PLUS       | 67.0              | 66.2             | 70.7              | 69.6             | 55.1    |
+| *CatVision*        | 70.9              | 71.8             | 70.2              | 71.6             | 49.8    |
+| Qwen-VL-Chat       | 61.8              | 60.6             | 56.3              | 56.7             | 41.2    |
+
+- **[MME](https://github.com/BradyFU/Awesome-Multimodal-Large-Language-Models)**
+
+| Model        | Perception | Cognition |
+|--------------|:----------:|:---------:|
+| GPT-4V       | 1409.43    | 517.14    |
+| Qwen-VL-PLUS | 1681.25    | 502.14    |
+| *CatVision*  | 1560.90    | 366.43    |
+| Qwen-VL-Chat | 1487.57    | 360.71    |
+
+- **OpenCompass**
+
+Results pending.
 
 - **Show Case**
 
@@ -101,7 +140,7 @@ Our model achieved favorable results on the [MMMU](https://eval.ai/web/challenge
 ```
 @misc{CatVision,
     author = {zhanghui@4paradigm.com},
-    title = {Open Qwen-VL-Plus},
+    title = {CatVision},
     year = {2024},
     publisher = {huggingface},
     howpublished = {\url{https://huggingface.co/huizhang0110/CatVision}}
 
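The Quick Start snippet above is cut off at the diff-context boundaries, so only the imports and the start of the `AutoTokenizer.from_pretrained(...)` call are visible. As a reference point only, here is a minimal end-to-end sketch; `trust_remote_code=True`, `tokenizer.from_list_format`, and the `model.chat(...)` signature are assumptions carried over from the Qwen-VL-Chat interface that CatVision inherits, not details confirmed by this commit.

```
# Minimal sketch of the Quick Start flow. Helper names and arguments follow
# the Qwen-VL-Chat convention and are assumptions, not confirmed by this diff.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(
    pretrained_model_name_or_path="huizhang0110/CatVision",
    trust_remote_code=True,  # assumed: the repo ships custom tokenizer/model code
)
model = AutoModelForCausalLM.from_pretrained(
    pretrained_model_name_or_path="huizhang0110/CatVision",
    device_map="auto",
    trust_remote_code=True,
).eval()

# Assumed Qwen-VL-Chat-style multimodal prompt construction.
query = tokenizer.from_list_format([
    {"image": "demo.jpg"},             # hypothetical local image path
    {"text": "Describe this image."},
])
response, history = model.chat(tokenizer, query=query, history=None)
print(response)
```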