StarCycle committed
Commit
3e01fda
1 Parent(s): b204d98

Update README.md

Files changed (1)
  1. README.md +10 -2
README.md CHANGED
@@ -11,7 +11,9 @@ pipeline_tag: image-text-to-text
 ## Model
 llava-siglip-internlm2-1_8b-pretrain-v1 is a LLaVA checkpoint finetuned from [internlm2-chat-1_8b](https://huggingface.co/internlm/internlm2-chat-1_8b) and [siglip-so400m-patch14-384](https://huggingface.co/google/siglip-so400m-patch14-384) on [LLaVA-Pretrain](https://huggingface.co/datasets/liuhaotian/LLaVA-Pretrain) and [LLaVA-Instruct-150K](https://huggingface.co/datasets/liuhaotian/LLaVA-Instruct-150K) using [Xtuner](https://github.com/InternLM/xtuner). The pretraining phase took 5.5 hours on 4 Nvidia RTX 4090 GPUs (see this [intermediate checkpoint](https://huggingface.co/StarCycle/llava-siglip-internlm2-1_8b-pretrain-v1)). The finetuning phase took 16 hours on 4 Nvidia RTX 4090 GPUs.
 
-The total size of the model is around 2.2B, which is suitable for embedded applications like robotics. This model performs slightly better than [llava-clip-internlm2-1_8b-v1](https://huggingface.co/StarCycle/llava-clip-internlm2-1_8b-v1).
+The total size of the model is around 2.2B, which is suitable for embedded applications like robotics. This model performs slightly better than [llava-clip-internlm2-1_8b-v1](https://huggingface.co/StarCycle/llava-clip-internlm2-1_8b-v1).
+
+#### By the way, it is also stronger than MiniCPM-V on the MMBench test split.
 
 I have not carefully tuned the hyperparameters during training. If you have any idea to improve it, please open an issue or just send an email to zhuohengli@foxmail.com. You are welcome!
 
@@ -29,7 +31,13 @@ LLaVA-InternLM2-7B | 73.3 | 74.6 | 71.7 | 72.0 | 42.5
 Bunny-3B | 69.2 | 68.6 | - | - | -
 MiniCPM-V | 64.1 | 67.9 | 62.6 | 65.3 | 41.4
 llava-clip-internlm2-1_8b-v1 | 63.3 | 63.1 | 63.6 | 61.7 | 35.3
-llava-siglip-internlm2-1_8b-v1 | - | 63.5 | - | 62.9 | 36.3
+llava-siglip-internlm2-1_8b-v1 | 65.7 | 63.5 | 64.5 | 62.9 | 36.3
+
+For the performance on MMBench Test EN:
+![image/png](https://cdn-uploads.huggingface.co/production/uploads/642a298ae5f33939cf3ee600/BYxaG48KXrTXuSKgmoAnS.png)
+
+For the performance on MMBench Test CN:
+![image/png](https://cdn-uploads.huggingface.co/production/uploads/642a298ae5f33939cf3ee600/hGi4bpmEm3l1dJM557yAh.png)
 
 ## Installation
 ```
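
The Installation code block above is cut off in this diff view. As a point of reference only, a minimal sketch of chatting with the finetuned checkpoint via XTuner's `xtuner chat` CLI might look like the following; the flag names follow XTuner's documented LLaVA chat examples, and the prompt template and the finetuned repo id `StarCycle/llava-siglip-internlm2-1_8b-v1` are assumptions, not taken from this commit:

```
# Hedged sketch, not from this README: the flags follow XTuner's documented
# LLaVA chat examples; the prompt template and repo id are assumptions.
pip install -U 'xtuner[deepspeed]'

xtuner chat internlm/internlm2-chat-1_8b \
  --visual-encoder google/siglip-so400m-patch14-384 \
  --llava StarCycle/llava-siglip-internlm2-1_8b-v1 \
  --prompt-template internlm2_chat \
  --image ./example.jpg
```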