Update README.md
README.md CHANGED
@@ -11,7 +11,9 @@ pipeline_tag: image-text-to-text
## Model

llava-siglip-internlm2-1_8b-v1 is a LLaVA checkpoint finetuned from [internlm2-chat-1_8b](https://huggingface.co/internlm/internlm2-chat-1_8b) and [siglip-so400m-patch14-384](https://huggingface.co/google/siglip-so400m-patch14-384) on [LLaVA-Pretrain](https://huggingface.co/datasets/liuhaotian/LLaVA-Pretrain) and [LLaVA-Instruct-150K](https://huggingface.co/datasets/liuhaotian/LLaVA-Instruct-150K) with [Xtuner](https://github.com/InternLM/xtuner). The pretraining phase took 5.5 hours on 4 Nvidia RTX 4090 GPUs (see this [intermediate checkpoint](https://huggingface.co/StarCycle/llava-siglip-internlm2-1_8b-pretrain-v1)). The finetuning phase took 16 hours on 4 Nvidia RTX 4090 GPUs.

The total size of the model is around 2.2B parameters, which makes it suitable for embedded applications like robotics. It performs slightly better than [llava-clip-internlm2-1_8b-v1](https://huggingface.co/StarCycle/llava-clip-internlm2-1_8b-v1).

#### By the way, it is also stronger than MiniCPM-V on the MMBench test split.

I have not carefully tuned the hyperparameters during training. If you have any idea to improve the model, please open an issue or send an email to zhuohengli@foxmail.com. You are welcome to contribute!
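For a quick qualitative check of the checkpoint described above, Xtuner's LLaVA chat entry point can be used; a minimal sketch, assuming `xtuner` is installed (`pip install -U xtuner`) and that `example.jpg` is a placeholder for any local image:

```
# Sketch: chat with this LLaVA checkpoint via Xtuner's CLI.
# The LLM, visual encoder, and LLaVA repo ids below match the links
# in the model description; adjust them if your local copies differ.
xtuner chat internlm/internlm2-chat-1_8b \
  --visual-encoder google/siglip-so400m-patch14-384 \
  --llava StarCycle/llava-siglip-internlm2-1_8b-v1 \
  --prompt-template internlm2_chat \
  --image example.jpg
```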
@@ -29,7 +31,13 @@ LLaVA-InternLM2-7B | 73.3 | 74.6 | 71.7 | 72.0 | 42.5
Bunny-3B | 69.2 | 68.6 | - | - | -
MiniCPM-V | 64.1 | 67.9 | 62.6 | 65.3 | 41.4
llava-clip-internlm2-1_8b-v1 | 63.3 | 63.1 | 63.6 | 61.7 | 35.3
llava-siglip-internlm2-1_8b-v1 | 65.7 | 63.5 | 64.5 | 62.9 | 36.3

For the performance on MMBench Test (EN):

![image/png](https://cdn-uploads.huggingface.co/production/uploads/642a298ae5f33939cf3ee600/BYxaG48KXrTXuSKgmoAnS.png)

For the performance on MMBench Test (CN):

![image/png](https://cdn-uploads.huggingface.co/production/uploads/642a298ae5f33939cf3ee600/hGi4bpmEm3l1dJM557yAh.png)
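MMBench numbers like the ones above can in principle be reproduced with Xtuner's MMBench evaluation entry point; a minimal sketch, assuming `xtuner` is installed and `MMBench_DEV_EN.tsv` is a placeholder for a locally downloaded MMBench split:

```
# Sketch: evaluate this checkpoint on an MMBench split with Xtuner.
# --data-path points at the downloaded tsv; results are written
# under --work-dir (a submission file is produced for test splits).
xtuner mmbench internlm/internlm2-chat-1_8b \
  --visual-encoder google/siglip-so400m-patch14-384 \
  --llava StarCycle/llava-siglip-internlm2-1_8b-v1 \
  --prompt-template internlm2_chat \
  --data-path MMBench_DEV_EN.tsv \
  --work-dir ./work_dirs/mmbench
```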
## Installation

```