Update README.md
README.md CHANGED
@@ -24,10 +24,10 @@ It is _**the largest open-source vision/vision-language foundation model (14B)**
 ## Model Details
 - **Model Type:** vision foundation model, feature backbone
 - **Model Stats:**
-  - Params (M):
+  - Params (M): 5540 (the last 3 blocks are discarded)
   - Image size: 448 x 448
 - **Pretrain Dataset:** LAION-en, LAION-COCO, COYO, CC12M, CC3M, SBU, Wukong, LAION-multi
-
+- **Note!!:** InternViT-6B originally had 48 blocks, and we found that using the output after the fourth-to-last block worked best for VLLM. For ease of use and to save GPU memory, we simply discarded the last 3 blocks. Now the model has only 45 blocks and the number of parameters has been reduced from 5.9B to 5.5B. Please set `mm_vision_select_layer=-1` when using this model.

 ## Model Usage (Image Embeddings)

 ```python
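The hunk ends just as the usage snippet opens. For context only, here is a minimal sketch of how a feature backbone like this is typically loaded for image embeddings through Hugging Face `transformers` with `trust_remote_code`; the Hub id `OpenGVLab/InternViT-6B-448px`, the example image path, and the bfloat16/CUDA choices are assumptions, not taken from this diff.

```python
# Minimal sketch (assumed usage), not the verbatim README snippet:
# load the trimmed 45-block InternViT backbone and extract image embeddings.
import torch
from PIL import Image
from transformers import AutoModel, CLIPImageProcessor

model_id = 'OpenGVLab/InternViT-6B-448px'  # assumed Hub id for this checkpoint

# Remote code is required because the architecture is defined in the model repo.
model = AutoModel.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
).cuda().eval()

image_processor = CLIPImageProcessor.from_pretrained(model_id)

image = Image.open('./examples/image1.jpg').convert('RGB')  # placeholder image path
pixel_values = image_processor(images=image, return_tensors='pt').pixel_values
pixel_values = pixel_values.to(torch.bfloat16).cuda()

# The forward pass is assumed to return standard transformers outputs,
# with last_hidden_state holding the patch-token features.
outputs = model(pixel_values)
print(outputs.last_hidden_state.shape)
```

Because this commit discards the last three blocks, the backbone's final hidden state is already the fourth-to-last-block output that the note recommends, which is why `mm_vision_select_layer=-1` is the suggested setting.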