czczup committed
Commit b4a04b2
1 Parent(s): b1a30c6

Update README.md

Files changed (1)
  1. README.md +2 -2
README.md CHANGED
@@ -24,10 +24,10 @@ It is _**the largest open-source vision/vision-language foundation model (14B)**
 ## Model Details
 - **Model Type:** vision foundation model, feature backbone
 - **Model Stats:**
-  - Params (M): 5903
+  - Params (M): 5540 (the last 3 blocks are discarded)
   - Image size: 448 x 448
 - **Pretrain Dataset:** LAION-en, LAION-COCO, COYO, CC12M, CC3M, SBU, Wukong, LAION-multi
-
+- **Note!!:** InternViT-6B originally had 48 blocks, and we found that using the output after the fourth-to-last block worked best for VLLM. For ease of use and to save GPU memory, we simply discarded the last 3 blocks. Now the model has only 45 blocks and the number of parameters has been reduced from 5.9B to 5.5B. Please set `mm_vision_select_layer=-1` when using this model.
 ## Model Usage (Image Embeddings)
 
 ```python
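
The `## Model Usage (Image Embeddings)` code block referenced by the last context line is not part of this hunk. For orientation, here is a minimal sketch of pulling image embeddings from the trimmed 45-block checkpoint with `transformers`; the repo id `OpenGVLab/InternViT-6B-448px-V1-2` and the image path are assumptions for illustration, not taken from this commit.

```python
# Illustrative sketch only; repo id and image path are assumed, not from this commit.
import torch
from PIL import Image
from transformers import AutoModel, CLIPImageProcessor

path = 'OpenGVLab/InternViT-6B-448px-V1-2'  # assumed repo id for the 45-block checkpoint

# Load the vision backbone; trust_remote_code pulls in the InternViT model definition.
model = AutoModel.from_pretrained(
    path,
    torch_dtype=torch.bfloat16,
    low_cpu_mem_usage=True,
    trust_remote_code=True).cuda().eval()

image_processor = CLIPImageProcessor.from_pretrained(path)
image = Image.open('./examples/image1.jpg').convert('RGB')  # placeholder image path

pixel_values = image_processor(images=image, return_tensors='pt').pixel_values
pixel_values = pixel_values.to(torch.bfloat16).cuda()

with torch.no_grad():
    outputs = model(pixel_values)

# With the last 3 blocks discarded, the final hidden state is already the layer
# the note recommends (equivalent to mm_vision_select_layer=-1).
print(outputs.last_hidden_state.shape)
```

Dropping the unused trailing blocks from the checkpoint itself, rather than selecting an intermediate layer at run time, is what saves GPU memory and simplifies usage, per the note above.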