Update README.md
README.md
@@ -36,7 +36,7 @@ It is _**the largest open-source vision/vision-language foundation model (14B)**
 - Note: In this stage, we load the pretrained weights of [InternViT-6B-448px-V1-2](https://huggingface.co/OpenGVLab/InternViT-6B-448px-V1-2). Moreover, in order to reduce the number of visual tokens, we use a pixel shuffle to reduce 1024 tokens to 256 tokens.
 - SFT Stage
 - Learnable Component: ViT + MLP + LLM
-- Data: A simplified, fully open-source dataset, containing approximately 1 million
+- Data: A simplified, fully open-source dataset, containing approximately 1 million samples.


 ## Model Usage
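The pixel-shuffle operation mentioned in the diff context (reducing 1024 visual tokens to 256) can be sketched as a spatial-to-channel rearrangement: each 2×2 neighborhood of tokens is merged into one token by concatenating channels. The snippet below is an illustrative sketch of that idea, not the actual InternVL implementation; the function name, channel size, and 0.5 scale factor are assumptions for the example.

```python
import numpy as np

def pixel_shuffle_downsample(tokens, h, w, scale=0.5):
    """Merge each 2x2 patch of visual tokens into a single token by
    concatenating their channels (a pixel-unshuffle), so the token
    count drops by 4x while the channel dimension grows by 4x.
    Illustrative sketch only, not the InternVL implementation."""
    n, c = tokens.shape
    assert n == h * w, "token count must match the h x w grid"
    r = int(1 / scale)                       # spatial reduction factor (2 here)
    x = tokens.reshape(h, w, c)
    # group r x r neighborhoods: (h, w, c) -> (h/r, r, w/r, r, c)
    x = x.reshape(h // r, r, w // r, r, c)
    x = x.transpose(0, 2, 1, 3, 4)           # -> (h/r, w/r, r, r, c)
    return x.reshape((h // r) * (w // r), r * r * c)

# 1024 tokens on a 32x32 grid with 64 channels -> 256 tokens with 256 channels
tokens = np.random.randn(1024, 64)
out = pixel_shuffle_downsample(tokens, 32, 32)
print(out.shape)  # (256, 256)
```

With `scale=0.5`, a 32×32 token grid becomes 16×16, matching the 1024 → 256 reduction described above while preserving all information in the wider channel dimension.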