opengvlab-admin committed
Commit 5e72178
1 parent: 95d07f1

Update README.md

Files changed (1): README.md (+5 -5)
README.md CHANGED
@@ -10,7 +10,7 @@ datasets:
 pipeline_tag: visual-question-answering
 ---
 
-# Model Card for Mini-InternVL-Chat-2B-V1-5
+# Model Card for Mini-InternVL-Chat-4B-V1-5
 <p align="center">
   <img src="https://cdn-uploads.huggingface.co/production/uploads/64119264f0f81eb569e0d569/D60YzQBIzvoCvLRp2gZ0A.jpeg" alt="Image Description" width="300" height="300" />
 </p>
@@ -33,12 +33,12 @@ As shown in the figure below, we adopted the same model architecture as InternVL
 ## Model Details
 - **Model Type:** multimodal large language model (MLLM)
 - **Model Stats:**
-  - Architecture: [InternViT-300M-448px](https://huggingface.co/OpenGVLab/InternViT-300M-448px) + MLP + [InternLM2-Chat-1.8B](https://huggingface.co/internlm/internlm2-chat-1_8b)
+  - Architecture: [InternViT-300M-448px](https://huggingface.co/OpenGVLab/InternViT-300M-448px) + MLP + [Phi-3-mini-128k-instruct](https://huggingface.co/microsoft/Phi-3-mini-128k-instruct)
   - Image size: dynamic resolution, max to 40 tiles of 448 x 448 (4K resolution).
-  - Params: 2.2B
+  - Params: 4.2B
 
 - **Training Strategy:**
-  - Learnable component in the pretraining stage: ViT + MLP
+  - Learnable component in the pretraining stage: MLP
   - Learnable component in the finetuning stage: ViT + MLP + LLM
   - For more details on training hyperparameters, take a look at our code: [pretrain]() | [finetune]()
 
@@ -57,7 +57,7 @@ As shown in the figure below, we adopted the same model architecture as InternVL
 
 ## Model Usage
 
-We provide an example code to run Mini-InternVL-Chat-2B-V1.5 using `transformers`.
+We provide an example code to run Mini-InternVL-Chat-4B-V1.5 using `transformers`.
 
 You can also use our [online demo](https://internvl.opengvlab.com/) to get a quick experience of this model.
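For reference, the "Model Usage" section this commit retitles points at a `transformers` example that is not shown in the diff. Below is a minimal sketch of such usage, assuming the repo id `OpenGVLab/Mini-InternVL-Chat-4B-V1-5`, the InternVL-style `model.chat(...)` interface loaded via `trust_remote_code=True`, and a placeholder image path; it feeds a single 448 x 448 tile rather than the official dynamic-tiling preprocessing, which splits large images into up to 40 tiles.

```python
import torch
import torchvision.transforms as T
from PIL import Image
from torchvision.transforms.functional import InterpolationMode
from transformers import AutoModel, AutoTokenizer

path = "OpenGVLab/Mini-InternVL-Chat-4B-V1-5"  # assumed repo id for this model card
model = AutoModel.from_pretrained(
    path, torch_dtype=torch.bfloat16, trust_remote_code=True
).eval().cuda()
tokenizer = AutoTokenizer.from_pretrained(path, trust_remote_code=True)

# Single 448x448 tile with ImageNet normalization; the official pipeline
# additionally tiles large images for dynamic resolution (up to 40 tiles).
transform = T.Compose([
    T.Resize((448, 448), interpolation=InterpolationMode.BICUBIC),
    T.ToTensor(),
    T.Normalize(mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225)),
])
image = Image.open("example.jpg").convert("RGB")  # placeholder image path
pixel_values = transform(image).unsqueeze(0).to(torch.bfloat16).cuda()

# Ask a question about the image via the InternVL chat interface.
generation_config = dict(max_new_tokens=512, do_sample=False)
question = "<image>\nPlease describe the image in detail."
response = model.chat(tokenizer, pixel_values, question, generation_config)
print(response)
```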