Commit c5d3ba9 by zwgao
1 Parent(s): ed13a99

Update README.md

Files changed (1)
  1. README.md +19 -2
README.md CHANGED
@@ -11,11 +11,20 @@ pipeline_tag: image-feature-extraction
 ---
 
 # Model Card for InternViT-6B-448px-V1-5
-
-<img src="https://cdn-uploads.huggingface.co/production/uploads/64119264f0f81eb569e0d569/AUE-3OBtfr9vDA7Elgkhd.webp" alt="Image Description" width="300" height="300">
+<p align="center">
+<img src="https://cdn-uploads.huggingface.co/production/uploads/64119264f0f81eb569e0d569/AUE-3OBtfr9vDA7Elgkhd.webp" alt="Image Description" width="300" height="300">
+</p>
 
 \[[Paper](https://arxiv.org/abs/2312.14238)\] \[[GitHub](https://github.com/OpenGVLab/InternVL)\] \[[Chat Demo](https://internvl.opengvlab.com/)\] \[[中文解读](https://zhuanlan.zhihu.com/p/675877376)]
 
+We develop InternViT-6B-448px-V1-5 by continuing the pre-training of the strong foundation of InternViT-6B-448px-V1.2. In this update, the resolution of training images is expanded from 448&times;448 to dynamic 448&times;448, where the basic tile size is 448&times;448 and the number of tiles ranges from 1 to 12.
+Additionally, we enhance the data scale, quality, and diversity of the pre-training dataset, resulting in the powerful robustness, OCR capability, and high-resolution processing capability of our
+1.5 version model.
+
+## Released Models
+
+### Vision Foundation model
+
 | Model | Date | Download | Note |
 | ----------------------- | ---------- | ---------------------------------------------------------------------- | -------------------------------- |
 | InternViT-6B-448px-V1.5 | 2024.04.20 | 🤗 [HF link](https://huggingface.co/OpenGVLab/InternViT-6B-448px-V1-5) | support dynamic resolution, super strong OCR (🔥new) |
@@ -24,6 +33,14 @@ pipeline_tag: image-feature-extraction
 | InternViT-6B-224px | 2023.12.22 | 🤗 [HF link](https://huggingface.co/OpenGVLab/InternViT-6B-224px) | vision foundation model |
 | InternVL-14B-224px | 2023.12.22 | 🤗 [HF link](https://huggingface.co/OpenGVLab/InternVL-14B-224px) | vision-language foundation model |
 
+### Multimodal Large Language Model (MLLM)
+
+| Model | Date | Download | Note |
+| ----------------------- | ---------- | --------------------------------------------------------------------------- | ---------------------------------- |
+| InternVL-Chat-V1.5 | 2024.04.18 | 🤗 [HF link](https://huggingface.co/OpenGVLab/InternVL-Chat-V1-5) | support 4K image; super strong OCR; Approaching the performance of GPT-4V and Gemini Pro on various benchmarks like MMMU, DocVQA, ChartQA, MathVista, etc. (🔥new)|
+| InternVL-Chat-V1.2-Plus | 2024.02.21 | 🤗 [HF link](https://huggingface.co/OpenGVLab/InternVL-Chat-V1-2-Plus) | more SFT data and stronger |
+| InternVL-Chat-V1.2 | 2024.02.11 | 🤗 [HF link](https://huggingface.co/OpenGVLab/InternVL-Chat-V1-2) | scaling up LLM to 34B |
+| InternVL-Chat-V1.1 | 2024.01.24 | 🤗 [HF link](https://huggingface.co/OpenGVLab/InternVL-Chat-V1-1) | support Chinese and stronger OCR |
 
 ## Model Details
 - **Model Type:** vision foundation model, feature backbone
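The README text added in this commit says training images now use a dynamic resolution built from 448×448 tiles, with 1 to 12 tiles per image, but it does not say how the tile grid is chosen for a given image. The sketch below assumes one plausible heuristic: enumerate every cols × rows grid with 1 to 12 tiles, pick the grid whose aspect ratio is closest to the image's, and on an exact tie prefer the larger grid only if the image has enough pixels to fill at least half of it. The function names (`candidate_grids`, `choose_grid`, `tile_boxes`) and the tie-break rule are hypothetical illustrations, not the InternVL code. The crop boxes use the (left, top, right, bottom) convention of Pillow's `Image.crop`, applied after resizing the image to `(cols * 448, rows * 448)`.

```python
from math import inf

TILE = 448                    # basic tile size stated in the README
MIN_TILES, MAX_TILES = 1, 12  # tile-count range stated in the README


def candidate_grids(min_tiles=MIN_TILES, max_tiles=MAX_TILES):
    """All (cols, rows) grids whose tile count lies in [min_tiles, max_tiles].

    Sorted by tile count, then by (cols, rows), so iteration order is deterministic.
    """
    grids = [
        (c, r)
        for c in range(1, max_tiles + 1)
        for r in range(1, max_tiles + 1)
        if min_tiles <= c * r <= max_tiles
    ]
    return sorted(grids, key=lambda g: (g[0] * g[1], g))


def choose_grid(width, height, min_tiles=MIN_TILES, max_tiles=MAX_TILES):
    """Pick a tile grid for a width x height image.

    ASSUMED heuristic (hypothetical): closest aspect ratio wins; on an exact
    tie, switch to the larger grid only if the image area exceeds half of
    that grid's total pixel area.
    """
    aspect = width / height
    best, best_diff = (1, 1), inf
    for cols, rows in candidate_grids(min_tiles, max_tiles):
        diff = abs(aspect - cols / rows)
        if diff < best_diff:
            best, best_diff = (cols, rows), diff
        elif diff == best_diff and width * height > 0.5 * TILE * TILE * cols * rows:
            best = (cols, rows)
    return best


def tile_boxes(cols, rows, tile=TILE):
    """Crop boxes (left, top, right, bottom), row by row, for a resized image."""
    return [
        (c * tile, r * tile, (c + 1) * tile, (r + 1) * tile)
        for r in range(rows)
        for c in range(cols)
    ]


if __name__ == "__main__":
    for w, h in [(448, 448), (896, 448), (3000, 1000)]:
        cols, rows = choose_grid(w, h)
        print(f"{w}x{h} -> {cols}x{rows} grid, {len(tile_boxes(cols, rows))} tiles")
```

Under this assumed heuristic, a 448×448 image stays a single tile, while a 3000×1000 image maps to a 6×2 grid (12 tiles, the stated maximum). Sorting candidates by tile count means the smallest matching grid is found first, and the area check keeps small images from being upscaled into many tiles.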