czczup committed on
Commit 5268958 · verified · 1 Parent(s): bbccbf7

Update README.md

Files changed (1)
  1. README.md +1 -22
README.md CHANGED
@@ -10,10 +10,7 @@ datasets:
10
  pipeline_tag: visual-question-answering
11
  ---
12
 
13
- # Model Card for InternVL-Chat-V1-2-Plus
14
- <p align="center">
15
- <img src="https://cdn-uploads.huggingface.co/production/uploads/64119264f0f81eb569e0d569/X8AXMkOlKeUpNcoJIXKna.webp" alt="Image Description" width="300" height="300">
16
- </p>
17
 
18
  [\[🆕 Blog\]](https://internvl.github.io/blog/) [\[📜 InternVL 1.0 Paper\]](https://arxiv.org/abs/2312.14238) [\[📜 InternVL 1.5 Report\]](https://arxiv.org/abs/2404.16821) [\[🗨️ Chat Demo\]](https://internvl.opengvlab.com/)
19
 
@@ -40,18 +37,6 @@ InternVL-Chat-V1-2-Plus uses the same model architecture as [InternVL-Chat-V1-2]
40
  - Learnable Component: ViT + MLP + LLM
41
  - Data: 12 million SFT samples.
42
 
43
- ## Released Models
44
-
45
- | Model | Vision Foundation Model | Release Date |Note |
46
- | :---------------------------------------------------------:|:--------------------------------------------------------------------------: |:----------------------:| :---------------------------------- |
47
- | InternVL-Chat-V1-5(🤗 [HF link](https://huggingface.co/OpenGVLab/InternVL-Chat-V1-5)) | InternViT-6B-448px-V1-5(🤗 [HF link](https://huggingface.co/OpenGVLab/InternViT-6B-448px-V1-5)) |2024.04.18 | support 4K image; super strong OCR; Approaching the performance of GPT-4V and Gemini Pro on various benchmarks like MMMU, DocVQA, ChartQA, MathVista, etc. (🔥new)|
48
- | InternVL-Chat-V1-2-Plus(🤗 [HF link](https://huggingface.co/OpenGVLab/InternVL-Chat-V1-2-Plus) ) |InternViT-6B-448px-V1-2(🤗 [HF link](https://huggingface.co/OpenGVLab/InternViT-6B-448px-V1-2)) |2024.02.21 | more SFT data and stronger |
49
- | InternVL-Chat-V1-2(🤗 [HF link](https://huggingface.co/OpenGVLab/InternVL-Chat-V1-2) ) |InternViT-6B-448px-V1-2(🤗 [HF link](https://huggingface.co/OpenGVLab/InternViT-6B-448px-V1-2)) |2024.02.11 | scaling up LLM to 34B |
50
- | InternVL-Chat-V1-1(🤗 [HF link](https://huggingface.co/OpenGVLab/InternVL-Chat-V1-1)) |InternViT-6B-448px-V1-0(🤗 [HF link](https://huggingface.co/OpenGVLab/InternViT-6B-448px-V1-0)) |2024.01.24 | support Chinese and stronger OCR |
51
-
52
-
53
-
54
-
55
  ## Performance
56
 
57
  \* Proprietary Model &nbsp;&nbsp;&nbsp;&nbsp; † Training Set Observed
@@ -153,9 +138,3 @@ If you find this project useful in your research, please consider citing:
153
  ## License
154
 
155
  This project is released under the MIT license. Parts of this project contain code and models (e.g., LLaMA2) from other sources, which are subject to their respective licenses.
156
-
157
- Llama 2 is licensed under the LLAMA 2 Community License, Copyright (c) Meta Platforms, Inc. All Rights Reserved.
158
-
159
- ## Acknowledgement
160
-
161
- InternVL is built with reference to the code of the following projects: [OpenAI CLIP](https://github.com/openai/CLIP), [Open CLIP](https://github.com/mlfoundations/open_clip), [CLIP Benchmark](https://github.com/LAION-AI/CLIP_benchmark), [EVA](https://github.com/baaivision/EVA/tree/master), [InternImage](https://github.com/OpenGVLab/InternImage), [ViT-Adapter](https://github.com/czczup/ViT-Adapter), [MMSegmentation](https://github.com/open-mmlab/mmsegmentation), [Transformers](https://github.com/huggingface/transformers), [DINOv2](https://github.com/facebookresearch/dinov2), [BLIP-2](https://github.com/salesforce/LAVIS/tree/main/projects/blip2), [Qwen-VL](https://github.com/QwenLM/Qwen-VL/tree/master/eval_mm), and [LLaVA-1.5](https://github.com/haotian-liu/LLaVA). Thanks for their awesome work!
 
10
  pipeline_tag: visual-question-answering
11
  ---
12
 
13
+ # InternVL-Chat-V1-2-Plus
 
 
 
14
 
15
  [\[🆕 Blog\]](https://internvl.github.io/blog/) [\[📜 InternVL 1.0 Paper\]](https://arxiv.org/abs/2312.14238) [\[📜 InternVL 1.5 Report\]](https://arxiv.org/abs/2404.16821) [\[🗨️ Chat Demo\]](https://internvl.opengvlab.com/)
16
 
 
37
  - Learnable Component: ViT + MLP + LLM
38
  - Data: 12 million SFT samples.
39
 
 
 
 
 
 
 
 
 
 
 
 
 
40
  ## Performance
41
 
42
  \* Proprietary Model &nbsp;&nbsp;&nbsp;&nbsp; † Training Set Observed
 
138
  ## License
139
 
140
  This project is released under the MIT license. Parts of this project contain code and models (e.g., LLaMA2) from other sources, which are subject to their respective licenses.