czczup committed on
Commit
cb1263c
•
1 Parent(s): 43c60b1

Update README.md

Files changed (1)
  1. README.md +9 -1
README.md CHANGED
@@ -15,7 +15,9 @@ pipeline_tag: image-feature-extraction
  <img src="https://cdn-uploads.huggingface.co/production/uploads/64119264f0f81eb569e0d569/AUE-3OBtfr9vDA7Elgkhd.webp" alt="Image Description" width="300" height="300">
  </p>
 
- \[[InternVL 1.5 Technical Report](https://arxiv.org/abs/2404.16821)\] \[[Paper](https://arxiv.org/abs/2312.14238)\] \[[GitHub](https://github.com/OpenGVLab/InternVL)\] \[[Chat Demo](https://internvl.opengvlab.com/)\] \[[中文解读](https://zhuanlan.zhihu.com/p/675877376)]
+ [\[🆕 Blog\]](https://internvl.github.io/blog/) [\[📜 InternVL 1.0 Paper\]](https://arxiv.org/abs/2312.14238) [\[📜 InternVL 1.5 Report\]](https://arxiv.org/abs/2404.16821) [\[🗨️ Chat Demo\]](https://internvl.opengvlab.com/)
+
+ [\[🤗 HF Demo\]](https://huggingface.co/spaces/OpenGVLab/InternVL) [\[🚀 Quick Start\]](#model-usage) [\[🌐 Community-hosted API\]](https://rapidapi.com/adushar1320/api/internvl-chat) [\[📖 中文解读\]](https://zhuanlan.zhihu.com/p/675877376)
 
  We develop InternViT-6B-448px-V1-5 based on the pre-training of the strong foundation of [InternViT-6B-448px-V1.2](https://huggingface.co/OpenGVLab/InternViT-6B-448px-V1-2). In this update, the resolution of training images is expanded from 448&times;448 to dynamic 448&times;448, where the basic tile size is 448&times;448 and the number of tiles ranges from 1 to 12.
  Additionally, we enhance the data scale, quality, and diversity of the pre-training dataset, resulting in the powerful robustness, OCR capability, and high-resolution processing capability of our
@@ -82,6 +84,12 @@ If you find this project useful in your research, please consider citing:
  journal={arXiv preprint arXiv:2312.14238},
  year={2023}
  }
+ @article{chen2024far,
+ title={How Far Are We to GPT-4V? Closing the Gap to Commercial Multimodal Models with Open-Source Suites},
+ author={Chen, Zhe and Wang, Weiyun and Tian, Hao and Ye, Shenglong and Gao, Zhangwei and Cui, Erfei and Tong, Wenwen and Hu, Kongzhi and Luo, Jiapeng and Ma, Zheng and others},
+ journal={arXiv preprint arXiv:2404.16821},
+ year={2024}
+ }
  ```
 
 
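
The README paragraph in the first hunk says training images are split into 448&times;448 tiles, between 1 and 12 per image, but it does not say how the tile grid is chosen. The Python sketch below is one possible illustration, not the InternViT preprocessing code: `candidate_grids`, `pick_grid`, and `resized_size` are hypothetical helpers, and the selection rule (closest aspect ratio, ties broken by closest pixel area) is an assumption made only for this example.

```python
from typing import List, Tuple

TILE = 448       # basic tile size stated in the README
MAX_TILES = 12   # upper bound on the number of tiles stated in the README


def candidate_grids(max_tiles: int = MAX_TILES) -> List[Tuple[int, int]]:
    """Return every (cols, rows) grid whose tile count lies in 1..max_tiles."""
    return [
        (cols, rows)
        for cols in range(1, max_tiles + 1)
        for rows in range(1, max_tiles + 1)
        if cols * rows <= max_tiles
    ]


def pick_grid(width: int, height: int, max_tiles: int = MAX_TILES) -> Tuple[int, int]:
    """Pick a (cols, rows) grid for an image of the given pixel size.

    ASSUMPTION: the README does not state the selection rule. This sketch
    takes the grid with the closest aspect ratio and breaks ties by the
    grid whose total pixel area is closest to the image's area.
    """
    target_ratio = width / height
    image_area = width * height

    def score(grid: Tuple[int, int]) -> Tuple[float, int]:
        cols, rows = grid
        ratio_gap = abs(cols / rows - target_ratio)
        area_gap = abs(cols * rows * TILE * TILE - image_area)
        return (ratio_gap, area_gap)

    return min(candidate_grids(max_tiles), key=score)


def resized_size(grid: Tuple[int, int]) -> Tuple[int, int]:
    """Pixel size the image would be resized to before cutting it into tiles."""
    cols, rows = grid
    return (cols * TILE, rows * TILE)


if __name__ == "__main__":
    grid = pick_grid(1920, 1080)
    print(grid, resized_size(grid))
```

Under these assumptions a 448&times;448 input maps to a single tile and a 1344&times;448 input to a 3&times;1 grid; other selection rules would satisfy the same 1-to-12 constraint.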