OpenGVLab/InternViT-6B-448px-V1-5


zwgao committed on Apr 21, 2024

Commit 688b3d7 · verified · 1 Parent(s): 37685b7

Update README.md

Files changed (1)

README.md +1 -1


@@ -17,7 +17,7 @@ pipeline_tag: image-feature-extraction
 \[[Paper](https://arxiv.org/abs/2312.14238)\]  \[[GitHub](https://github.com/OpenGVLab/InternVL)\] \[[Chat Demo](https://internvl.opengvlab.com/)\] \[[中文解读](https://zhuanlan.zhihu.com/p/675877376)]
-We develop InternViT-6B-448px-V1-5 by continuing the pre-training of the strong foundation of InternViT-6B-448px-V1.2. In this update, the resolution of training images is expanded from 448&times;448 to dynamic 448&times;448, where the basic tile size is 448&times;448 and the number of tiles ranges from 1 to 12.
+We develop InternViT-6B-448px-V1-5 based on the pre-training of the strong foundation of InternViT-6B-448px-V1.2. In this update, the resolution of training images is expanded from 448&times;448 to dynamic 448&times;448, where the basic tile size is 448&times;448 and the number of tiles ranges from 1 to 12.
 Additionally, we enhance the data scale, quality, and diversity of the pre-training dataset, resulting in the powerful robustness, OCR capability, and high-resolution processing capability of our
 1.5 version model.
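
Reviewer note: the README text in this hunk describes "dynamic 448&times;448" input as 1 to 12 tiles of 448&times;448 each. The sketch below illustrates one way such a tile grid could be chosen for an image. It is not taken from this repository: the function names (`candidate_grids`, `pick_grid`, `tile_boxes`) are hypothetical, and the selection rule (closest aspect ratio, ties broken by closest pixel area) is an assumption made for illustration only.

```python
TILE = 448        # basic tile size stated in the README
MAX_TILES = 12    # upper bound on the number of tiles stated in the README


def candidate_grids(max_tiles=MAX_TILES):
    """All (cols, rows) grids that use between 1 and max_tiles tiles."""
    return [
        (cols, rows)
        for cols in range(1, max_tiles + 1)
        for rows in range(1, max_tiles + 1)
        if cols * rows <= max_tiles
    ]


def pick_grid(width, height, max_tiles=MAX_TILES):
    """Pick a grid for an image of the given size (assumed rule).

    Primary criterion: aspect ratio closest to the image's.
    Tie-break: total tile area closest to the image's pixel area.
    """
    target_ratio = width / height
    image_area = width * height

    def score(grid):
        cols, rows = grid
        ratio_gap = abs(cols / rows - target_ratio)
        area_gap = abs(cols * rows * TILE * TILE - image_area)
        return (ratio_gap, area_gap)

    return min(candidate_grids(max_tiles), key=score)


def tile_boxes(cols, rows):
    """Crop boxes (left, top, right, bottom) for an image that has
    already been resized to (cols * TILE, rows * TILE)."""
    return [
        (c * TILE, r * TILE, (c + 1) * TILE, (r + 1) * TILE)
        for r in range(rows)
        for c in range(cols)
    ]


if __name__ == "__main__":
    cols, rows = pick_grid(1344, 448)   # a wide 3:1 image
    print(cols, rows)                   # prints "3 1"
    print(tile_boxes(cols, rows))
```

Under this assumed rule, a 448&times;448 image maps to a single tile, and wider or taller images map to grids with more tiles, never exceeding 12.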