Update README.md
README.md
CHANGED
@@ -10,7 +10,7 @@ datasets:
 pipeline_tag: image-feature-extraction
 ---
 
-#
+# InternViT-6B-224px
 
 <p align="center">
   <img src="https://cdn-uploads.huggingface.co/production/uploads/64119264f0f81eb569e0d569/jSJ7TChEGvGP_gwNhrYoA.webp" alt="Image Description" width="300" height="300">
@@ -20,15 +20,6 @@ pipeline_tag: image-feature-extraction
 
 [\[🤗 HF Demo\]](https://huggingface.co/spaces/OpenGVLab/InternVL) [\[🚀 Quick Start\]](#model-usage) [\[🌐 Community-hosted API\]](https://rapidapi.com/adushar1320/api/internvl-chat) [\[📖 中文解读\]](https://zhuanlan.zhihu.com/p/675877376)
 
-| Model                   | Date       | Download                                                                | Note                             |
-| ----------------------- | ---------- | ----------------------------------------------------------------------- | -------------------------------- |
-| InternViT-6B-448px-V1-5 | 2024.04.20 | 🤗 [HF link](https://huggingface.co/OpenGVLab/InternViT-6B-448px-V1-5)  | support dynamic resolution, super strong OCR (🔥new) |
-| InternViT-6B-448px-V1-2 | 2024.02.11 | 🤗 [HF link](https://huggingface.co/OpenGVLab/InternViT-6B-448px-V1-2)  | 448 resolution                   |
-| InternViT-6B-448px-V1-0 | 2024.01.30 | 🤗 [HF link](https://huggingface.co/OpenGVLab/InternViT-6B-448px-V1-0)  | 448 resolution                   |
-| InternViT-6B-224px      | 2023.12.22 | 🤗 [HF link](https://huggingface.co/OpenGVLab/InternViT-6B-224px)       | vision foundation model          |
-| InternVL-14B-224px      | 2023.12.22 | 🤗 [HF link](https://huggingface.co/OpenGVLab/InternVL-14B-224px)       | vision-language foundation model |
-
-
 ## Model Details
 - **Model Type:** vision foundation model, feature backbone
 - **Model Stats:**
@@ -37,7 +28,6 @@ pipeline_tag: image-feature-extraction
 - **Pretrain Dataset:** LAION-en, LAION-COCO, COYO, CC12M, CC3M, SBU, Wukong, LAION-multi
 - **Note:** This model has 48 blocks, and we found that using the output after the fourth-to-last block worked best for VLLM. Therefore, when building a VLLM with this model, **please use the features from the fourth-to-last layer.**
 
-
 ## Linear Probing Performance
 
 See this [document](https://github.com/OpenGVLab/InternVL/tree/main/classification#-evaluation) for more details about the linear probing evaluation.
@@ -88,7 +78,6 @@ If you find this project useful in your research, please consider citing:
 }
 ```
 
-
 ## Acknowledgement
 
 InternVL is built with reference to the code of the following projects: [OpenAI CLIP](https://github.com/openai/CLIP), [Open CLIP](https://github.com/mlfoundations/open_clip), [CLIP Benchmark](https://github.com/LAION-AI/CLIP_benchmark), [EVA](https://github.com/baaivision/EVA/tree/master), [InternImage](https://github.com/OpenGVLab/InternImage), [ViT-Adapter](https://github.com/czczup/ViT-Adapter), [MMSegmentation](https://github.com/open-mmlab/mmsegmentation), [Transformers](https://github.com/huggingface/transformers), [DINOv2](https://github.com/facebookresearch/dinov2), [BLIP-2](https://github.com/salesforce/LAVIS/tree/main/projects/blip2), [Qwen-VL](https://github.com/QwenLM/Qwen-VL/tree/master/eval_mm), and [LLaVA-1.5](https://github.com/haotian-liu/LLaVA). Thanks for their awesome work!
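The note in the diff above recommends building a vision-language LLM on the features from the fourth-to-last block. Below is a minimal sketch of pulling those features with Hugging Face Transformers; it assumes the checkpoint's remote code follows the standard `output_hidden_states` convention and ships a CLIP-style image processor, and the image path is a placeholder, so treat it as illustrative rather than the card's official quick start.

```python
import torch
from PIL import Image
from transformers import AutoModel, CLIPImageProcessor

path = "OpenGVLab/InternViT-6B-224px"

# Load the vision backbone; the checkpoint ships custom modeling code,
# hence trust_remote_code=True. bfloat16 keeps the 6B model memory-friendly.
model = AutoModel.from_pretrained(
    path,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
).cuda().eval()

processor = CLIPImageProcessor.from_pretrained(path)

image = Image.open("example.jpg").convert("RGB")  # placeholder image path
pixel_values = processor(images=image, return_tensors="pt").pixel_values
pixel_values = pixel_values.to(torch.bfloat16).cuda()

with torch.no_grad():
    outputs = model(pixel_values, output_hidden_states=True)

# hidden_states holds the embedding output plus one entry per block,
# so index -4 is the output after the fourth-to-last block.
features = outputs.hidden_states[-4]
print(features.shape)  # (batch, num_tokens, hidden_size)
```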
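For context on the linear probing evaluation referenced in the diff: linear probing trains only a single linear classifier on top of frozen backbone features. The sketch below illustrates that setup; the feature dimension, class count, and random stand-in batch are assumptions for demonstration, not the repository's actual protocol.

```python
import torch
import torch.nn as nn

# Linear probing: the backbone stays frozen; only this linear layer is trained.
feature_dim, num_classes = 3200, 1000  # assumed: InternViT-6B hidden size, ImageNet-1K classes
probe = nn.Linear(feature_dim, num_classes)
optimizer = torch.optim.AdamW(probe.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

# Stand-in batch: in practice `features` would be pooled from the frozen
# backbone (e.g. the layer output selected in the sketch above).
features = torch.randn(8, feature_dim)
labels = torch.randint(0, num_classes, (8,))

logits = probe(features)
loss = criterion(logits, labels)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```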