czczup committed
Commit e5e4b98
1 Parent(s): 14be858

Update README.md

Files changed (1)
  1. README.md +78 -1
README.md CHANGED
@@ -1,3 +1,80 @@
  ---
- license: mit
+ inference: false
  ---
+
+ <br>
+ <br>
+
+ # InternVL-Chat Model Card
+
+ ## What is InternVL?
+
+ \[[Paper](https://arxiv.org/abs/2312.14238)\] \[[GitHub](https://github.com/OpenGVLab/InternVL)\]
+
+ InternVL scales up the ViT to _**6B parameters**_ and aligns it with an LLM.
+
+ It is trained using web-scale, noisy image-text pairs. The data are all publicly available and comprise multilingual content, including LAION-en, LAION-multi, LAION-COCO, COYO, Wukong, CC12M, CC3M, and SBU.
+
+ It is _**the largest open-source vision/vision-language foundation model (14B)**_ to date, achieving state-of-the-art results on _**32 benchmarks**_ across visual perception, cross-modal retrieval, multimodal dialogue, and other tasks.
+
+ ![image/png](https://cdn-uploads.huggingface.co/production/uploads/64119264f0f81eb569e0d569/k5UATwX5W2b5KJBN5C58x.png)
+
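+ The image-text alignment mentioned above is specified in the paper; purely as an illustration, the sketch below shows the kind of CLIP-style contrastive objective that web-scale, noisy image-text pairs are commonly used for. The function name, embedding sizes, and temperature are assumptions for this sketch, not the actual InternVL training code.
+
+ ```python
+ import torch
+ import torch.nn.functional as F
+
+ def contrastive_loss(image_emb: torch.Tensor, text_emb: torch.Tensor, temperature: float = 0.07) -> torch.Tensor:
+     """Symmetric InfoNCE over a batch of paired image/text embeddings (illustrative only)."""
+     image_emb = F.normalize(image_emb, dim=-1)
+     text_emb = F.normalize(text_emb, dim=-1)
+     logits = image_emb @ text_emb.t() / temperature                # (B, B) similarity matrix
+     targets = torch.arange(logits.size(0), device=logits.device)  # matched pairs sit on the diagonal
+     return 0.5 * (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets))
+
+ # Toy usage with random tensors standing in for ViT / text-encoder outputs.
+ loss = contrastive_loss(torch.randn(8, 512), torch.randn(8, 512))
+ ```
+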
+ ## How to Run?
+
+ Please refer to this [README](https://github.com/OpenGVLab/InternVL/tree/main/llava#internvl-for-multimodal-dialogue-using-llava) to run this model.
+
+ Note: We have retained the original documentation of LLaVA 1.5 as a more detailed manual. In most cases, you will only need to refer to the new documentation that we have added.
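+
+ As a quick orientation only, the sketch below follows the LLaVA-1.5-style Python quick start that the linked documentation builds on. The exact entry points, the checkpoint path, and the generation settings here are assumptions; the linked README remains the authoritative guide.
+
+ ```python
+ from llava.eval.run_llava import eval_model
+ from llava.mm_utils import get_model_name_from_path
+
+ # Assumed checkpoint location; point this at your local copy of the weights if needed.
+ model_path = "OpenGVLab/InternVL-Chat-ViT-6B-Vicuna-13B-448px"
+
+ args = type("Args", (), {
+     "model_path": model_path,
+     "model_base": None,
+     "model_name": get_model_name_from_path(model_path),
+     "query": "Describe this image in detail.",
+     "conv_mode": None,            # let the script choose a conversation template
+     "image_file": "view.jpg",     # local path or URL of the input image
+     "sep": ",",
+     "temperature": 0.2,
+     "top_p": None,
+     "num_beams": 1,
+     "max_new_tokens": 512,
+ })()
+
+ eval_model(args)  # prints the model's answer to stdout
+ ```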
+
+ ## Model details
+
+ **Model type:**
+ InternVL-Chat is an open-source chatbot trained by fine-tuning LLaMA/Vicuna on GPT-generated multimodal instruction-following data.
+ It is an auto-regressive language model based on the transformer architecture.
+
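+ For a concrete picture of how a large ViT is combined with a language model in this kind of chat model, the toy sketch below wires a vision encoder into a language model through a small projection layer. All class names, layer choices, and dimensions are illustrative assumptions (and far smaller than the real 6B ViT plus 13B Vicuna); it is not the InternVL-Chat implementation.
+
+ ```python
+ import torch
+ import torch.nn as nn
+
+ class ToyVisionLanguageChat(nn.Module):
+     """Illustrative composition: vision encoder -> projector -> language model (not the real code)."""
+
+     def __init__(self, vision_dim: int = 64, llm_dim: int = 128, vocab_size: int = 1000):
+         super().__init__()
+         self.vision_encoder = nn.Linear(vision_dim, vision_dim)  # stand-in for the ViT
+         self.projector = nn.Linear(vision_dim, llm_dim)          # maps visual tokens into the LLM embedding space
+         self.token_embedding = nn.Embedding(vocab_size, llm_dim)
+         self.llm = nn.TransformerEncoder(                        # stand-in for the Vicuna decoder (causal mask omitted)
+             nn.TransformerEncoderLayer(llm_dim, nhead=4, batch_first=True), num_layers=1)
+         self.lm_head = nn.Linear(llm_dim, vocab_size)
+
+     def forward(self, image_feats: torch.Tensor, input_ids: torch.Tensor) -> torch.Tensor:
+         visual_tokens = self.projector(self.vision_encoder(image_feats))   # (B, N_img, llm_dim)
+         text_tokens = self.token_embedding(input_ids)                      # (B, N_txt, llm_dim)
+         hidden = self.llm(torch.cat([visual_tokens, text_tokens], dim=1))  # visual tokens prefix the prompt
+         return self.lm_head(hidden)                                        # next-token logits
+
+ logits = ToyVisionLanguageChat()(torch.randn(1, 4, 64), torch.randint(0, 1000, (1, 8)))
+ ```
+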
+ **Model date:**
+ InternVL-Chat-ViT-6B-Vicuna-13B-448px was trained in January 2024.
+
+ **Paper or resources for more information:**
+ https://github.com/OpenGVLab/InternVL
+
+ ## License
+ InternVL is released under the MIT license.
+
+ Llama 2 is licensed under the LLAMA 2 Community License,
+ Copyright (c) Meta Platforms, Inc. All Rights Reserved.
+
+ **Where to send questions or comments about the model:**
+ https://github.com/OpenGVLab/InternVL/issues
+
+ ## Intended use
+ **Primary intended uses:**
+ The primary use of InternVL-Chat is research on large multimodal models and chatbots.
+
+ **Primary intended users:**
+ The primary intended users of the model are researchers and hobbyists in computer vision, natural language processing, machine learning, and artificial intelligence.
+
+ ## Training dataset
+ - 558K filtered image-text pairs from LAION/CC/SBU, captioned by BLIP.
+ - 158K GPT-generated multimodal instruction-following data.
+ - 450K academic-task-oriented VQA data mixture.
+ - 40K ShareGPT data.
+
+ ## Evaluation dataset
+ A collection of 12 benchmarks, including 5 academic VQA benchmarks and 7 recent benchmarks specifically proposed for instruction-following LMMs.
+
+ ## Acknowledgement
+
+ This model card is adapted from [LLaVA's model card](https://huggingface.co/liuhaotian/llava-v1.5-13b). Thanks for their awesome work!
+
+ ## Citation
+
+ If you find this project useful in your research, please consider citing:
+
+ ```BibTeX
+ @article{chen2023internvl,
+   title={InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks},
+   author={Chen, Zhe and Wu, Jiannan and Wang, Wenhai and Su, Weijie and Chen, Guo and Xing, Sen and Zhong, Muyan and Zhang, Qinglong and Zhu, Xizhou and Lu, Lewei and Li, Bin and Luo, Ping and Lu, Tong and Qiao, Yu and Dai, Jifeng},
+   journal={arXiv preprint arXiv:2312.14238},
+   year={2023}
+ }
+ ```