Vision-CAIR committed on
Commit 70fcd58 • 1 Parent(s): e81e82a

Delete README.md

Files changed (1)
  1. README.md +0 -182
README.md DELETED
@@ -1,182 +0,0 @@
# MiniGPT-V

<font size='5'>**MiniGPT-v2: Large Language Model as a Unified Interface for Vision-Language Multi-task Learning**</font>

Jun Chen, Deyao Zhu, Xiaoqian Shen, Xiang Li, Zechun Liu, Pengchuan Zhang, Raghuraman Krishnamoorthi, Vikas Chandra, Yunyang Xiong☨, Mohamed Elhoseiny☨

☨equal last author

<a href='https://minigpt-v2.github.io'><img src='https://img.shields.io/badge/Project-Page-Green'></a> <a href='https://github.com/Vision-CAIR/MiniGPT-4/blob/main/MiniGPTv2.pdf'><img src='https://img.shields.io/badge/Paper-PDF-red'></a> <a href='https://minigpt-v2.github.io'><img src='https://img.shields.io/badge/Gradio-Demo-blue'></a> [![YouTube](https://badges.aleen42.com/src/youtube.svg)](https://www.youtube.com/watch?v=atFCwV2hSY4)


<font size='5'>**MiniGPT-4: Enhancing Vision-language Understanding with Advanced Large Language Models**</font>

Deyao Zhu*, Jun Chen*, Xiaoqian Shen, Xiang Li, Mohamed Elhoseiny

*equal contribution

<a href='https://minigpt-4.github.io'><img src='https://img.shields.io/badge/Project-Page-Green'></a> <a href='https://arxiv.org/abs/2304.10592'><img src='https://img.shields.io/badge/Paper-Arxiv-red'></a> <a href='https://huggingface.co/spaces/Vision-CAIR/minigpt4'><img src='https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue'></a> <a href='https://huggingface.co/Vision-CAIR/MiniGPT-4'><img src='https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Model-blue'></a> [![Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1OK4kYsZphwt5DXchKkzMBjYF6jnkqh4R?usp=sharing) [![YouTube](https://badges.aleen42.com/src/youtube.svg)](https://www.youtube.com/watch?v=__tftoxpBAw&feature=youtu.be)

*King Abdullah University of Science and Technology*

## 💡 Get help - [Q&A](https://github.com/Vision-CAIR/MiniGPT-4/discussions/categories/q-a) or [Discord 💬](https://discord.gg/5WdJkjbAeE)

## News
[Oct.13 2023] Breaking! We release the first major update with our MiniGPT-v2.

[Aug.28 2023] We now provide a Llama 2 version of MiniGPT-4.

## Online Demo

Click the image to chat with MiniGPT-v2 about your images
[![demo](figs/minigpt2_demo.png)](https://minigpt-v2.github.io/)

Click the image to chat with MiniGPT-4 about your images
[![demo](figs/online_demo.png)](https://minigpt-4.github.io)


## MiniGPT-v2 Examples

![MiniGPT-v2 demos](figs/demo.png)


## MiniGPT-4 Examples
| | |
:-------------------------:|:-------------------------:
![find wild](figs/examples/wop_2.png) | ![write story](figs/examples/ad_2.png)
![solve problem](figs/examples/fix_1.png) | ![write Poem](figs/examples/rhyme_1.png)

More examples can be found on the [project page](https://minigpt-4.github.io).

## Getting Started
### Installation

**1. Prepare the code and the environment**

Clone our repository, create a Python environment, and activate it with the following commands:

```bash
git clone https://github.com/Vision-CAIR/MiniGPT-4.git
cd MiniGPT-4
conda env create -f environment.yml
conda activate minigpt4
```
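
As a quick sanity check (not part of the original instructions, and assuming PyTorch is one of the packages installed by `environment.yml`), you can confirm that the environment is active and the GPU is visible before moving on:

```bash
# Optional sanity check: print the PyTorch version and whether CUDA is available.
# Assumes PyTorch is installed by environment.yml.
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
```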


**2. Prepare the pretrained LLM weights**

**MiniGPT-v2** is based on Llama 2 Chat 7B. For **MiniGPT-4**, we have both a Vicuna V0 and a Llama 2 version.
Download the corresponding LLM weights from one of the following Hugging Face repositories by cloning it with git-lfs (an example clone is sketched below the table).

| Llama 2 Chat 7B | Vicuna V0 13B | Vicuna V0 7B |
:------------------------------------------------------------------------------------------------:|:----------------------------------------------------------------------------------------------:|:----------------------------------------------------------------------------------------------:
[Download](https://huggingface.co/meta-llama/Llama-2-7b-chat-hf/tree/main) | [Download](https://huggingface.co/Vision-CAIR/vicuna/tree/main) | [Download](https://huggingface.co/Vision-CAIR/vicuna-7b/tree/main)
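
For example, a minimal git-lfs clone of the Llama 2 Chat 7B weights linked above could look like the sketch below; the target directory is illustrative, and the gated meta-llama repository requires an approved Hugging Face access request.

```bash
# Illustrative git-lfs clone of the Llama 2 Chat 7B weights (URL taken from the table above).
# Requires git-lfs to be installed and access to the gated meta-llama repository.
git lfs install
git clone https://huggingface.co/meta-llama/Llama-2-7b-chat-hf /path/to/Llama-2-7b-chat-hf
```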


Then, set the variable *llama_model* in the model config file to the LLM weight path (a quick way to double-check the setting is sketched after this list).

* For MiniGPT-v2, set the LLM path
[here](minigpt4/configs/models/minigpt_v2.yaml#L15) at Line 14.

* For MiniGPT-4 (Llama2), set the LLM path
[here](minigpt4/configs/models/minigpt4_llama2.yaml#L15) at Line 15.

* For MiniGPT-4 (Vicuna), set the LLM path
[here](minigpt4/configs/models/minigpt4_vicuna0.yaml#L18) at Line 18.
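
For instance, after editing the MiniGPT-v2 config, a quick grep (a non-authoritative sketch; the path shown in the comment is a placeholder for your own) confirms that `llama_model` now points at the downloaded weights:

```bash
# Verify that llama_model in the MiniGPT-v2 model config points at your local LLM weights;
# expect a line such as: llama_model: "/path/to/Llama-2-7b-chat-hf"
grep -n "llama_model" minigpt4/configs/models/minigpt_v2.yaml
```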

**3. Prepare the pretrained model checkpoints**

Download the pretrained model checkpoints.


| MiniGPT-v2 (LLaMA-2 Chat 7B) |
|------------------------------|
| [Download](https://drive.google.com/file/d/1aVbfW7nkCSYx99_vCRyP1sOlQiWVSnAl/view?usp=sharing) |

For **MiniGPT-v2**, set the path to the pretrained checkpoint in the evaluation config file
[eval_configs/minigptv2_eval.yaml](eval_configs/minigptv2_eval.yaml#L10) at Line 8.
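
If you prefer to fetch the checkpoint from the command line, one option is sketched below; it is not part of the official instructions, relies on the third-party `gdown` tool, uses the Google Drive file id from the link above, and the output filename is chosen here purely for illustration.

```bash
# Sketch: download the MiniGPT-v2 checkpoint by its Google Drive file id (from the link above)
# using the third-party gdown tool; the output path/name below is illustrative only.
pip install gdown
mkdir -p checkpoints
gdown "https://drive.google.com/uc?id=1aVbfW7nkCSYx99_vCRyP1sOlQiWVSnAl" -O checkpoints/minigptv2_checkpoint.pth
```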



| MiniGPT-4 (Vicuna 13B) | MiniGPT-4 (Vicuna 7B) | MiniGPT-4 (LLaMA-2 Chat 7B) |
|----------------------------|---------------------------|---------------------------------|
| [Download](https://drive.google.com/file/d/1a4zLvaiDBr-36pasffmgpvH5P7CKmpze/view?usp=share_link) | [Download](https://drive.google.com/file/d/1RY9jV0dyqLX-o38LrumkKRh6Jtaop58R/view?usp=sharing) | [Download](https://drive.google.com/file/d/11nAPjEok8eAGGEG1N2vXo3kBLCg0WgUk/view?usp=sharing) |

For **MiniGPT-4**, set the path to the pretrained checkpoint in the evaluation config file:
[eval_configs/minigpt4_eval.yaml](eval_configs/minigpt4_eval.yaml#L10) at Line 8 for the Vicuna version, or [eval_configs/minigpt4_llama2_eval.yaml](eval_configs/minigpt4_llama2_eval.yaml#L10) for the Llama 2 version.


### Launching Demo Locally

For MiniGPT-v2, run
```
python demo_v2.py --cfg-path eval_configs/minigptv2_eval.yaml --gpu-id 0
```

For MiniGPT-4 (Vicuna version), run

```
python demo.py --cfg-path eval_configs/minigpt4_eval.yaml --gpu-id 0
```

For MiniGPT-4 (Llama2 version), run

```
python demo.py --cfg-path eval_configs/minigpt4_llama2_eval.yaml --gpu-id 0
```

To save GPU memory, the LLM loads in 8-bit by default, with a beam search width of 1.
This configuration requires about 23 GB of GPU memory for the 13B LLM and 11.5 GB for the 7B LLM.
On more powerful GPUs, you can run the model in 16-bit by setting `low_resource` to `False` in the relevant config file (a one-line sketch of this edit follows the list):

* MiniGPT-v2: [minigptv2_eval.yaml](eval_configs/minigptv2_eval.yaml#6)
* MiniGPT-4 (Llama2): [minigpt4_llama2_eval.yaml](eval_configs/minigpt4_llama2_eval.yaml#6)
* MiniGPT-4 (Vicuna): [minigpt4_eval.yaml](eval_configs/minigpt4_eval.yaml#6)
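
For example, the MiniGPT-v2 edit could be done in one line. This is only a sketch: it assumes the config spells the value as `True`, and GNU `sed -i`; open the file and edit by hand if your setup differs.

```bash
# Sketch: switch MiniGPT-v2 to 16-bit loading. Assumes the config currently contains
# the line "low_resource: True"; edit the file manually if it is spelled differently.
sed -i 's/low_resource: True/low_resource: False/' eval_configs/minigptv2_eval.yaml
```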

Thanks to [@WangRongsheng](https://github.com/WangRongsheng), you can also run MiniGPT-4 on [Colab](https://colab.research.google.com/drive/1OK4kYsZphwt5DXchKkzMBjYF6jnkqh4R?usp=sharing).


### Training
For training details of MiniGPT-4, check [here](MiniGPT4_Train.md).


## Acknowledgement

+ [BLIP2](https://huggingface.co/docs/transformers/main/model_doc/blip-2) The model architecture of MiniGPT-4 follows BLIP-2. Don't forget to check out this great open-source work if you don't know it already!
+ [Lavis](https://github.com/salesforce/LAVIS) This repository is built upon Lavis!
+ [Vicuna](https://github.com/lm-sys/FastChat) The fantastic language ability of Vicuna with only 13B parameters is just amazing. And it is open-source!
+ [LLaMA](https://github.com/facebookresearch/llama) The strong open-source LLaMA 2 language model.

If you're using MiniGPT-4/MiniGPT-v2 in your research or applications, please cite using this BibTeX:
```bibtex
@article{Chen2023minigpt,
  title={MiniGPT-v2: Large Language Model as a Unified Interface for Vision-Language Multi-task Learning},
  author={Chen, Jun and Zhu, Deyao and Shen, Xiaoqian and Li, Xiang and Liu, Zechun and Zhang, Pengchuan and Krishnamoorthi, Raghuraman and Chandra, Vikas and Xiong, Yunyang and Elhoseiny, Mohamed},
  journal={github},
  year={2023}
}

@article{zhu2023minigpt,
  title={MiniGPT-4: Enhancing Vision-Language Understanding with Advanced Large Language Models},
  author={Zhu, Deyao and Chen, Jun and Shen, Xiaoqian and Li, Xiang and Elhoseiny, Mohamed},
  journal={arXiv preprint arXiv:2304.10592},
  year={2023}
}
```


## License
This repository is under the [BSD 3-Clause License](LICENSE.md).
Much of the code is based on [Lavis](https://github.com/salesforce/LAVIS), which is licensed under the
BSD 3-Clause License [here](LICENSE_Lavis.md).