# MiniGPT-V

<font size='5'>**MiniGPT-v2: Large Language Model as a Unified Interface for Vision-Language Multi-task Learning**</font>

Jun Chen, Deyao Zhu, Xiaoqian Shen, Xiang Li, Zechun Liu, Pengchuan Zhang, Raghuraman Krishnamoorthi, Vikas Chandra, Yunyang Xiong†, Mohamed Elhoseiny†

†equal last author

<a href='https://minigpt-v2.github.io'><img src='https://img.shields.io/badge/Project-Page-Green'></a> <a href='https://github.com/Vision-CAIR/MiniGPT-4/blob/main/MiniGPTv2.pdf'><img src='https://img.shields.io/badge/Paper-PDF-red'></a> <a href='https://minigpt-v2.github.io'><img src='https://img.shields.io/badge/Gradio-Demo-blue'></a> [![YouTube](https://badges.aleen42.com/src/youtube.svg)](https://www.youtube.com/watch?v=atFCwV2hSY4)
<font size='5'>**MiniGPT-4: Enhancing Vision-language Understanding with Advanced Large Language Models**</font>

Deyao Zhu*, Jun Chen*, Xiaoqian Shen, Xiang Li, Mohamed Elhoseiny

*equal contribution

<a href='https://minigpt-4.github.io'><img src='https://img.shields.io/badge/Project-Page-Green'></a> <a href='https://arxiv.org/abs/2304.10592'><img src='https://img.shields.io/badge/Paper-Arxiv-red'></a> <a href='https://huggingface.co/spaces/Vision-CAIR/minigpt4'><img src='https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue'></a> <a href='https://huggingface.co/Vision-CAIR/MiniGPT-4'><img src='https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Model-blue'></a> [![Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1OK4kYsZphwt5DXchKkzMBjYF6jnkqh4R?usp=sharing) [![YouTube](https://badges.aleen42.com/src/youtube.svg)](https://www.youtube.com/watch?v=__tftoxpBAw&feature=youtu.be)

*King Abdullah University of Science and Technology*
## 💡 Get help - [Q&A](https://github.com/Vision-CAIR/MiniGPT-4/discussions/categories/q-a) or [Discord 💬](https://discord.gg/5WdJkjbAeE)
## News

[Oct.13 2023] Breaking! We release the first major update with our MiniGPT-v2.

[Aug.28 2023] We now provide a Llama 2 version of MiniGPT-4.
## Online Demo

Click the image to chat with MiniGPT-v2 about your images.

[![demo](figs/minigpt2_demo.png)](https://minigpt-v2.github.io/)

Click the image to chat with MiniGPT-4 about your images.

[![demo](figs/online_demo.png)](https://minigpt-4.github.io)
## MiniGPT-v2 Examples

![MiniGPT-v2 demos](figs/demo.png)

## MiniGPT-4 Examples

|   |   |
|:-------------------------:|:-------------------------:|
| ![find wild](figs/examples/wop_2.png) | ![write story](figs/examples/ad_2.png) |
| ![solve problem](figs/examples/fix_1.png) | ![write Poem](figs/examples/rhyme_1.png) |
More examples can be found on the [project page](https://minigpt-4.github.io).

## Getting Started

### Installation

**1. Prepare the code and the environment**
Git clone our repository, create a Python environment, and activate it with the following commands:
```bash
git clone https://github.com/Vision-CAIR/MiniGPT-4.git
cd MiniGPT-4
conda env create -f environment.yml
conda activate minigpt4
```
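
Once the environment is active, an optional sanity check (assuming PyTorch was installed by the environment file and a CUDA-capable GPU is present) can confirm that the GPU is visible:

```bash
# Optional: verify that PyTorch is importable and can see the GPU
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
```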
**2. Prepare the pretrained LLM weights**

**MiniGPT-v2** is based on Llama 2 Chat 7B. For **MiniGPT-4**, we provide both Vicuna V0 and Llama 2 versions.

Download the corresponding LLM weights from one of the following Hugging Face repositories by cloning it with git-lfs (see the example below the table).
| Llama 2 Chat 7B | Vicuna V0 13B | Vicuna V0 7B |
|:---------------:|:-------------:|:------------:|
| [Download](https://huggingface.co/meta-llama/Llama-2-7b-chat-hf/tree/main) | [Download](https://huggingface.co/Vision-CAIR/vicuna/tree/main) | [Download](https://huggingface.co/Vision-CAIR/vicuna-7b/tree/main) |
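
For example, cloning the Llama 2 Chat 7B weights could look like the following (the meta-llama repository requires an approved access request on Hugging Face; clone into any local directory you like):

```bash
# Make sure git-lfs is installed so the large weight files are downloaded, not just pointers
git lfs install

# Clone the LLM weights (for meta-llama repos, request access on Hugging Face first)
git clone https://huggingface.co/meta-llama/Llama-2-7b-chat-hf
```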
Then, set the variable *llama_model* in the model config file to the LLM weight path (a minimal excerpt is sketched after the list).
* For MiniGPT-v2, set the LLM path [here](minigpt4/configs/models/minigpt_v2.yaml#L15) at Line 14.
* For MiniGPT-4 (Llama2), set the LLM path [here](minigpt4/configs/models/minigpt4_llama2.yaml#L15) at Line 15.
* For MiniGPT-4 (Vicuna), set the LLM path [here](minigpt4/configs/models/minigpt4_vicuna0.yaml#L18) at Line 18.
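
As an illustration only, the entry looks roughly like the sketch below; the surrounding keys are omitted, the nesting under `model` is an assumption, and the path is a placeholder for wherever you stored the weights:

```yaml
# Illustrative excerpt of a model config (e.g. minigpt4/configs/models/minigpt_v2.yaml).
# Only the llama_model entry is shown; other keys are omitted.
model:
  llama_model: "/path/to/Llama-2-7b-chat-hf"  # local path to the downloaded LLM weights
```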
**3. Prepare the pretrained model checkpoints**

Download the pretrained model checkpoints.

| MiniGPT-v2 (LLaMA-2 Chat 7B) |
|------------------------------|
| [Download](https://drive.google.com/file/d/1aVbfW7nkCSYx99_vCRyP1sOlQiWVSnAl/view?usp=sharing) |
For **MiniGPT-v2**, set the path to the pretrained checkpoint in the evaluation config file [eval_configs/minigptv2_eval.yaml](eval_configs/minigptv2_eval.yaml#L10) at Line 8.
| MiniGPT-4 (Vicuna 13B) | MiniGPT-4 (Vicuna 7B) | MiniGPT-4 (LLaMA-2 Chat 7B) |
|----------------------------|---------------------------|---------------------------------|
| [Download](https://drive.google.com/file/d/1a4zLvaiDBr-36pasffmgpvH5P7CKmpze/view?usp=share_link) | [Download](https://drive.google.com/file/d/1RY9jV0dyqLX-o38LrumkKRh6Jtaop58R/view?usp=sharing) | [Download](https://drive.google.com/file/d/11nAPjEok8eAGGEG1N2vXo3kBLCg0WgUk/view?usp=sharing) |
For **MiniGPT-4**, set the path to the pretrained checkpoint in the evaluation config file: [eval_configs/minigpt4_eval.yaml](eval_configs/minigpt4_eval.yaml#L10) at Line 8 for the Vicuna version, or [eval_configs/minigpt4_llama2_eval.yaml](eval_configs/minigpt4_llama2_eval.yaml#L10) for the Llama2 version (an illustrative excerpt follows).
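
As a rough sketch of what that edit looks like (the key name `ckpt` and its placement under `model` are assumptions here; the path is a placeholder for your downloaded checkpoint):

```yaml
# Illustrative excerpt of an evaluation config (e.g. eval_configs/minigpt4_eval.yaml).
# The key name `ckpt` and its placement under `model` are assumptions.
model:
  ckpt: "/path/to/downloaded_checkpoint.pth"  # pretrained checkpoint from the table above
```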
### Launching Demo Locally

For MiniGPT-v2, run

```
python demo_v2.py --cfg-path eval_configs/minigptv2_eval.yaml --gpu-id 0
```
For MiniGPT-4 (Vicuna version), run

```
python demo.py --cfg-path eval_configs/minigpt4_eval.yaml --gpu-id 0
```

For MiniGPT-4 (Llama2 version), run

```
python demo.py --cfg-path eval_configs/minigpt4_llama2_eval.yaml --gpu-id 0
```
To save GPU memory, the LLM loads in 8-bit by default, with a beam search width of 1. This configuration requires about 23 GB of GPU memory for the 13B LLM and 11.5 GB for the 7B LLM. On more powerful GPUs, you can run the model in 16-bit by setting `low_resource` to `False` in the relevant config file (a minimal excerpt is sketched after the list):
* MiniGPT-v2: [minigptv2_eval.yaml](eval_configs/minigptv2_eval.yaml#6)
* MiniGPT-4 (Llama2): [minigpt4_llama2_eval.yaml](eval_configs/minigpt4_llama2_eval.yaml#6)
* MiniGPT-4 (Vicuna): [minigpt4_eval.yaml](eval_configs/minigpt4_eval.yaml#6)
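
A minimal sketch of that change, assuming the flag sits under the `model` section of the eval config:

```yaml
# Illustrative excerpt of an eval config; placement under `model` is an assumption.
model:
  low_resource: False  # load the LLM in 16-bit instead of the default 8-bit
```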
Thanks to [@WangRongsheng](https://github.com/WangRongsheng), you can also run MiniGPT-4 on [Colab](https://colab.research.google.com/drive/1OK4kYsZphwt5DXchKkzMBjYF6jnkqh4R?usp=sharing).
### Training

For training details of MiniGPT-4, check [here](MiniGPT4_Train.md).
## Acknowledgement

+ [BLIP2](https://huggingface.co/docs/transformers/main/model_doc/blip-2) The model architecture of MiniGPT-4 follows BLIP-2. Don't forget to check out this great open-source work if you haven't seen it before!
+ [Lavis](https://github.com/salesforce/LAVIS) This repository is built upon Lavis!
+ [Vicuna](https://github.com/lm-sys/FastChat) Vicuna's language ability with only 13B parameters is fantastic, and it is open-source!
+ [LLaMA](https://github.com/facebookresearch/llama) The strong open-source LLaMA 2 language model.

If you're using MiniGPT-4/MiniGPT-v2 in your research or applications, please cite using this BibTeX:
```bibtex
@article{Chen2023minigpt,
  title={MiniGPT-v2: Large Language Model as a Unified Interface for Vision-Language Multi-task Learning},
  author={Chen, Jun and Zhu, Deyao and Shen, Xiaoqian and Li, Xiang and Liu, Zechun and Zhang, Pengchuan and Krishnamoorthi, Raghuraman and Chandra, Vikas and Xiong, Yunyang and Elhoseiny, Mohamed},
  journal={github},
  year={2023}
}

@article{zhu2023minigpt,
  title={MiniGPT-4: Enhancing Vision-Language Understanding with Advanced Large Language Models},
  author={Zhu, Deyao and Chen, Jun and Shen, Xiaoqian and Li, Xiang and Elhoseiny, Mohamed},
  journal={arXiv preprint arXiv:2304.10592},
  year={2023}
}
```
## License

This repository is under the [BSD 3-Clause License](LICENSE.md).
Much of the code is based on [Lavis](https://github.com/salesforce/LAVIS), which is under the BSD 3-Clause License [here](LICENSE_Lavis.md).