nlpzhaof committed on
Commit
5ce5e38
1 Parent(s): 74f8187

Create README.md

Files changed (1)
  1. README.md +50 -0
README.md ADDED
@@ -0,0 +1,50 @@
+ ---
+ license: apache-2.0
+ language:
+ - en
+ ---
+
+ # AlignGPT: Multi-modal Large Language Models with Adaptive Alignment Capability
+ [[Project Page](https://aligngpt-vl.github.io/)] [[Paper](https://arxiv.org/abs/2405.14129)] [[Demo](http://47.116.173.89:7870/)] [[Model](https://huggingface.co/nlpzhaof)]
+
+ Authors: [Fei Zhao*](https://scholar.google.com/citations?user=V01xzWQAAAAJ&hl=zh-CN), Taotian Pang*, Chunhui Li, [Zhen Wu](https://scholar.google.com/citations?user=IoGlgtoAAAAJ&hl=zh-CN), Junjie Guo, Shangyu Xing, [Xinyu Dai](https://scholar.google.com/citations?user=zpWB1CgAAAAJ&hl=zh-CN)
+
+
+ ## News and Updates
+ - [5/24] 🔥 We released **AlignGPT: Multi-modal Large Language Models with Adaptive Alignment Capability**. Check out the [paper](https://arxiv.org/abs/2405.14129) and [demo](http://47.116.173.89:7870/).
+
+
+ ## Model Zoo
+
+ | Model | LLM | Vision Backbone | Pre-training | Instruct-tuning |
+ |----------|----------|-----------|---|---|
+ | AlignGPT-7B | [Vicuna 7B](https://huggingface.co/lmsys/vicuna-7b-v1.5) | [CLIP ViT-L/14](https://huggingface.co/openai/clip-vit-large-patch14-336) | [aligngpt-7b-pretrain](https://huggingface.co/nlpzhaof/aligngpt-7b-pretrain/tree/main) | [aligngpt-7b](https://huggingface.co/nlpzhaof/aligngpt-7b/tree/main) |
+ | AlignGPT-13B | [Vicuna 13B](https://huggingface.co/lmsys/vicuna-13b-v1.5) | [CLIP ViT-L/14](https://huggingface.co/openai/clip-vit-large-patch14-336) | [aligngpt-13b-pretrain](https://huggingface.co/nlpzhaof/aligngpt-13b-pretrain/tree/main) | [aligngpt-13b](https://huggingface.co/nlpzhaof/aligngpt-13b/tree/main) |
+ | AlignGPT-LLaMA2 | [LLaMA-2-7B-Chat](https://huggingface.co/meta-llama/Llama-2-7b-chat-hf) | [CLIP ViT-L/14](https://huggingface.co/openai/clip-vit-large-patch14-336) | To be released | To be released |
+ | AlignGPT-LLaMA3 | [LLaMA-3-8B-Base](https://huggingface.co/meta-llama/Meta-Llama-3-8B) | [CLIP ViT-L/14](https://huggingface.co/openai/clip-vit-large-patch14-336) | To be released | To be released |
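
The released checkpoints in the table above live in ordinary Hugging Face Hub repositories, so fetching one can be sketched as below. This is a minimal sketch, not part of the AlignGPT codebase: the helper names `checkpoint_repo` and `download_checkpoint` and the `stage` argument are hypothetical, and the repository ids are assumed to follow the pattern visible in the table links. It only downloads the files; loading them into a model presumably requires the project's own code, which is not shown here.

```python
# Hypothetical helpers for illustration only; they are not provided by AlignGPT.
# Sizes that the Model Zoo table lists as already released.
RELEASED_SIZES = ("7b", "13b")


def checkpoint_repo(size: str, stage: str = "instruct") -> str:
    """Return the Hub repo id for a released checkpoint, per the table links."""
    size = size.lower()
    if size not in RELEASED_SIZES:
        raise ValueError(f"no released checkpoint for size {size!r}")
    if stage not in ("pretrain", "instruct"):
        raise ValueError(f"unknown stage {stage!r}")
    # Pre-training repos carry a "-pretrain" suffix; instruct-tuned ones do not.
    suffix = "-pretrain" if stage == "pretrain" else ""
    return f"nlpzhaof/aligngpt-{size}{suffix}"


def download_checkpoint(size: str, stage: str = "instruct") -> str:
    """Download every file of one checkpoint and return the local directory."""
    # Imported lazily so checkpoint_repo works without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=checkpoint_repo(size, stage))
```

Calling `download_checkpoint("7b")` would fetch the instruct-tuned 7B weights into the local Hub cache, and `download_checkpoint("13b", "pretrain")` the 13B pre-training checkpoint.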
+
+
+ ## Performance
+ | Model | VQAv2 | GQA | VizWiz | SQA | T-VQA | POPE | MME | MM-Bench | MM-Bench-CN | SEED | LLaVA-Bench-Wild | MM-Vet |
+ |----------|---|---|---|---|---|---|---|---|---|---|---|---|
+ | AlignGPT-7B | 79.1 | 62.9 | 54.2 | 68.5 | 58.4 | 86.0 | 1527.4 | 67.3 | 59.9 | 66.5 | 68.4 | 30.8 |
+ | AlignGPT-13B | 80.0 | 63.6 | 56.4 | 70.3 | 60.2 | 86.2 | 1572.0 | 69.5 | 63.7 | 67.8 | 75.2 | 35.6 |
+
+ ## Citation
+ If you find AlignGPT useful for your research and applications, please cite using this BibTeX:
+ ```
+ @misc{zhao2024aligngpt,
+   title={AlignGPT: Multi-modal Large Language Models with Adaptive Alignment Capability},
+   author={Fei Zhao and Taotian Pang and Chunhui Li and Zhen Wu and Junjie Guo and Shangyu Xing and Xinyu Dai},
+   year={2024},
+   eprint={2405.14129},
+   archivePrefix={arXiv},
+   primaryClass={cs.CL}
+ }
+ ```
+
+ ## License
+
+ [![Code License](https://img.shields.io/badge/Code%20License-Apache_2.0-green.svg)](https://github.com/tatsu-lab/stanford_alpaca/blob/main/LICENSE)[![Data License](https://img.shields.io/badge/Data%20License-CC%20By%20NC%204.0-red.svg)](https://github.com/tatsu-lab/stanford_alpaca/blob/main/DATA_LICENSE)
+
+ The data and checkpoints are intended and licensed for research use only. They are also restricted to uses that follow the license agreements of LLaMA, Vicuna, and GPT-4. The dataset is licensed under CC BY-NC 4.0 (non-commercial use only), and models trained on it should not be used outside of research purposes.