---
license: apache-2.0
language:
- en
---

# AlignGPT: Multi-modal Large Language Models with Adaptive Alignment Capability
[[Project Page](https://aligngpt-vl.github.io/)] [[Paper](https://arxiv.org/abs/2405.14129)] [[Demo](http://47.116.173.89:7870/)] [[Model](https://huggingface.co/nlpzhaof)]

Authors: [Fei Zhao](https://scholar.google.com/citations?user=V01xzWQAAAAJ&hl=zh-CN), Taotian Pang, Chunhui Li, [Zhen Wu](https://scholar.google.com/citations?user=IoGlgtoAAAAJ&hl=zh-CN), Junjie Guo, Shangyu Xing, [Xinyu Dai](https://scholar.google.com/citations?user=zpWB1CgAAAAJ&hl=zh-CN)


## News and Updates
- [5/24] 🔥 We released AlignGPT: Multi-modal Large Language Models with Adaptive Alignment Capability. Check out the [paper](https://arxiv.org/abs/2405.14129) and [demo](http://47.116.173.89:7870/).


## Model Zoo

| Model | LLM | Vision Backbone | Pre-training | Instruct-tuning |
|---|---|---|---|---|
| AlignGPT-7B | [Vicuna 7B](https://huggingface.co/lmsys/vicuna-7b-v1.5) | [CLIP ViT-L/14](https://huggingface.co/openai/clip-vit-large-patch14-336) | [aligngpt-7b-pretrain](https://huggingface.co/nlpzhaof/aligngpt-7b-pretrain/tree/main) | [aligngpt-7b](https://huggingface.co/nlpzhaof/aligngpt-7b/tree/main) |
| AlignGPT-13B | [Vicuna 13B](https://huggingface.co/lmsys/vicuna-13b-v1.5) | [CLIP ViT-L/14](https://huggingface.co/openai/clip-vit-large-patch14-336) | [aligngpt-13b-pretrain](https://huggingface.co/nlpzhaof/aligngpt-13b-pretrain/tree/main) | [aligngpt-13b](https://huggingface.co/nlpzhaof/aligngpt-13b/tree/main) |
| AlignGPT-LLaMA2 | [LLaMA-2-7B-Chat](https://huggingface.co/meta-llama/Llama-2-7b-chat-hf) | [CLIP ViT-L/14](https://huggingface.co/openai/clip-vit-large-patch14-336) | To be released | To be released |
| AlignGPT-LLaMA3 | [LLaMA-3-8B-Base](https://huggingface.co/meta-llama/Meta-Llama-3-8B) | [CLIP ViT-L/14](https://huggingface.co/openai/clip-vit-large-patch14-336) | To be released | To be released |
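
The released checkpoints in the table can be fetched from the Hugging Face Hub. The following is a minimal sketch, not an official loading script: it assumes the `huggingface_hub` package is installed, and the helper names `checkpoint_repo` and `download_checkpoint` are hypothetical, introduced here only for illustration. It downloads the repository files as-is and makes no assumption about how the AlignGPT code consumes them.

```python
from typing import Optional

# Repository IDs copied from the Model Zoo table. Only the 7B and 13B
# checkpoints are listed as released; the LLaMA-2 and LLaMA-3 variants
# are marked "To be released" and are therefore not included.
CHECKPOINTS = {
    ("7b", "pretrain"): "nlpzhaof/aligngpt-7b-pretrain",
    ("7b", "instruct"): "nlpzhaof/aligngpt-7b",
    ("13b", "pretrain"): "nlpzhaof/aligngpt-13b-pretrain",
    ("13b", "instruct"): "nlpzhaof/aligngpt-13b",
}


def checkpoint_repo(size: str, stage: str) -> str:
    """Return the Hub repository ID for a model size and training stage."""
    key = (size.lower(), stage.lower())
    if key not in CHECKPOINTS:
        raise KeyError(f"No released checkpoint for size={size!r}, stage={stage!r}")
    return CHECKPOINTS[key]


def download_checkpoint(size: str, stage: str, local_dir: Optional[str] = None) -> str:
    """Download every file in the checkpoint repository and return the local path."""
    # Imported lazily so the lookup helper above works without the dependency.
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=checkpoint_repo(size, stage), local_dir=local_dir)
```

For example, `download_checkpoint("7b", "pretrain")` would fetch the files of this repository into the local Hub cache and return the directory that contains them.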


## Performance
| Model | VQAv2 | GQA | VizWiz | SQA | T-VQA | POPE | MME | MM-Bench | MM-Bench-CN | SEED | LLaVA-Bench-Wild | MM-Vet |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| AlignGPT-7B | 79.1 | 62.9 | 54.2 | 68.5 | 58.4 | 86.0 | 1527.4 | 67.3 | 59.9 | 66.5 | 68.4 | 30.8 |
| AlignGPT-13B | 80.0 | 63.6 | 56.4 | 70.3 | 60.2 | 86.2 | 1572.0 | 69.5 | 63.7 | 67.8 | 75.2 | 35.6 |

## Citation
If you find AlignGPT useful for your research and applications, please cite using this BibTeX:
```
@misc{zhao2024aligngpt,
  title={AlignGPT: Multi-modal Large Language Models with Adaptive Alignment Capability},
  author={Fei Zhao and Taotian Pang and Chunhui Li and Zhen Wu and Junjie Guo and Shangyu Xing and Xinyu Dai},
  year={2024},
  eprint={2405.14129},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}
```

## License

[![Code License](https://img.shields.io/badge/Code%20License-Apache_2.0-green.svg)](https://github.com/tatsu-lab/stanford_alpaca/blob/main/LICENSE)[![Data License](https://img.shields.io/badge/Data%20License-CC%20By%20NC%204.0-red.svg)](https://github.com/tatsu-lab/stanford_alpaca/blob/main/DATA_LICENSE)

The data and checkpoints are intended and licensed for research use only. They are also restricted to uses that follow the license agreements of LLaMA, Vicuna, and GPT-4. The dataset is licensed under CC BY-NC 4.0 (non-commercial use only), and models trained on the dataset should not be used outside of research purposes.