metadata
license: apache-2.0
language:
  - en

AlignGPT: Multi-modal Large Language Models with Adaptive Alignment Capability

[Project Page] [Paper] [Demo] [Model]

Authors: Fei Zhao*, Taotian Pang*, Chunhui Li, Zhen Wu, Junjie Guo, Shangyu Xing, Xinyu Dai

News and Updates

  • [5/24] 🔥 We have released AlignGPT: Multi-modal Large Language Models with Adaptive Alignment Capability. Check out the paper and demo.

Model Zoo

| Model | LLM | Vision Backbone | Pre-training | Instruct-tuning |
|---|---|---|---|---|
| AlignGPT-7B | Vicuna 7B | CLIP ViT-L/14 | aligngpt-7b-pretrain | aligngpt-7b |
| AlignGPT-13B | Vicuna 13B | CLIP ViT-L/14 | aligngpt-13b-pretrain | aligngpt-13b |
| AlignGPT-LLaMA2 | LLaMA-2-7B-Chat | CLIP ViT-L/14 | To be released | To be released |
| AlignGPT-LLaMA3 | LLaMA-3-8B-Base | CLIP ViT-L/14 | To be released | To be released |

Performance

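| Model | VQAv2 | GQA | VizWiz | SQA | T-VQA | POPE | MME | MM-Bench | MM-Bench-CN | SEED | LLaVA-Bench-Wild | MM-Vet |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| AlignGPT-7B | 79.1 | 62.9 | 54.2 | 68.5 | 58.4 | 86.0 | 1527.4 | 67.3 | 59.9 | 66.5 | 68.4 | 30.8 |
| AlignGPT-13B | 80.0 | 63.6 | 56.4 | 70.3 | 60.2 | 86.2 | 1572.0 | 69.5 | 63.7 | 67.8 | 75.2 | 35.6 |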
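To read the Performance table at a glance, the short sketch below encodes the reported scores and computes how much AlignGPT-13B gains over AlignGPT-7B on each benchmark. The dictionary layout and the `score_gains` helper are hypothetical illustrations, not part of the AlignGPT codebase; the numbers are copied directly from the table above.

```python
# Hypothetical helper: the scores below are copied from the Performance table.
BENCHMARKS = [
    "VQAv2", "GQA", "VizWiz", "SQA", "T-VQA", "POPE", "MME",
    "MM-Bench", "MM-Bench-CN", "SEED", "LLaVA-Bench-Wild", "MM-Vet",
]
SCORES = {
    "AlignGPT-7B": [79.1, 62.9, 54.2, 68.5, 58.4, 86.0, 1527.4, 67.3, 59.9, 66.5, 68.4, 30.8],
    "AlignGPT-13B": [80.0, 63.6, 56.4, 70.3, 60.2, 86.2, 1572.0, 69.5, 63.7, 67.8, 75.2, 35.6],
}


def score_gains(base: str, larger: str) -> dict[str, float]:
    """Return (larger - base) for each benchmark, rounded to one decimal place."""
    return {
        name: round(hi - lo, 1)
        for name, lo, hi in zip(BENCHMARKS, SCORES[base], SCORES[larger])
    }


if __name__ == "__main__":
    gains = score_gains("AlignGPT-7B", "AlignGPT-13B")
    for name, delta in gains.items():
        # Signed delta per benchmark, e.g. "+0.9" for VQAv2.
        print(f"{name:>18}: {delta:+.1f}")
```

Note that MME is reported on a much larger scale than the other benchmarks, so its raw delta is not directly comparable to the rest.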

Citation

If you find AlignGPT useful for your research and applications, please cite using this BibTeX:

@misc{zhao2024aligngpt,
      title={AlignGPT: Multi-modal Large Language Models with Adaptive Alignment Capability}, 
      author={Fei Zhao and Taotian Pang and Chunhui Li and Zhen Wu and Junjie Guo and Shangyu Xing and Xinyu Dai},
      year={2024},
      eprint={2405.14129},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}

License


The data and checkpoints are intended and licensed for research use only. They are also restricted to uses that comply with the license agreements of LLaMA, Vicuna, and GPT-4. The dataset is licensed under CC BY-NC 4.0 (non-commercial use only), and models trained on the dataset should not be used outside of research purposes.