ASMv2 Model Card

This is a pretrained checkpoint that you can use to instruction-tune your multimodal models.

Check out the instructions here.

Model details

Model type: ASMv2 is an open-source chatbot trained by fine-tuning LLaMA/Vicuna on multimodal instruction-following data. It integrates the Relation Conversation (ReC) ability while maintaining powerful general capabilities. This model is also endowed with grounding and referring capabilities, exhibiting state-of-the-art performance on region-level tasks, and can be naturally adapted to the Scene Graph Generation task in an open-ended manner.

Model date: ASMv2-Pretrain was trained in January 2024.

Paper or resources for more information: https://github.com/OpenGVLab/all-seeing

License

ASMv2-Pretrain is open-sourced under the Apache License 2.0.

Where to send questions or comments about the model: https://github.com/OpenGVLab/all-seeing/issues

Intended use

Primary intended uses: The primary use of ASMv2 is research on large multimodal models and chatbots.

Primary intended users: The primary intended users of the model are researchers and hobbyists in computer vision, natural language processing, machine learning, and artificial intelligence.

Training dataset

The pretraining phase uses 5M filtered samples from CC12M, 10M filtered samples from AS-1B, and 15M filtered samples from GRiT.

See here for more details.
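
As a quick illustration of the data mix described above, the following minimal sketch adds up the three stated sample counts and computes each source's share of the total. The counts come from this card; the names PRETRAIN_SAMPLES and mixture_fractions are hypothetical, and the card does not say whether sources are actually sampled in proportion to their size during training, so treat the fractions as simple arithmetic rather than a description of the training sampler.

```python
# Filtered pretraining sample counts, as stated in this model card.
PRETRAIN_SAMPLES = {
    "CC12M": 5_000_000,
    "AS-1B": 10_000_000,
    "GRiT": 15_000_000,
}


def mixture_fractions(counts):
    """Return each source's share of the total sample count.

    Hypothetical helper for illustration only; it is not part of the
    all-seeing codebase.
    """
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()}


total_samples = sum(PRETRAIN_SAMPLES.values())
fractions = mixture_fractions(PRETRAIN_SAMPLES)

if __name__ == "__main__":
    print(f"total filtered samples: {total_samples:,}")
    for name, frac in fractions.items():
        print(f"{name}: {frac:.1%}")
```

Under these numbers the pretraining set totals 30M filtered samples, with GRiT contributing half, AS-1B a third, and CC12M a sixth.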
