---
license: cc-by-nc-sa-4.0
---

This model contains the weights of NExT-GPT covering text-image-video-audio (tiva), which is built upon:

- 1) [Vicuna-7B](https://huggingface.co/lmsys/vicuna-7b-delta-v0) with version 0.
- 2) [ImageBind](https://dl.fbaipublicfiles.com/imagebind/imagebind_huge.pth).
- 3) [Stable Diffusion](https://huggingface.co/runwayml/stable-diffusion-v1-5) with version `v1-5`.
- 4) [AudioLDM](https://github.com/haoheliu/AudioLDM) with version `l-full`.
- 5) [ZeroScope](https://huggingface.co/cerspense/zeroscope_v2_576w) with version `v2_576w`.

For more details about the usage of the model, please refer to our [code repository](https://github.com/NExT-GPT/NExT-GPT).
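
As a rough illustration of how the base components listed above might be gathered, here is a minimal sketch. It is not the procedure from the code repository: the helper names `hub_repo_id` and `fetch_hub_components` and the `ckpt/pretrained` target directory are hypothetical, and it only covers the components hosted on the Hugging Face Hub, using `snapshot_download` from the `huggingface_hub` package. ImageBind (a direct file URL) and AudioLDM (a GitHub repository) would have to be obtained separately, and any of the linked repositories may have moved since this card was written.

```python
from urllib.parse import urlparse

# Component sources, copied from the list in this model card.
COMPONENTS = {
    "Vicuna-7B (v0 delta)": "https://huggingface.co/lmsys/vicuna-7b-delta-v0",
    "ImageBind": "https://dl.fbaipublicfiles.com/imagebind/imagebind_huge.pth",
    "Stable Diffusion v1-5": "https://huggingface.co/runwayml/stable-diffusion-v1-5",
    "AudioLDM l-full": "https://github.com/haoheliu/AudioLDM",
    "ZeroScope v2_576w": "https://huggingface.co/cerspense/zeroscope_v2_576w",
}


def hub_repo_id(url):
    """Return 'owner/name' for a huggingface.co URL, or None for other hosts."""
    parsed = urlparse(url)
    if parsed.netloc != "huggingface.co":
        return None
    parts = parsed.path.strip("/").split("/")
    return "/".join(parts[:2]) if len(parts) >= 2 else None


def fetch_hub_components(target_dir="ckpt/pretrained"):
    """Download the Hub-hosted components into target_dir (hypothetical layout)."""
    # Imported lazily so the helpers above work without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    paths = {}
    for name, url in COMPONENTS.items():
        repo_id = hub_repo_id(url)
        if repo_id is None:
            continue  # not on the Hub: fetch ImageBind and AudioLDM by other means
        local_dir = f"{target_dir}/{repo_id.split('/')[-1]}"
        paths[name] = snapshot_download(repo_id=repo_id, local_dir=local_dir)
    return paths
```

Calling `fetch_hub_components()` needs network access and several gigabytes of disk space. The directory layout that NExT-GPT actually expects is described in the code repository, so treat the paths here as placeholders.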