---
license: cc-by-nc-sa-4.0
---
  This repository contains the NExT-GPT weights covering text, image, video, and audio (`tiva`). The model is built upon the following components:
  - [Vicuna-7B](https://huggingface.co/lmsys/vicuna-7b-delta-v0), version `v0`.
  - [ImageBind](https://dl.fbaipublicfiles.com/imagebind/imagebind_huge.pth).
  - [Stable Diffusion](https://huggingface.co/runwayml/stable-diffusion-v1-5), version `v1-5`.
  - [AudioLDM](https://github.com/haoheliu/AudioLDM), version `l-full`.
  - [ZeroScope](https://huggingface.co/cerspense/zeroscope_v2_576w), version `v2_576w`.
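
For bookkeeping, the components listed above can be written down as a small manifest. The following is only a minimal sketch: the `Component` class, the `COMPONENTS` list, and the `describe` helper are hypothetical names introduced here for illustration and are not part of the NExT-GPT code base. It records the sources and versions stated in this card and does not download or load any weights; the actual loading procedure is described in the code repository.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Component:
    """One upstream model that NExT-GPT (tiva) builds upon (hypothetical helper type)."""
    name: str
    source: str             # URL exactly as given in this model card
    version: Optional[str]  # None when the card does not state a version


# Entries copied from the list in this card; nothing here is fetched or verified.
COMPONENTS = [
    Component("Vicuna-7B", "https://huggingface.co/lmsys/vicuna-7b-delta-v0", "v0"),
    Component("ImageBind", "https://dl.fbaipublicfiles.com/imagebind/imagebind_huge.pth", None),
    Component("Stable Diffusion", "https://huggingface.co/runwayml/stable-diffusion-v1-5", "v1-5"),
    Component("AudioLDM", "https://github.com/haoheliu/AudioLDM", "l-full"),
    Component("ZeroScope", "https://huggingface.co/cerspense/zeroscope_v2_576w", "v2_576w"),
]


def describe(component: Component) -> str:
    """Return a one-line, human-readable summary of a component."""
    version = component.version if component.version is not None else "unspecified"
    return f"{component.name} (version {version}): {component.source}"


if __name__ == "__main__":
    for component in COMPONENTS:
        print(describe(component))
```

A manifest like this can help when checking that a local setup uses the same upstream checkpoints as the ones named in this card.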
    
  For more details about the usage of the model, please refer to our [code repository](https://github.com/NExT-GPT/NExT-GPT).