Generate an edited video from a prompt
Interact with a multimodal chatbot using text and audio
Generate detailed descriptions from images and videos