Generate videos from text prompts, optionally guided by input images
Engage in multimedia chat with LLMs and other ML models