Co-Speech Gesture Video Generation
Generate detailed prompts for Stable Diffusion
Generate images from text descriptions