Sesame CSM
Conversational speech generation
Create customized face portraits using images and prompts
Gaze detection using Moondream
Text-to-Audio (Sound Effects) Generator
Audio-Conditioned LipSync with Latent Diffusion Models
Vision Transformer Attention Visualization