Audio-Conditioned LipSync with Latent Diffusion Models
Generate a 2-speaker podcast from text input or documents!