Extract and search text from images
Audio-Conditioned LipSync with Latent Diffusion Models
Create a video by syncing spoken audio to an image