posted an update 17 days ago
I'm working on talking head generation that takes audio and video as input. Can someone suggest a good existing architecture that can generate videos with low latency? Or is it possible to do this in real time?

I think most existing OSS talking head architectures only take audio and an image as input. You can check out SadTalker, which takes audio and an image as inputs. As for streaming, you'll have to do that via an API over WebSocket; check out D-ID's stream API.


Tried SadTalker, but it takes too much time. D-ID is proprietary, and I'm looking for something open source. I also tried Wav2Lip and enhanced its output with GFPGAN. The output is good, but I want something fast.