BASE TTS: Lessons from building a billion-parameter Text-to-Speech model on 100K hours of data Paper • 2402.08093 • Published Feb 12 • 57
EMO: Emote Portrait Alive - Generating Expressive Portrait Videos with Audio2Video Diffusion Model under Weak Conditions Paper • 2402.17485 • Published Feb 27 • 190