TULIP: Towards Unified Language-Image Pretraining • arXiv:2503.15485 • Published 26 days ago
CLAIR-A: Leveraging Large Language Models to Judge Audio Captions • arXiv:2409.12962 • Published Sep 19, 2024
ANIM-400K: A Large-Scale Dataset for Automated End-To-End Dubbing of Video • arXiv:2401.05314 • Published Jan 10, 2024