Seeing and Hearing: Open-domain Visual-Audio Generation with Diffusion Latent Aligners Paper • 2402.17723 • Published Feb 27 • 16
OmniACT: A Dataset and Benchmark for Enabling Multimodal Generalist Autonomous Agents for Desktop and Web Paper • 2402.17553 • Published Feb 27 • 21
Sora: A Review on Background, Technology, Limitations, and Opportunities of Large Vision Models Paper • 2402.17177 • Published Feb 27 • 88
EMO: Emote Portrait Alive - Generating Expressive Portrait Videos with Audio2Video Diffusion Model under Weak Conditions Paper • 2402.17485 • Published Feb 27 • 188
Gen4Gen: Generative Data Pipeline for Generative Multi-Concept Composition Paper • 2402.15504 • Published Feb 23 • 21
YOLOv9: Learning What You Want to Learn Using Programmable Gradient Information Paper • 2402.13616 • Published Feb 21 • 45