Multimodal Music Generation with Explicit Bridges and Retrieval Augmentation • Paper • 2412.09428 • Published 9 days ago • 7 • 4
VideoEspresso: A Large-Scale Chain-of-Thought Dataset for Fine-Grained Video Reasoning via Core Frame Selection • Paper • 2411.14794 • Published 29 days ago • 11 • 3