Timestep Embedding Tells: It's Time to Cache for Video Diffusion Model • Paper • 2411.19108 • Published 20 days ago • 17
AV-Odyssey Bench: Can Your Multimodal LLMs Really Understand Audio-Visual Information? • Paper • 2412.02611 • Published 14 days ago • 22