LAB-Bench: Measuring Capabilities of Language Models for Biology Research • Paper • 2407.10362 • Published Jul 14, 2024
SEE-2-SOUND: Zero-Shot Spatial Environment-to-Spatial Sound • Paper • 2406.06612 • Published Jun 6, 2024
🎠 Avatars Collection • The latest AI-powered technologies usher in a new era of realistic avatars! 🚀 • 69 items • Updated Oct 21
Video as the New Language for Real-World Decision Making • Paper • 2402.17139 • Published Feb 27, 2024
Learning Continuous 3D Words for Text-to-Image Generation • Paper • 2402.08654 • Published Feb 13, 2024
MusicMagus: Zero-Shot Text-to-Music Editing via Diffusion Models • Paper • 2402.06178 • Published Feb 9, 2024
Memory Consolidation Enables Long-Context Video Understanding • Paper • 2402.05861 • Published Feb 8, 2024
InteractiveVideo: User-Centric Controllable Video Generation with Synergistic Multimodal Instructions • Paper • 2402.03040 • Published Feb 5, 2024
Direct-a-Video: Customized Video Generation with User-Directed Camera Movement and Object Motion • Paper • 2402.03162 • Published Feb 5, 2024
WebVoyager: Building an End-to-End Web Agent with Large Multimodal Models • Paper • 2401.13919 • Published Jan 25, 2024
From Audio to Photoreal Embodiment: Synthesizing Humans in Conversations • Paper • 2401.01885 • Published Jan 3, 2024
Improving Diffusion-Based Image Synthesis with Context Prediction • Paper • 2401.02015 • Published Jan 4, 2024
VideoDrafter: Content-Consistent Multi-Scene Video Generation with LLM • Paper • 2401.01256 • Published Jan 2, 2024
A Recipe for Scaling up Text-to-Video Generation with Text-free Videos • Paper • 2312.15770 • Published Dec 25, 2023
I2V-Adapter: A General Image-to-Video Adapter for Video Diffusion Models • Paper • 2312.16693 • Published Dec 27, 2023
DreamTuner: Single Image is Enough for Subject-Driven Generation • Paper • 2312.13691 • Published Dec 21, 2023