FLATTEN: optical FLow-guided ATTENtion for consistent text-to-video editing • Paper • 2310.05922 • Published Oct 9, 2023 • 4
Hyper-VolTran: Fast and Generalizable One-Shot Image to 3D Object Structure via HyperNetworks • Paper • 2312.16218 • Published Dec 24, 2023 • 8
SAVGBench: Benchmarking Spatially Aligned Audio-Video Generation • Paper • 2412.13462 • Published Dec 18, 2024
Crowdsource, Crawl, or Generate? Creating SEA-VL, a Multicultural Vision-Language Dataset for Southeast Asia • Paper • 2503.07920 • Published 26 days ago • 95
stabilityai/stable-video-diffusion-img2vid-xt-1-1 • Image-to-Video • Updated Jul 10, 2024 • 92.6k • 888