Scaling Text-Rich Image Understanding via Code-Guided Synthetic Multimodal Data Generation Paper • 2502.14846 • Published 2 days ago • 12
Phantom: Subject-consistent video generation via cross-modal alignment Paper • 2502.11079 • Published 7 days ago • 48
TripoSG: High-Fidelity 3D Shape Synthesis using Large-Scale Rectified Flow Models Paper • 2502.06608 • Published 12 days ago • 32
Light-A-Video: Training-free Video Relighting via Progressive Light Fusion Paper • 2502.08590 • Published 10 days ago • 38
Enhance-A-Video: Better Generated Video for Free Paper • 2502.07508 • Published 11 days ago • 18
Chat with DeepSeek-VL2-small: Generate responses using images and text input • Running on Zero • 393
π0 and π0-FAST: Vision-Language-Action Models for General Robot Control Article • 19 days ago • 106
OmniHuman-1: Rethinking the Scaling-Up of One-Stage Conditioned Human Animation Models Paper • 2502.01061 • Published 20 days ago • 180
MatAnyone: Stable Video Matting with Consistent Memory Propagation Paper • 2501.14677 • Published 29 days ago • 30