TokenVerse: Versatile Multi-concept Personalization in Token Modulation Space Paper • 2501.12224 • Published 16 days ago • 46
MMVU: Measuring Expert-Level Multi-Discipline Video Understanding Paper • 2501.12380 • Published 16 days ago • 81
HuggingFaceM4/siglip-so400m-14-980-flash-attn2-navit Zero-Shot Image Classification • Updated Mar 7, 2024 • 8.97k • 43
ReFocus: Visual Editing as a Chain of Thought for Structured Image Understanding Paper • 2501.05452 • Published 28 days ago • 15
The GAN is dead; long live the GAN! A Modern GAN Baseline Paper • 2501.05441 • Published 28 days ago • 87
VideoAnydoor: High-fidelity Video Object Insertion with Precise Motion Control Paper • 2501.01427 • Published Jan 2 • 49