Vidi: Large Multimodal Models for Video Understanding and Editing • Paper • 2504.15681 • Published 2 days ago • 12
Complex-Edit: CoT-Like Instruction Generation for Complexity-Controllable Image Editing Benchmark • Paper • 2504.13143 • Published 6 days ago • 8
VisualCloze: A Universal Image Generation Framework via Visual In-Context Learning • Paper • 2504.07960 • Published 13 days ago • 46
A Unified Agentic Framework for Evaluating Conditional Image Generation • Paper • 2504.07046 • Published 14 days ago • 30
VideoChat-R1: Enhancing Spatio-Temporal Perception via Reinforcement Fine-Tuning • Paper • 2504.06958 • Published 15 days ago • 10
Orpheus Multilingual Research Release • Collection • Beta release of multilingual models • 12 items • Updated 13 days ago • 76