3DIS-FLUX: simple and efficient multi-instance generation with DiT rendering Paper • 2501.05131 • Published 22 days ago
MMDocIR: Benchmarking Multi-Modal Retrieval for Long Documents Paper • 2501.08828 • Published 16 days ago
FAST: Efficient Action Tokenization for Vision-Language-Action Models Paper • 2501.09747 • Published 14 days ago
ComplexFuncBench: Exploring Multi-Step and Constrained Function Calling under Long-Context Scenario Paper • 2501.10132 • Published 14 days ago
SEAL: Entangled White-box Watermarks on Low-Rank Adaptation Paper • 2501.09284 • Published 15 days ago
VideoWorld: Exploring Knowledge Learning from Unlabeled Videos Paper • 2501.09781 • Published 14 days ago
DiffuEraser: A Diffusion Model for Video Inpainting Paper • 2501.10018 • Published 14 days ago
Temporal Preference Optimization for Long-Form Video Understanding Paper • 2501.13919 • Published 7 days ago
RealCritic: Towards Effectiveness-Driven Evaluation of Language Model Critiques Paper • 2501.14492 • Published 7 days ago