Envisioning Beyond the Pixels: Benchmarking Reasoning-Informed Visual Editing Paper • 2504.02826 • Published 4 days ago • 63
Advances and Challenges in Foundation Agents: From Brain-Inspired Intelligence to Evolutionary, Collaborative, and Safe Systems Paper • 2504.01990 • Published 7 days ago • 175
MergeVQ: A Unified Framework for Visual Generation and Representation with Disentangled Token Merging and Quantization Paper • 2504.00999 • Published 6 days ago • 73
Improved Visual-Spatial Reasoning via R1-Zero-Like Training Paper • 2504.00883 • Published 6 days ago • 56
Open-Qwen2VL: Compute-Efficient Pre-Training of Fully-Open Multimodal LLMs on Academic Resources Paper • 2504.00595 • Published 6 days ago • 33
Any2Caption:Interpreting Any Condition to Caption for Controllable Video Generation Paper • 2503.24379 • Published 7 days ago • 68
TextCrafter: Accurately Rendering Multiple Texts in Complex Visual Scenes Paper • 2503.23461 • Published 8 days ago • 87
Open-Reasoner-Zero: An Open Source Approach to Scaling Up Reinforcement Learning on the Base Model Paper • 2503.24290 • Published 7 days ago • 58
RIG: Synergizing Reasoning and Imagination in End-to-End Generalist Policy Paper • 2503.24388 • Published 7 days ago • 29
What, How, Where, and How Well? A Survey on Test-Time Scaling in Large Language Models Paper • 2503.24235 • Published 7 days ago • 49
MoCha: Towards Movie-Grade Talking Character Synthesis Paper • 2503.23307 • Published 8 days ago • 93
Exploring Data Scaling Trends and Effects in Reinforcement Learning from Human Feedback Paper • 2503.22230 • Published 10 days ago • 43
AdaptiVocab: Enhancing LLM Efficiency in Focused Domains through Lightweight Vocabulary Adaptation Paper • 2503.19693 • Published 13 days ago • 74
Think Before Recommend: Unleashing the Latent Reasoning Power for Sequential Recommendation Paper • 2503.22675 • Published 10 days ago • 33
Open Deep Search: Democratizing Search with Open-source Reasoning Agents Paper • 2503.20201 • Published 12 days ago • 42
Dita: Scaling Diffusion Transformer for Generalist Vision-Language-Action Policy Paper • 2503.19757 • Published 13 days ago • 48