CheXWorld: Exploring Image World Modeling for Radiograph Representation Learning • Paper • 2504.13820 • Published 6 days ago • 15
WORLDMEM: Long-term Consistent World Simulation with Memory • Paper • 2504.12369 • Published 8 days ago • 30
Inference-Time Scaling for Flow Models via Stochastic Generation and Rollover Budget Forcing • Paper • 2503.19385 • Published about 1 month ago • 33
Video-T1: Test-Time Scaling for Video Generation • Paper • 2503.18942 • Published about 1 month ago • 88
PhysTwin: Physics-Informed Reconstruction and Simulation of Deformable Objects from Videos • Paper • 2503.17973 • Published Mar 23 • 7
Block Diffusion: Interpolating Between Autoregressive and Diffusion Language Models • Paper • 2503.09573 • Published Mar 12 • 71
DAPO: An Open-Source LLM Reinforcement Learning System at Scale • Paper • 2503.14476 • Published Mar 18 • 122
Cosmos-Reason1: From Physical Common Sense To Embodied Reasoning • Paper • 2503.15558 • Published Mar 18 • 46
iVideoGPT: Interactive VideoGPTs are Scalable World Models • Paper • 2405.15223 • Published May 24, 2024 • 17
Open-Sora 2.0: Training a Commercial-Level Video Generation Model in $200k • Paper • 2503.09642 • Published Mar 12 • 18
World Modeling Makes a Better Planner: Dual Preference Optimization for Embodied Task Planning • Paper • 2503.10480 • Published Mar 13 • 52
Multimodal Representation Alignment for Image Generation: Text-Image Interleaved Control Is Easier Than You Think • Paper • 2502.20172 • Published Feb 27 • 28
Qwen2.5-VL • Collection • Vision-language model series based on Qwen2.5 • 11 items • Updated 24 days ago • 448
RIFLEx: A Free Lunch for Length Extrapolation in Video Diffusion Transformers • Paper • 2502.15894 • Published Feb 21 • 20
Reflective Planning: Vision-Language Models for Multi-Stage Long-Horizon Robotic Manipulation • Paper • 2502.16707 • Published Feb 23 • 13
SURGE: On the Potential of Large Language Models as General-Purpose Surrogate Code Executors • Paper • 2502.11167 • Published Feb 16 • 10