R1-VL: Learning to Reason with Multimodal Large Language Models via Step-wise Group Relative Policy Optimization Paper • 2503.12937 • Published about 20 hours ago • 1 • 1
TxAgent: An AI Agent for Therapeutic Reasoning Across a Universe of Tools Paper • 2503.10970 • Published 4 days ago • 12 • 3
Open-Sora 2.0: Training a Commercial-Level Video Generation Model in $200k Paper • 2503.09642 • Published 6 days ago • 15 • 2
R1-Onevision: Advancing Generalized Multimodal Reasoning through Cross-Modal Formalization Paper • 2503.10615 • Published 4 days ago • 16 • 3
Light-R1: Curriculum SFT, DPO and RL for Long COT from Scratch and Beyond Paper • 2503.10460 • Published 5 days ago • 22 • 3
Wan2.1 14B 480p I2V LoRAs Collection A collection of Remade's Wan2.1 14B 480p I2V LoRAs • 24 items • Updated 3 days ago • 56
Block Diffusion: Interpolating Between Autoregressive and Diffusion Language Models Paper • 2503.09573 • Published 5 days ago • 54 • 3
Gemini Embedding: Generalizable Embeddings from Gemini Paper • 2503.07891 • Published 7 days ago • 30 • 3
AlphaDrive: Unleashing the Power of VLMs in Autonomous Driving via Reinforcement Learning and Reasoning Paper • 2503.07608 • Published 7 days ago • 19 • 1
Vision-R1: Incentivizing Reasoning Capability in Multimodal Large Language Models Paper • 2503.06749 • Published 8 days ago • 22 • 2