Discovering Influential Neuron Path in Vision Transformers Paper • 2503.09046 • Published 4 days ago • 6
R1-Onevision: Advancing Generalized Multimodal Reasoning through Cross-Modal Formalization Paper • 2503.10615 • Published 3 days ago • 13
Light-R1: Curriculum SFT, DPO and RL for Long COT from Scratch and Beyond Paper • 2503.10460 • Published 3 days ago • 17
VisualWebInstruct: Scaling up Multimodal Instruction Data through Web Search Paper • 2503.10582 • Published 3 days ago • 17
DiT-Air: Revisiting the Efficiency of Diffusion Model Architecture Design in Text to Image Generation Paper • 2503.10618 • Published 3 days ago • 15
Shifting Long-Context LLMs Research from Input to Output Paper • 2503.04723 • Published 10 days ago • 18
VisualPRM: An Effective Process Reward Model for Multimodal Reasoning Paper • 2503.10291 • Published 3 days ago • 28
GoT: Unleashing Reasoning Capability of Multimodal Large Language Model for Visual Generation and Editing Paper • 2503.10639 • Published 3 days ago • 37
CoRe^2: Collect, Reflect and Refine to Generate Better and Faster Paper • 2503.09662 • Published 4 days ago • 28
World Modeling Makes a Better Planner: Dual Preference Optimization for Embodied Task Planning Paper • 2503.10480 • Published 3 days ago • 39
When Large Vision-Language Model Meets Large Remote Sensing Imagery: Coarse-to-Fine Text-Guided Token Pruning Paper • 2503.07588 • Published 6 days ago • 7
Cost-Optimal Grouped-Query Attention for Long-Context LLMs Paper • 2503.09579 • Published 4 days ago • 5
Quantizing Large Language Models for Code Generation: A Differentiated Replication Paper • 2503.07103 • Published 6 days ago • 5
Block Diffusion: Interpolating Between Autoregressive and Diffusion Language Models Paper • 2503.09573 • Published 4 days ago • 49
OTTER: A Vision-Language-Action Model with Text-Aware Visual Feature Extraction Paper • 2503.03734 • Published 11 days ago • 1