Submitted by AaronHuangWei · QeRL: Beyond Efficiency -- Quantization-enhanced Reinforcement Learning for LLMs · NVIDIA
Submitted by DogNeverSleep · AVoCaDO: An Audiovisual Video Captioner Driven by Temporal Orchestration · 12 authors
Submitted by lyabc · AdaViewPlanner: Adapting Video Diffusion Models for Viewpoint Planning in 4D Scenes · Kuaishou Visual Generation and Interaction Center
Submitted by CheeryLJH · OmniVideoBench: Towards Audio-Visual Understanding Evaluation for Omni MLLMs · 42 authors
Submitted by YanAdjeNole · FinAuditing: A Financial Taxonomy-Structured Multi-Document Benchmark for Evaluating LLMs · The Fin AI
Submitted by fenghora · DiT360: High-Fidelity Panoramic Image Generation via Hybrid Training · 5 authors
Submitted by ganlinyang · Vlaser: Vision-Language-Action Model with Synergistic Embodied Reasoning · 18 authors
Submitted by wangchy · SPG: Sandwiched Policy Gradient for Masked Diffusion Language Models · Meta Research
Submitted by LucasFang · CodePlot-CoT: Mathematical Visual Reasoning by Thinking with Code-Driven Images · The University of Hong Kong
Submitted by Agorium · On Epistemic Uncertainty of Visual Tokens for Object Hallucinations in Large Vision-Language Models · Seoul National University
Submitted by Albus-Chen · PEAR: Phase Entropy Aware Reward for Efficient Reasoning · iNLP Lab @ SUTD
Submitted by xxzcc · ReLook: Vision-Grounded RL with a Multimodal LLM Critic for Agentic Web Coding · Tencent
Submitted by JingHaoZ · RLFR: Extending Reinforcement Learning for LLMs with Flow Environment · 7 authors
Submitted by xwjzds · The Personalization Trap: How User Memory Alters Emotional Reasoning in LLMs · Amazon
Submitted by jeepliu · DocReward: A Document Reward Model for Structuring and Stylizing · 19 authors
Submitted by SoroushMehraban · FastHMR: Accelerating Human Mesh Recovery via Token and Layer Merging with Diffusion Decoding · Vector Institute
Submitted by taesiri · IVEBench: Modern Benchmark Suite for Instruction-Guided Video Editing Assessment · 10 authors
Submitted by taesiri · LikePhys: Evaluating Intuitive Physics Understanding in Video Diffusion Models via Likelihood Preference · 8 authors
Submitted by yuzc19 · RePro: Training Language Models to Faithfully Recycle the Web for Pretraining · Chenyan Xiong Research Group at CMU
Submitted by FeYuan · LLaMAX2: Your Translation-Enhanced Model also Performs Well in Reasoning · 6 authors
Submitted by liuganghuggingface · Graph Diffusion Transformers are In-Context Molecular Designers · 7 authors
Submitted by zhihuang · Pathology-CoT: Learning Visual Chain-of-Thought Agent from Expert Whole Slide Image Diagnosis Behavior · Zhi Huang Lab