Submitted by fangwu97 97 DeepSearch: Overcome the Bottleneck of Reinforcement Learning with Verifiable Rewards via Monte Carlo Tree Search Stanford NLP 1
Submitted by taesiri 52 VLA-RFT: Vision-Language-Action Reinforcement Fine-tuning with Verified Rewards in World Simulators · 11 authors 23 3
Submitted by ziniuli 30 Knapsack RL: Unlocking Exploration of LLMs via Optimizing Budget Allocation ByteDance Seed 1
Submitted by waleko 26 PIPer: On-Device Environment Setup via Online Reinforcement Learning JetBrains Research 5 1
Submitted by pbicho 21 SINQ: Sinkhorn-Normalized Quantization for Calibration-Free Low-Precision LLM Weights HUAWEI Computing Systems Lab 38 2
Submitted by taesiri 20 Code2Video: A Code-centric Paradigm for Educational Video Generation Show Lab 39 3
Submitted by XinXuNLPer 13 BiasFreeBench: a Benchmark for Mitigating Bias in Large Language Model Responses · 6 authors 3 1
Submitted by yuntian-deng 13 Why Can't Transformers Learn Multiplication? Reverse-Engineering Reveals Long-Range Dependency Pitfalls · 8 authors 2 2
Submitted by wenhu 12 EditReward: A Human-Aligned Reward Model for Instruction-Guided Image Editing TIGER-Lab 2
Submitted by Benyucong 10 QUASAR: Quantum Assembly Code Generation Using Tool-Augmented LLMs via Agentic RL · 8 authors 0 1
Submitted by tianyue818 10 Flash-Searcher: Fast and Effective Web Agents via DAG-Based Parallel Execution OPPO-Personal-AI-Lab 3 1
Submitted by gaotang 7 Beyond Log Likelihood: Probability-Based Objectives for Supervised Fine-Tuning across the Model Capability Continuum · 5 authors 4 1
Submitted by xx18 6 On Predictability of Reinforcement Learning Dynamics for Large Language Models · 9 authors 1
Submitted by ejhwang 5 Infusing Theory of Mind into Socially Intelligent LLM Agents University of British Columbia 1 1
Submitted by taesiri 4 GUI-KV: Efficient GUI Agents via KV Cache with Spatio-Temporal Awareness · 5 authors 1
Submitted by huu-ontocord 4 MixtureVitae: Open Web-Scale Pretraining Dataset With High Quality Instruction and Reasoning Data Built from Permissive-First Text Sources Ontocord.AI 2
Submitted by soujanyaporia 4 Training Vision-Language Process Reward Models for Test-Time Scaling in Multimodal Reasoning: Key Insights and Lessons Learned Deep Cognition and Language Research (DeCLaRe) Lab 1 1
Submitted by RubinSun 2 CurES: From Gradient Analysis to Efficient Curriculum Learning for Reasoning LLMs · 10 authors 2 1
Submitted by Minjong 2 In-Place Feedback: A New Paradigm for Guiding LLMs in Multi-Turn Reasoning · 7 authors
Submitted by hao-li 2 An Empirical Study of Testing Practices in Open Source AI Agent Frameworks and Agentic Applications · 6 authors 1
Submitted by mboss 1 ReSWD: ReSTIR'd, not shaken. Combining Reservoir Sampling and Sliced Wasserstein Distance for Variance Reduction Stability AI 1 1
Submitted by BestWishYsh 1 BindWeave: Subject-Consistent Video Generation via Cross-Modal Integration · 9 authors 1
Submitted by zptu 1 BatonVoice: An Operationalist Framework for Enhancing Controllable Speech Synthesis with Linguistic Intelligence from LLMs Tencent 1
Submitted by tianchez 1 VLM-FO1: Bridging the Gap Between High-Level Reasoning and Fine-Grained Perception in VLMs Om AI Lab 1
Submitted by MartialDeimos 1 Hyperdimensional Probe: Decoding LLM Representations via Vector Symbolic Architectures · 5 authors 1 1
Submitted by yuemithucsd - TGPO: Temporal Grounded Policy Optimization for Signal Temporal Logic Tasks Massachusetts Institute of Technology 1
Submitted by nielsr - Aligning Visual Foundation Encoders to Tokenizers for Diffusion Models · 9 authors 1