PHYSICS: Benchmarking Foundation Models on University-Level Physics Problem Solving • Paper • 2503.21821 • Published 11 days ago • 16
Exploring Data Scaling Trends and Effects in Reinforcement Learning from Human Feedback • Paper • 2503.22230 • Published 8 days ago • 43
AdaptiVocab: Enhancing LLM Efficiency in Focused Domains through Lightweight Vocabulary Adaptation • Paper • 2503.19693 • Published 11 days ago • 69
TokenHSI: Unified Synthesis of Physical Human-Scene Interactions through Task Tokenization • Paper • 2503.19901 • Published 11 days ago • 30
Recitation over Reasoning: How Cutting-Edge Language Models Can Fail on Elementary School-Level Reasoning Problems? • Paper • 2504.00509 • Published 4 days ago • 16
CodeARC: Benchmarking Reasoning Capabilities of LLM Agents for Inductive Program Synthesis • Paper • 2503.23145 • Published 7 days ago • 30
RecTable: Fast Modeling Tabular Data with Rectified Flow • Paper • 2503.20731 • Published 10 days ago • 2
Open Deep Search: Democratizing Search with Open-source Reasoning Agents • Paper • 2503.20201 • Published 11 days ago • 41
LEGO-Puzzles: How Good Are MLLMs at Multi-Step Spatial Reasoning? • Paper • 2503.19990 • Published 11 days ago • 31
LLPut: Investigating Large Language Models for Bug Report-Based Input Generation • Paper • 2503.20578 • Published 10 days ago • 4
FinAudio: A Benchmark for Audio Large Language Models in Financial Applications • Paper • 2503.20990 • Published 10 days ago • 18
Embodied-Reasoner: Synergizing Visual Search, Reasoning, and Action for Embodied Interactive Tasks • Paper • 2503.21696 • Published 9 days ago • 21
ResearchBench: Benchmarking LLMs in Scientific Discovery via Inspiration-Based Task Decomposition • Paper • 2503.21248 • Published 9 days ago • 19
ReaRAG: Knowledge-guided Reasoning Enhances Factuality of Large Reasoning Models with Iterative Retrieval Augmented Generation • Paper • 2503.21729 • Published 9 days ago • 26
LeX-Art: Rethinking Text Generation via Scalable High-Quality Data Synthesis • Paper • 2503.21749 • Published 9 days ago • 25