GSM-Symbolic: Understanding the Limitations of Mathematical Reasoning in Large Language Models Paper • 2410.05229 • Published Oct 7 • 21
Large Language Models Orchestrating Structured Reasoning Achieve Kaggle Grandmaster Level Paper • 2411.03562 • Published Nov 5 • 63
HumanRankEval: Automatic Evaluation of LMs as Conversational Assistants Paper • 2405.09186 • Published May 15 • 22
ROS-LLM: A ROS framework for embodied AI with task feedback and structured reasoning Paper • 2406.19741 • Published Jun 28 • 59