Large Language Model Guided Self-Debugging Code Generation • Paper • 2502.02928 • Published 3 days ago • 8
SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model • Paper • 2502.02737 • Published 3 days ago • 131
Critique Fine-Tuning: Learning to Critique is More Effective than Learning to Imitate • Paper • 2501.17703 • Published 10 days ago • 51
Towards General-Purpose Model-Free Reinforcement Learning • Paper • 2501.16142 • Published 12 days ago • 24
SRMT: Shared Memory for Multi-agent Lifelong Pathfinding • Paper • 2501.13200 • Published 16 days ago • 61
Control LLM: Controlled Evolution for Intelligence Retention in LLM • Paper • 2501.10979 • Published 20 days ago • 6
O1-Pruner: Length-Harmonizing Fine-Tuning for O1-Like Reasoning Pruning • Paper • 2501.12570 • Published 17 days ago • 23
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning • Paper • 2501.12948 • Published 17 days ago • 303
Agent-R: Training Language Model Agents to Reflect via Iterative Self-Training • Paper • 2501.11425 • Published 19 days ago • 90
MangaNinja: Line Art Colorization with Precise Reference Following • Paper • 2501.08332 • Published 24 days ago • 56
ChemAgent: Self-updating Library in Large Language Models Improves Chemical Reasoning • Paper • 2501.06590 • Published 28 days ago • 9
O1 Replication Journey -- Part 3: Inference-time Scaling for Medical Reasoning • Paper • 2501.06458 • Published 28 days ago • 29
The Lessons of Developing Process Reward Models in Mathematical Reasoning • Paper • 2501.07301 • Published 26 days ago • 89
Agentless: Demystifying LLM-based Software Engineering Agents • Paper • 2407.01489 • Published Jul 1, 2024 • 59