Training Software Engineering Agents and Verifiers with SWE-Gym Paper • 2412.21139 • Published Dec 30, 2024 • 21
Evaluating Language Models as Synthetic Data Generators Paper • 2412.03679 • Published Dec 4, 2024 • 47
Self-Discover: Large Language Models Self-Compose Reasoning Structures Paper • 2402.03620 • Published Feb 6, 2024 • 116
OS-Copilot: Towards Generalist Computer Agents with Self-Improvement Paper • 2402.07456 • Published Feb 12, 2024 • 43
Best Practices and Lessons Learned on Synthetic Data for Language Models Paper • 2404.07503 • Published Apr 11, 2024 • 30
Direct Nash Optimization: Teaching Language Models to Self-Improve with General Preferences Paper • 2404.03715 • Published Apr 4, 2024 • 61
ChatGLM-Math: Improving Math Problem-Solving in Large Language Models with a Self-Critique Pipeline Paper • 2404.02893 • Published Apr 3, 2024 • 21
Voyager: An Open-Ended Embodied Agent with Large Language Models Paper • 2305.16291 • Published May 25, 2023 • 9
Reflexion: Language Agents with Verbal Reinforcement Learning Paper • 2303.11366 • Published Mar 20, 2023 • 4
Replacing Judges with Juries: Evaluating LLM Generations with a Panel of Diverse Models Paper • 2404.18796 • Published Apr 29, 2024 • 69
Diffusion for World Modeling: Visual Details Matter in Atari Paper • 2405.12399 • Published May 20, 2024 • 29
Perplexed by Perplexity: Perplexity-Based Data Pruning With Small Reference Models Paper • 2405.20541 • Published May 30, 2024 • 22
ShareGPT4Video: Improving Video Understanding and Generation with Better Captions Paper • 2406.04325 • Published Jun 6, 2024 • 73
Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing Paper • 2406.08464 • Published Jun 12, 2024 • 66
Scaling Synthetic Data Creation with 1,000,000,000 Personas Paper • 2406.20094 • Published Jun 28, 2024 • 97
Diffusion Augmented Agents: A Framework for Efficient Exploration and Transfer Learning Paper • 2407.20798 • Published Jul 30, 2024 • 24
Mutual Reasoning Makes Smaller LLMs Stronger Problem-Solvers Paper • 2408.06195 • Published Aug 12, 2024 • 70
Data curation via joint example selection further accelerates multimodal learning Paper • 2406.17711 • Published Jun 25, 2024 • 3
Training Language Models to Self-Correct via Reinforcement Learning Paper • 2409.12917 • Published Sep 19, 2024 • 136
Thinking LLMs: General Instruction Following with Thought Generation Paper • 2410.10630 • Published Oct 14, 2024 • 18
Euclid: Supercharging Multimodal LLMs with Synthetic High-Fidelity Visual Descriptions Paper • 2412.08737 • Published Dec 11, 2024 • 53
OS-Genesis: Automating GUI Agent Trajectory Construction via Reverse Task Synthesis Paper • 2412.19723 • Published Dec 27, 2024 • 82
ProgCo: Program Helps Self-Correction of Large Language Models Paper • 2501.01264 • Published Jan 2 • 25
RobustFT: Robust Supervised Fine-tuning for Large Language Models under Noisy Response Paper • 2412.14922 • Published Dec 19, 2024 • 86
B-STaR: Monitoring and Balancing Exploration and Exploitation in Self-Taught Reasoners Paper • 2412.17256 • Published Dec 23, 2024 • 46
Diving into Self-Evolving Training for Multimodal Reasoning Paper • 2412.17451 • Published Dec 23, 2024 • 43
SPaR: Self-Play with Tree-Search Refinement to Improve Instruction-Following in Large Language Models Paper • 2412.11605 • Published Dec 16, 2024 • 18
AgentTrek: Agent Trajectory Synthesis via Guiding Replay with Web Tutorials Paper • 2412.09605 • Published Dec 12, 2024 • 28
SRA-MCTS: Self-driven Reasoning Augmentation with Monte Carlo Tree Search for Code Generation Paper • 2411.11053 • Published Nov 17, 2024 • 3
CodeDPO: Aligning Code Models with Self Generated and Verified Source Code Paper • 2410.05605 • Published Oct 8, 2024 • 1
Enhancing LLM Reasoning via Critique Models with Test-Time and Training-Time Supervision Paper • 2411.16579 • Published Nov 25, 2024 • 2
Vision-Language Models Can Self-Improve Reasoning via Reflection Paper • 2411.00855 • Published Oct 30, 2024 • 5
Language Models are Hidden Reasoners: Unlocking Latent Reasoning Capabilities via Self-Rewarding Paper • 2411.04282 • Published Nov 6, 2024 • 33
Insight-V: Exploring Long-Chain Visual Reasoning with Multimodal Large Language Models Paper • 2411.14432 • Published Nov 21, 2024 • 23
Critic-V: VLM Critics Help Catch VLM Errors in Multimodal Reasoning Paper • 2411.18203 • Published Nov 27, 2024 • 34
From Generation to Judgment: Opportunities and Challenges of LLM-as-a-judge Paper • 2411.16594 • Published Nov 25, 2024 • 37
LLaMA-Berry: Pairwise Optimization for O1-like Olympiad-Level Mathematical Reasoning Paper • 2410.02884 • Published Oct 3, 2024 • 54
Is Your LLM Secretly a World Model of the Internet? Model-Based Planning for Web Agents Paper • 2411.06559 • Published Nov 10, 2024 • 13
Large Language Models Can Self-Improve in Long-context Reasoning Paper • 2411.08147 • Published Nov 12, 2024 • 64
Stronger Models are NOT Stronger Teachers for Instruction Tuning Paper • 2411.07133 • Published Nov 11, 2024 • 36
BLIP3-KALE: Knowledge Augmented Large-Scale Dense Captions Paper • 2411.07461 • Published Nov 12, 2024 • 22
Towards System 2 Reasoning in LLMs: Learning How to Think With Meta Chain-of-Though Paper • 2501.04682 • Published Jan 8 • 89
Multiagent Finetuning: Self Improvement with Diverse Reasoning Chains Paper • 2501.05707 • Published Jan 10 • 20
Agent-R: Training Language Model Agents to Reflect via Iterative Self-Training Paper • 2501.11425 • Published 25 days ago • 91
Learn-by-interact: A Data-Centric Framework for Self-Adaptive Agents in Realistic Environments Paper • 2501.10893 • Published 27 days ago • 24
IntellAgent: A Multi-Agent Framework for Evaluating Conversational AI Systems Paper • 2501.11067 • Published 26 days ago • 13
RealCritic: Towards Effectiveness-Driven Evaluation of Language Model Critiques Paper • 2501.14492 • Published 21 days ago • 30