Running 1.84k The Ultra-Scale Playbook 🌌 The ultimate guide to training LLM on large GPU Clusters
Teaching Language Models to Critique via Reinforcement Learning Paper • 2502.03492 • Published 25 days ago • 23
CodeI/O: Condensing Reasoning Patterns via Code Input-Output Prediction Paper • 2502.07316 • Published 19 days ago • 45
UI Agent Collection a collection of algorithmic agents for user interfaces/interactions, program synthesis, and robots • 288 items • Updated about 16 hours ago • 47
nuprl/stack-dedup-python-testgen-starcoder-filter-v2 Viewer • Updated Feb 29, 2024 • 158k • 88 • 7
OS-Genesis: Automating GUI Agent Trajectory Construction via Reverse Task Synthesis Paper • 2412.19723 • Published Dec 27, 2024 • 82
Diving into Self-Evolving Training for Multimodal Reasoning Paper • 2412.17451 • Published Dec 23, 2024 • 43