Agentic Reward Modeling: Integrating Human Preferences with Verifiable Correctness Signals for Reliable Reward Systems Paper • 2502.19328 • Published 15 days ago • 21
Constraint Back-translation Improves Complex Instruction Following of Large Language Models Paper • 2410.24175 • Published Oct 31, 2024 • 18
Pre-training Distillation for Large Language Models: A Design Space Exploration Paper • 2410.16215 • Published Oct 21, 2024 • 16