PopAlign: Diversifying Contrasting Patterns for a More Comprehensive Alignment Paper • 2410.13785 • Published Oct 17
Aligning Large Language Models via Self-Steering Optimization Paper • 2410.17131 • Published Oct 22
RM-Bench: Benchmarking Reward Models of Language Models with Subtlety and Style Paper • 2410.16184 • Published Oct 21
Unleashing Reasoning Capability of LLMs via Scalable Question Synthesis from Scratch Paper • 2410.18693 • Published about 1 month ago
A Critical Evaluation of AI Feedback for Aligning Large Language Models Paper • 2402.12366 • Published Feb 19
WebRL: Training LLM Web Agents via Self-Evolving Online Curriculum Reinforcement Learning Paper • 2411.02337 • Published 20 days ago
Constraint Back-translation Improves Complex Instruction Following of Large Language Models Paper • 2410.24175 • Published 24 days ago
Accelerating Direct Preference Optimization with Prefix Sharing Paper • 2410.20305 • Published 28 days ago
Thanos: Enhancing Conversational Agents with Skill-of-Mind-Infused Large Language Model Paper • 2411.04496 • Published 17 days ago
Direct Preference Optimization Using Sparse Feature-Level Constraints Paper • 2411.07618 • Published 12 days ago