Language Agents with Reinforcement Learning for Strategic Play in the Werewolf Game Paper • 2310.18940 • Published Oct 29, 2023
LLM-Powered Hierarchical Language Agent for Real-time Human-AI Coordination Paper • 2312.15224 • Published Dec 23, 2023
A Survey on Self-play Methods in Reinforcement Learning Paper • 2408.01072 • Published Aug 2, 2024
Is DPO Superior to PPO for LLM Alignment? A Comprehensive Study Paper • 2404.10719 • Published Apr 16, 2024