- Secrets of RLHF in Large Language Models Part II: Reward Modeling
  Paper • 2401.06080 • Published • 26
- Scaling Laws for Reward Model Overoptimization in Direct Alignment Algorithms
  Paper • 2406.02900 • Published • 11
- AgentGym: Evolving Large Language Model-based Agents across Diverse Environments
  Paper • 2406.04151 • Published • 17
- Understanding and Diagnosing Deep Reinforcement Learning
  Paper • 2406.16979 • Published • 9
Yuquan Xie
xieyuquan
AI & ML interests
LLM, multi-modal
Recent Activity
upvoted a paper about 2 months ago
ROCKET-1: Master Open-World Interaction with Visual-Temporal Context Prompting
Organizations
Collections
5
Papers
1
models
2
datasets
None public yet