REBEL: Reinforcement Learning via Regressing Relative Reward - a Cornell-AGI Collection

updated Sep 2, 2024

REBEL: Reinforcement Learning via Regressing Relative Rewards
Paper • 2404.16767 • Published Apr 25, 2024 • 2

Cornell-AGI/REBEL-Llama-3-Armo-iter_1
8B • Updated Sep 2, 2024 • 6 • 1

Cornell-AGI/REBEL-Llama-3-Armo-iter_2
8B • Updated Sep 2, 2024 • 3 • 1

Cornell-AGI/REBEL-Llama-3-Armo-iter_3
8B • Updated Sep 2, 2024 • 3 • 2

Cornell-AGI/Ultrafeedback-Llama-3-Armo-iter_1
Viewer • Updated Sep 2, 2024 • 56.1k • 18

Cornell-AGI/Ultrafeedback-Llama-3-Armo-iter_2
Viewer • Updated Sep 2, 2024 • 55.1k • 27

Cornell-AGI/Ultrafeedback-Llama-3-Armo-iter_3
Viewer • Updated Sep 2, 2024 • 44.6k • 14 • 1

Cornell-AGI/REBEL-Llama-3
Text Generation • Updated Sep 1, 2024 • 8 • 1

Cornell-AGI/REBEL-Llama-3-epoch_2
Text Generation • Updated Sep 1, 2024 • 7 • 3

Cornell-AGI/REBEL-OpenChat-3.5
Text Generation • Updated Sep 1, 2024 • 5 • 1