Generalizable Reward Models
- Ray2333/GRM-llama3-8B-sftreg
  Text Classification • Updated • 56 • 5
- Ray2333/GRM-llama3-8B-distill
  Text Classification • Updated • 258 • 6
- Ray2333/GRM-Gemma-2B-sftreg
  Text Classification • Updated • 42 • 3
- Regularizing Hidden States Enables Learning Generalizable Reward Model for LLMs
  Paper • 2406.10216 • Published • 2