RM-Bradley-Terry - a RLHFlow Collection


RM-Bradley-Terry

updated Apr 29

We train the reward model by maximum likelihood estimation under the Bradley-Terry model of pairwise preferences.
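For reference, the Bradley-Terry model gives the probability that a chosen response y_w is preferred over a rejected response y_l for prompt x as sigma(r(x, y_w) - r(x, y_l)), where r is the reward model and sigma is the logistic sigmoid. Maximizing the likelihood of observed preference pairs is therefore the same as minimizing the mean of -log sigma(r_w - r_l). The snippet below is a minimal sketch of that objective, assuming each response has already been scored with a scalar reward. The function names (`log_sigmoid`, `bradley_terry_nll`, `preference_prob`) are hypothetical illustrations, not RLHFlow's training code, which would compute rewards with a neural network and minimize this loss by gradient descent.

```python
import math
from typing import Sequence


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x)); avoids overflow in exp for large |x|."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    # For negative x: log(e^x / (1 + e^x)) = x - log(1 + e^x)
    return x - math.log1p(math.exp(x))


def preference_prob(r_chosen: float, r_rejected: float) -> float:
    """Bradley-Terry probability that the chosen response beats the rejected one."""
    return math.exp(log_sigmoid(r_chosen - r_rejected))


def bradley_terry_nll(chosen: Sequence[float], rejected: Sequence[float]) -> float:
    """Mean negative log-likelihood of preference pairs under Bradley-Terry.

    chosen[i] and rejected[i] are the scalar rewards r(x_i, y_w) and
    r(x_i, y_l) for the i-th preference pair (hypothetical inputs).
    """
    if len(chosen) != len(rejected) or not chosen:
        raise ValueError("need equal-length, non-empty reward lists")
    total = 0.0
    for r_w, r_l in zip(chosen, rejected):
        # Each pair contributes -log sigma(r_w - r_l)
        total -= log_sigmoid(r_w - r_l)
    return total / len(chosen)


if __name__ == "__main__":
    # Equal rewards give probability 0.5, so the loss is log 2.
    print(bradley_terry_nll([0.0], [0.0]))
    # A larger margin in favor of the chosen response lowers the loss.
    print(bradley_terry_nll([2.0, 1.5], [0.0, -0.5]))
```

Only the reward difference enters the loss, so the learned rewards are identified up to an additive constant per prompt; this is a property of the Bradley-Terry model itself rather than of this sketch.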