Hi Zhilin,
As the paper mentions, this reward model was trained only on the helpfulness attribute of HelpSteer2. Should we use the 5th dimension of the output?
Thank you!
Yes, this is correct (5th as in index 4, since indexing starts at zero).
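For anyone unsure about the indexing, here is a minimal sketch in plain Python. It assumes you have already pulled the model's output for one example into a list of floats; `helpfulness_score`, `HELPFULNESS_INDEX`, and the example values are hypothetical names and numbers for illustration, not part of the model's API or real output.

```python
# The 5th dimension sits at index 4 because indexing starts at zero.
HELPFULNESS_INDEX = 4


def helpfulness_score(output):
    """Return the 5th element (index 4) of a per-example output vector."""
    if len(output) <= HELPFULNESS_INDEX:
        raise ValueError("output has fewer than 5 dimensions")
    return output[HELPFULNESS_INDEX]


# Hypothetical values, for illustration only (not real model output).
example_output = [0.1, 0.2, 0.3, 0.4, 2.5]
score = helpfulness_score(example_output)
print(score)
```

The same index applies if the output is a tensor or array rather than a list; only the way you extract the per-example vector would differ.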