Tags: NeMo, English, nvidia, llama3.1, reward model

Should we use the 5th dimension of the output only?

Discussion #2, opened by liangqxx

Hi Zhilin,

As the paper mentions, this reward model was trained only on the helpfulness attribute of HelpSteer2. Should we use only the 5th dimension of the output?

Thank you!

Reply (NVIDIA org):

Yes, this is correct: use the 5th dimension, i.e. index 4, since indexing starts at zero.
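A minimal sketch of the indexing described in the reply, assuming the reward output for a single response is available as a flat Python sequence of per-dimension scores. The names `HELPFULNESS_INDEX`, `helpfulness_score`, and `helpfulness_scores` are hypothetical, and the six-element example vector is placeholder data, not real model output.

```python
from typing import Sequence

# Per the reply: the "5th dimension" is index 4 because indexing starts at 0.
HELPFULNESS_INDEX = 4


def helpfulness_score(reward_output: Sequence[float]) -> float:
    """Return the helpfulness score from one response's reward output.

    Assumes (hypothetically) that `reward_output` is a flat sequence of
    per-dimension scores for a single response.
    """
    if len(reward_output) <= HELPFULNESS_INDEX:
        raise ValueError(
            f"expected at least {HELPFULNESS_INDEX + 1} dimensions, "
            f"got {len(reward_output)}"
        )
    return float(reward_output[HELPFULNESS_INDEX])


def helpfulness_scores(batch: Sequence[Sequence[float]]) -> list[float]:
    """Extract the helpfulness score from each response's output in a batch."""
    return [helpfulness_score(row) for row in batch]


if __name__ == "__main__":
    # Placeholder numbers only, not real model output.
    example = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]
    print(helpfulness_score(example))  # prints 0.5
```

The length check guards against silently reading the wrong position if the output has fewer dimensions than expected.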
