The code for training the reward model?

#1
by AndyWodecki - opened

I want to train the reward model on my own dataset (not HH-RLHF from Anthropic). At the moment, I am using the TRL library for that.
I am truly impressed by your approach, specifically the training setup described in Section 4.2 of your paper.
Would you be so kind as to provide the code for that part? Or some technical tips on how to create it?
Many thanks in advance :)
