The code for training the reward model?

by AndyWodecki - opened Oct 23, 2023

I want to train the reward model on my own dataset (not HH-RLHF from Anthropic). At the moment, I use the TRL library for that.
I am truly impressed by your approach, specifically the training setup described in Section 4.2 of your paper.
Would you be so kind as to provide the code for that part? Or some technical tips on how to create it?
Many thanks in advance :)
