More details on training data for reward model

#2
by reign12 - opened

Many thanks for your great effort in open-sourcing this reward model! I am very curious about the details of its training data.
What exactly is oasst_export?
What does the fraction in the Datasets section mean?
And how can we use hellaswag as a comparison dataset?
Many thanks in advance for any discussion!
