Reward Model pretrained on openai/webgpt_comparisons

Reward model fine-tuned from an existing pretrained model.

Things that align with the original papers

  • Overfits easily using rank loss

  • Small learning rate
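
As a rough illustration of the rank loss mentioned above, here is a minimal sketch in plain Python. It assumes the common pairwise form, -log(sigmoid(r_preferred - r_rejected)); the exact loss used for these models is not stated here, and the function names are hypothetical. Under this assumed form, a model that scores both answers equally gets a loss of ln(2), about 0.693, which is a useful baseline when reading the validation losses reported in the table.

```python
import math

def pairwise_rank_loss(preferred_reward: float, rejected_reward: float) -> float:
    """-log(sigmoid(r_preferred - r_rejected)), computed in a numerically stable way."""
    diff = preferred_reward - rejected_reward
    # -log(sigmoid(d)) == log(1 + exp(-d)); branch so exp() never overflows.
    if diff >= 0:
        return math.log1p(math.exp(-diff))
    return -diff + math.log1p(math.exp(diff))

def mean_rank_loss(pairs):
    """Average loss over (preferred_reward, rejected_reward) pairs."""
    return sum(pairwise_rank_loss(p, r) for p, r in pairs) / len(pairs)

# A model that cannot tell the two answers apart scores them equally,
# which gives a loss of ln(2).
chance_loss = pairwise_rank_loss(0.0, 0.0)
```

In an actual training loop the same expression would be computed on batched tensors; the scalar version is only meant to show the shape of the objective.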

Different from the papers

  • Small models perform poorly, likely due to a lack of world knowledge: validation accuracy does not even reach 60%. For comparison, the OpenAI RM had 6B parameters.

  • Trained using an 80-20 train-validation split with torch AMP (automatic mixed precision) settings
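
A minimal sketch of an 80-20 split, assuming a simple random shuffle over comparison examples. How the split was actually made (for example, whether it was grouped by question) is not stated here, and `train_val_split`, the seed, and the toy data are all hypothetical. The AMP part of the setup is not shown.

```python
import random

def train_val_split(examples, val_fraction=0.2, seed=0):
    """Shuffle a copy of `examples` and split it into (train, validation)."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    n_val = int(len(shuffled) * val_fraction)
    return shuffled[n_val:], shuffled[:n_val]

# Toy stand-in for the comparison dataset (made-up records).
comparisons = [{"id": i} for i in range(10)]
train_set, val_set = train_val_split(comparisons)
```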

Other models I tried

  • bloomz-560m : the embedding size isn't worth the training cost, since this dataset only contains English prompts

  • gpt2-large : not stable

  • gpt2-base : not stable

Performance on validation split

model            val acc (%)    val loss (rank loss)
roberta-base     56.21          0.71
roberta-large    57.89          0.67
electra-base     57.02          0.70
electra-large    58.75          0.69
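
For reference, validation accuracy for a reward model is commonly computed as the share of comparison pairs in which the preferred answer receives the higher reward. The sketch below assumes that definition (it is not confirmed here as the exact metric used); `pairwise_accuracy` and the toy scores are hypothetical.

```python
def pairwise_accuracy(pairs):
    """Percentage of (preferred_reward, rejected_reward) pairs ranked correctly."""
    correct = sum(1 for preferred, rejected in pairs if preferred > rejected)
    return 100.0 * correct / len(pairs)

# Made-up reward pairs: the first and third are ranked correctly.
toy_pairs = [(1.2, 0.3), (0.1, 0.4), (0.9, -0.5), (-0.2, -0.1)]
toy_accuracy = pairwise_accuracy(toy_pairs)
```

Under this definition, random guessing gives about 50%, so the numbers above sit only a few points over chance.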

TensorBoard logs are located under runs/


  • You will have to reweight this model's output so that the mean reward equals 0
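
One simple way to do this, sketched below, is to estimate the mean reward over a reference set of outputs and subtract it from every score. This is only one possible reading of "reweight", and `center_rewards` and the example scores are hypothetical.

```python
from statistics import fmean

def center_rewards(rewards):
    """Shift rewards by their mean so the result averages to zero."""
    offset = fmean(rewards)
    return [r - offset for r in rewards], offset

raw_rewards = [0.5, 1.5, 2.0, 4.0]  # made-up scores
centered, offset = center_rewards(raw_rewards)
```

The returned `offset` can be stored and subtracted from new scores at inference time, so that later rewards are shifted by the same constant.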
