---
language:
- en
tags:
- webgpt
- regression
- reward-model
license: "apache-2.0"
datasets:
- openai/webgpt_comparisons
metrics:
- accuracy
---

# Reward Model pretrained on openai/webgpt_comparisons

Reward model finetuned from an existing pretrained model.

Things that align with the original papers:

* Overfits easily using rank loss
* Small learning rate

Differences from the papers:

* Small models perform poorly due to a lack of world knowledge: the validation accuracy doesn't even reach 60%. The OpenAI RM had 6B parameters.
* Trained using an 80-20 train-validation split with torch AMP

Other models I tried:

* bloomz-560m: the embedding size isn't worth the training cost, since this dataset only contains English prompts
* gpt2-large: not stable
* gpt2-base: not stable

# Performance on validation split

| model | val acc (%) | val loss (rank loss) |
|---|---|---|
| [roberta-base](https://huggingface.co/theblackcat102/roberta-base-webgpt-rm) | 56.21 | 0.71 |
| [roberta-large](https://huggingface.co/theblackcat102/roberta-large-webgpt-rm) | 57.89 | 0.67 |
| [electra-base](https://huggingface.co/theblackcat102/electra-base-webgpt-rm) | 57.02 | 0.70 |
| [electra-large](https://huggingface.co/theblackcat102/electra-large-webgpt-rm) | 58.75 | 0.69 |

Tensorboard logs are located under runs/

# Note:

* You will have to reweight this model's output such that the mean reward equals 0
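
A minimal sketch of what the note above could look like in practice, assuming "reweight" means subtracting the mean raw score computed over some reference set of responses. It also shows a pairwise logistic rank loss, on the assumption that this is the rank loss reported in the table; the training code itself is not shown here, so treat both as illustrations. The function names `center_rewards` and `pairwise_rank_loss` and the scores are hypothetical, not part of this repository.

```python
import math
from statistics import fmean


def pairwise_rank_loss(chosen: float, rejected: float) -> float:
    """Pairwise logistic rank loss: -log(sigmoid(chosen - rejected)).

    Assumed form of the rank loss; not confirmed by the model card.
    """
    diff = chosen - rejected
    # Equivalent to log(1 + exp(-diff)), written to stay stable for large |diff|.
    return max(-diff, 0.0) + math.log1p(math.exp(-abs(diff)))


def center_rewards(raw_scores, reference_scores=None):
    """Shift scores so that the mean over the reference set is 0.

    If no reference set is given, the scores are centered on their own mean.
    """
    reference = raw_scores if reference_scores is None else reference_scores
    offset = fmean(reference)
    return [score - offset for score in raw_scores]


# Hypothetical raw scores for illustration, not real model outputs.
raw = [1.5, 0.5, 2.5, -0.5]
centered = center_rewards(raw)

# Shifting every score by the same offset leaves pairwise losses unchanged,
# because the loss depends only on the difference between two scores.
loss_before = pairwise_rank_loss(raw[0], raw[1])
loss_after = pairwise_rank_loss(centered[0], centered[1])
```

Because the loss depends only on score differences, centering changes neither the ranking of responses nor the validation accuracy; it only fixes the otherwise arbitrary offset of the raw output. Under the assumed loss, two equal scores give a loss of ln 2 (about 0.693).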