---
language: 
- en
tags:
- webgpt
- regression
- reward-model
license: "apache-2.0"
datasets:
- openai/webgpt_comparisons
metrics:
- accuracy
---
# Reward model fine-tuned on openai/webgpt_comparisons

Reward model fine-tuned from an existing pretrained model.

Things that aligned with the original papers

* Overfits easily using rank loss

* Small learning rate
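
To make the "rank loss" concrete, here is a minimal sketch of a pairwise ranking loss, assuming the common form `-log(sigmoid(r_chosen - r_rejected))`. The exact loss used to train this model is not stated in this card, so the function name and the formula are assumptions for illustration. It is written in plain Python so that it runs without extra dependencies.

```python
import math


def pairwise_rank_loss(chosen_scores, rejected_scores):
    """Mean of -log(sigmoid(chosen - rejected)) over comparison pairs.

    Assumed form of the rank loss; not confirmed by the model card.
    """
    if len(chosen_scores) != len(rejected_scores):
        raise ValueError("both score lists must have the same length")
    if not chosen_scores:
        raise ValueError("need at least one comparison pair")

    total = 0.0
    for chosen, rejected in zip(chosen_scores, rejected_scores):
        margin = chosen - rejected
        # -log(sigmoid(m)) equals softplus(-m); this form avoids overflow
        # for large positive or negative margins.
        total += max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))
    return total / len(chosen_scores)
```

Under this assumed loss, a model that gives both answers the same score gets `ln 2 ≈ 0.693` per pair, which is a useful baseline when reading validation losses.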

Differences from the papers


* Small models perform poorly, likely due to a lack of world knowledge: the validation accuracy doesn't even reach 60%. The OpenAI RM had 6B parameters.

* Trained with an 80-20 train-validation split using torch AMP (automatic mixed precision)
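
As an illustration of the 80-20 split, the sketch below shuffles a list of comparison examples with a fixed seed and holds out 20% for validation. The helper name `train_val_split`, the seed, and the shuffling strategy are assumptions; the card does not say how the split was actually produced.

```python
import random


def train_val_split(examples, val_fraction=0.2, seed=0):
    """Shuffle examples deterministically and split off a validation set.

    Hypothetical helper; the actual split procedure is not documented.
    """
    items = list(examples)
    random.Random(seed).shuffle(items)
    n_val = round(len(items) * val_fraction)
    # First n_val shuffled items become validation, the rest training.
    return items[n_val:], items[:n_val]
```

For example, `train_val_split(range(100))` returns 80 training items and 20 validation items with no overlap.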


Other models I tried

* bloomz-560m : the embedding size isn't worth the training cost, since this dataset only contains English prompts

* gpt2-large : training was not stable

* gpt2-base : training was not stable


# Performance on validation split

| model  | val acc (%)  | val loss (rank loss)  |
|---|---|---|
| [roberta-base](https://huggingface.co/theblackcat102/roberta-base-webgpt-rm)  | 56.21  |  0.71 |
| [roberta-large](https://huggingface.co/theblackcat102/roberta-large-webgpt-rm)  | 57.89  |  0.67 |
| [electra-base](https://huggingface.co/theblackcat102/electra-base-webgpt-rm)  | 57.02  | 0.70  |
| [electra-large](https://huggingface.co/theblackcat102/electra-large-webgpt-rm)  | 58.75  | 0.69  |

TensorBoard logs are located under `runs/`.
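
For readers who want to reproduce a comparable metric, the sketch below computes pairwise accuracy as the fraction of comparison pairs where the preferred answer receives the strictly higher score. This definition is an assumption, since the card does not spell out how validation accuracy was computed, and in this sketch ties count as incorrect.

```python
def pairwise_accuracy(chosen_scores, rejected_scores):
    """Fraction of pairs where the preferred answer scores strictly higher.

    Assumed metric definition; ties are counted as incorrect here.
    """
    pairs = list(zip(chosen_scores, rejected_scores))
    if not pairs:
        raise ValueError("need at least one comparison pair")
    correct = sum(1 for chosen, rejected in pairs if chosen > rejected)
    return correct / len(pairs)
```

Under this definition, a scorer that ranks pairs at random would land near 50%, so that is the baseline to compare the table against.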


# Note:

* You will have to shift this model's outputs so that the mean reward equals 0
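
One simple way to do this, sketched below, is to estimate the mean raw reward on a reference set of outputs and subtract it from every score. The helper names `fit_reward_offset` and `center_reward` are hypothetical, and which reference set to use is left to the reader.

```python
def fit_reward_offset(reference_rewards):
    """Return the mean raw reward over a reference set of model outputs."""
    rewards = list(reference_rewards)
    if not rewards:
        raise ValueError("need at least one reference reward")
    return sum(rewards) / len(rewards)


def center_reward(raw_reward, offset):
    """Shift a raw reward so the reference set has mean 0."""
    return raw_reward - offset
```

Usage: compute `offset = fit_reward_offset(scores)` once on held-out scores, then apply `center_reward(score, offset)` to each new score. A constant shift does not change the ranking between any two answers.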