electra-large-reward-model

theblackcat102

license: mit

Reward model trained on openai/webgpt_comparisons and the human-feedback summarization data. Unlike the other electra-large reward model, this model is trained with a ranking loss and on one additional dataset.
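The card does not spell out the exact rank loss. A common choice for reward models trained on comparison data is the pairwise loss -log σ(r_chosen - r_rejected), which pushes the score of the preferred answer above the score of the rejected one. Below is a minimal pure-Python sketch under that assumption. The function names are hypothetical, and this is not necessarily the loss used to train this model.

```python
import math
from typing import Sequence


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def pairwise_rank_loss(chosen: Sequence[float], rejected: Sequence[float]) -> float:
    """Mean of -log(sigmoid(r_chosen - r_rejected)) over comparison pairs.

    Assumption (hypothetical setup): chosen[i] and rejected[i] are the
    reward-model scores for the preferred and the dispreferred answer
    to the same prompt.
    """
    if len(chosen) != len(rejected) or not chosen:
        raise ValueError("need equally sized, non-empty score lists")
    losses = [-log_sigmoid(c - r) for c, r in zip(chosen, rejected)]
    return sum(losses) / len(losses)


# A correctly ordered pair yields a small loss; a reversed pair yields a large one.
print(round(pairwise_rank_loss([2.0], [0.0]), 4))
print(round(pairwise_rank_loss([0.0], [2.0]), 4))
```

With equal scores the loss is log 2. It shrinks toward zero as the preferred answer's margin grows. In a PyTorch training loop the same quantity is usually written as `-torch.nn.functional.logsigmoid(chosen - rejected).mean()`.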

On the validation set, the results are much more stable than usual.
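The card does not name the validation metric. For reward models evaluated on comparison data, a common one is pairwise accuracy: the fraction of pairs in which the preferred answer receives the higher score. Below is a minimal sketch that assumes the scores for each (chosen, rejected) pair have already been computed. The function name is hypothetical. Tracking this value across evaluation steps is one way to check how stable training is.

```python
from typing import Sequence


def pairwise_accuracy(chosen: Sequence[float], rejected: Sequence[float]) -> float:
    """Fraction of comparison pairs where the chosen answer outscores the rejected one.

    Ties count as incorrect, because the model did not prefer the chosen answer.
    """
    if len(chosen) != len(rejected) or not chosen:
        raise ValueError("need equally sized, non-empty score lists")
    correct = sum(1 for c, r in zip(chosen, rejected) if c > r)
    return correct / len(chosen)


# Three of the four pairs are ordered correctly here.
print(pairwise_accuracy([1.2, 0.3, 2.0, -0.5], [0.4, 0.9, 1.1, -1.0]))
```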

You can refer to this wandb run for more details.