---
license: mit
language:
- en
---
# LM Loss OPT RM
This is a fine-tuned OPT-1.3B model for reward modelling. The fine-tuning was done on the full [SLF5K](https://huggingface.co/datasets/JeremyAlain/SLF5K) dataset, following the method presented in the paper [Training Language Models with Language Feedback at Scale](https://arxiv.org/abs/2303.16755). The main results are shown in the following table:

| Model       | # Params | Validation Accuracy (%) |
|-------------|----------|-------------------------|
| OPT LM Loss | 13B      | **73.4 ± 1.9**          |
| OPT LM Loss | 1.3B     | 69.6 ± 2.0              |
| OPT RM Loss | 13B      | 71.8 ± 2.0              |
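
For illustration, below is a minimal sketch of scoring candidate summaries with this model. It assumes the checkpoint loads as a standard OPT causal LM via `transformers` and that, per the "LM Loss" naming, a candidate is scored by its (negated) language-modelling loss; the repo id, prompt format, and helper function are placeholders rather than the paper's exact procedure.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder repo id -- replace with the actual path of this model.
MODEL_ID = "path/to/this-model"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID)
model.eval()

def lm_loss_score(prompt: str, candidate: str) -> float:
    """Score a candidate by its average LM loss under the model,
    negated so that higher means better. This scoring scheme is an
    assumption based on the "LM Loss" naming, not the paper's exact code."""
    prompt_ids = tokenizer(prompt, return_tensors="pt").input_ids
    full_ids = tokenizer(prompt + candidate, return_tensors="pt").input_ids
    labels = full_ids.clone()
    # Mask the prompt tokens (approximate boundary) so the loss
    # is computed over the candidate tokens only.
    labels[:, : prompt_ids.shape[1]] = -100
    with torch.no_grad():
        loss = model(full_ids, labels=labels).loss
    return -loss.item()

# Rank two candidate summaries for the same post.
post = "POST: ... TL;DR:"
print(lm_loss_score(post, " A concise, faithful summary."))
print(lm_loss_score(post, " An off-topic reply."))
```

Candidates with higher scores are the ones the reward model prefers, which can be used to rank alternative summaries.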
If you use this model, please cite the following paper:
```bibtex
@article{scheurer2023training,
title={Training Language Models with Language Feedback at Scale},
author={Scheurer, J{\'e}r{\'e}my and Campos, Jon Ander and Korbak, Tomasz and Chan, Jun Shern and Chen, Angelica and Cho, Kyunghyun and Perez, Ethan},
journal={arXiv preprint arXiv:2303.16755},
year={2023}
}
```