---
license: cc-by-nc-4.0
datasets:
- berkeley-nest/Nectar
language:
- en
library_name: transformers
tags:
- reward model
- RLHF
- RLAIF
---

# Model Card for Starling-RM-7B-alpha

<!-- Provide a quick summary of what the model is/does. -->

Starling-RM-7B-alpha is a reward model trained from [Llama2-7B-Chat](https://huggingface.co/meta-llama/Llama-2-7b-chat-hf). Following the reward-model training recipe of the InstructGPT paper, we remove the last layer of Llama2-7B-Chat and append a linear layer that outputs a scalar for any pair of input prompt and response. We train the reward model on the preference dataset [berkeley-nest/Nectar](https://huggingface.co/berkeley-nest) with the K-wise maximum likelihood estimator proposed in [this paper](https://arxiv.org/abs/2301.11270). A response that is more helpful and less harmful receives a higher reward score. Note that because the preference dataset [berkeley-nest/Nectar](https://huggingface.co/berkeley-nest) is based on GPT-4 preferences, the reward model is likely to be biased towards GPT-4's own preferences, including longer responses and certain response formats.

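
The K-wise objective mentioned above can be illustrated with a minimal sketch, assuming the estimator is the maximum likelihood fit of a Plackett-Luce ranking model over the K scored responses to one prompt. The function name `k_wise_nll` and the plain-Python formulation are illustrative assumptions and are not taken from the Starling training code.

```python
import math


def k_wise_nll(rewards_ranked):
    """Negative log-likelihood of one ranking under a Plackett-Luce model.

    rewards_ranked: scalar rewards for the K responses to a single prompt,
    ordered from most preferred to least preferred.
    """
    nll = 0.0
    # The last position contributes log(1) = 0, so it is skipped.
    for j in range(len(rewards_ranked) - 1):
        tail = rewards_ranked[j:]
        # Numerically stable log-sum-exp over the responses not yet placed.
        m = max(tail)
        log_z = m + math.log(sum(math.exp(r - m) for r in tail))
        # -log( exp(r_j) / sum_{k >= j} exp(r_k) )
        nll += log_z - rewards_ranked[j]
    return nll


# Rewards that agree with the ranking give a lower loss than rewards that
# contradict it.
agree = k_wise_nll([2.0, 1.0, 0.0])
disagree = k_wise_nll([0.0, 1.0, 2.0])
```

For K = 2 this reduces to the pairwise logistic (Bradley-Terry) loss, `-log(sigmoid(r_chosen - r_rejected))`. In training, the rewards would come from the scalar head described above, and the loss would be averaged over prompts.
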
- **Developed by:** Banghua Zhu\*, Evan Frick\*, Tianhao Wu\*, Hanlin Zhu, and Jiantao Jiao.
- **Model type:** Reward Model for RLHF
- **License:** Non-commercial license (CC BY-NC 4.0)
- **Finetuned from model:** [Llama2-7B-Chat](https://huggingface.co/meta-llama/Llama-2-7b-chat-hf)

### Model Sources

<!-- Provide the basic links for the model. -->

- **Blog:** https://starling.cs.berkeley.edu/
- **Paper:** Coming soon!
- **Code:** Coming soon!

## Uses

<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->

## Citation

<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->

**BibTeX:**

[More Information Needed]