sileod committed on
Commit 2787455
1 Parent(s): 213bdda

Update README.md

Files changed (1)
  1. README.md +1 -1
README.md CHANGED
@@ -20,7 +20,7 @@ model-index:
  value: 0,7516
  verified: true
  ---
- # Reward model based `deberta-v3-large-tasksource-nli` fine-tuned on Anthropic/hh-rlhf
+ # Reward model based [`deberta-v3-large-tasksource-nli`](https://huggingface.co/sileod/deberta-v3-large-tasksource-nli) fine-tuned on Anthropic/hh-rlhf
  For 1 epoch with 1e-5 learning rate.
 
  The data are described in the paper: [Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback](https://arxiv.org/abs/2204.05862).