bradenjh committed
Commit
5db967d
1 Parent(s): 30b5d17

Update README.md

Files changed (1): README.md +2 -2
README.md CHANGED
@@ -62,8 +62,8 @@ allowing for deployment in environments requiring moderated outputs.
  - The author of the [Direct Preference Optimization paper](https://arxiv.org/abs/2305.18290) for the innovative approach
  - The author of the [Pairwise Reward Model for LLMs paper](https://arxiv.org/abs/2306.02561) for the powerful general-purpose reward model
  - The HuggingFace team for the DPO implementation under [The Alignment Handbook](https://github.com/huggingface/alignment-handbook)
- - We would also like to acknowledge contemporary work published on arXiv a few days ago by Meta & NYU (Yuan, et al) in a paper called [Self-Rewarding Language Models](https://arxiv.org/abs/2401.10020),
- which proposes a similar approach for creating alignment pairs from a larger set of candidate responses, but using the LLM as the reward model.
+ - We would also like to acknowledge contemporary work published independently on arXiv on 2024-01-18 by Meta & NYU (Yuan, et al) in a paper called [Self-Rewarding Language Models](https://arxiv.org/abs/2401.10020),
+ which proposes a similar general approach for creating alignment pairs from a larger set of candidate responses, but using the LLM as the reward model.
  While this may work for general-purpose models, our experience has shown that task-specific reward models guided by SMEs are necessary for most
  enterprise applications of LLMs for specific use cases, which is why we focus on the use of external reward models.
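
For context on the pattern the edited paragraph contrasts, here is a minimal, hypothetical Python sketch of creating a DPO-style alignment pair from a set of candidate responses by ranking them with an external reward model. The `score` callable is an assumed stand-in for whatever reward model is used (a pairwise reward model such as the one cited would compare responses head-to-head rather than emit scalar scores); this is an illustration of the general approach, not this repository's actual code.

```python
# Hypothetical sketch: build a preference pair for DPO by ranking N candidate
# responses with an external reward model, rather than using the LLM itself
# as the judge (the Self-Rewarding Language Models approach). All names here
# are illustrative placeholders.
from typing import Callable, Dict, List


def build_preference_pair(
    prompt: str,
    candidates: List[str],
    score: Callable[[str, str], float],  # external reward model: (prompt, response) -> scalar
) -> Dict[str, str]:
    """Rank candidates by reward and keep the best/worst as chosen/rejected."""
    ranked = sorted(candidates, key=lambda r: score(prompt, r), reverse=True)
    return {
        "prompt": prompt,
        "chosen": ranked[0],     # highest-scoring response
        "rejected": ranked[-1],  # lowest-scoring response
    }
```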