viethoangtranduong committed
Commit 2fdf351
1 Parent(s): 9602d48
Update README.md
README.md CHANGED
@@ -110,6 +110,8 @@ allowing for deployment in environments requiring moderated outputs.
 which proposes a similar general approach for creating alignment pairs from a larger set of candidate responses, but using the LLM as the reward model.
 While this may work for general-purpose models, our experience has shown that task-specific reward models guided by SMEs are necessary for most
 enterprise applications of LLMs for specific use cases, which is why we focus on the use of external reward models.
+- Also, we would like to acknowledge another concurrent work that has a similar approach but focuses more on the theoretical aspect of the iterative DPO process: [Iterative Preference Learning from Human Feedback: Bridging Theory and
+Practice for RLHF under KL-Constraint](https://arxiv.org/pdf/2312.11456.pdf) on 2024-01-28 (Xiong, et al).
 
 ### GGUF version
 Snorkel-Mistral-PairRM-DPO GGUF model version: from [andrew-cartwheel](https://huggingface.co/andrew-cartwheel/snorkel-mistral-pairRM-DPO-q8_0.gguf) or [brittlewis12](https://huggingface.co/brittlewis12/Snorkel-Mistral-PairRM-DPO-GGUF).
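As a rough illustration of the external-reward-model approach discussed in the context lines above (ranking several candidate responses and keeping the best and worst as an alignment pair), here is a minimal sketch using PairRM through the llm-blender package; the prompt, candidate texts, and output dictionary layout are placeholders for illustration, not the exact pipeline behind this model.

```python
# Minimal sketch: build one preference pair by ranking candidate responses
# with an external reward model (PairRM via llm-blender). Illustrative only.
import llm_blender

blender = llm_blender.Blender()
blender.loadranker("llm-blender/PairRM")

prompt = "Summarize the benefits of iterative DPO in two sentences."  # placeholder
candidates = [
    "Iterative DPO refines the policy over several rounds of ranking and training.",
    "DPO is a loss function.",
    "It re-ranks fresh generations each round so later rounds train on better pairs.",
]

# rank() returns one rank per candidate for each input; lower rank = preferred.
ranks = blender.rank([prompt], [candidates])[0]
chosen = candidates[ranks.argmin()]
rejected = candidates[ranks.argmax()]

# A DPO training example in the usual prompt/chosen/rejected layout (assumed schema).
pair = {"prompt": prompt, "chosen": chosen, "rejected": rejected}
print(pair)
```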
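For the GGUF files linked in the section above, a quantized checkpoint can be run with a llama.cpp-compatible runtime. The snippet below is a small sketch using llama-cpp-python; the local file name, context size, prompt template, and sampling settings are assumptions rather than recommended values.

```python
# Minimal sketch: run the q8_0 GGUF quantization with llama-cpp-python.
# Assumes the .gguf file was downloaded locally from one of the repos above.
from llama_cpp import Llama

llm = Llama(
    model_path="snorkel-mistral-pairRM-DPO-q8_0.gguf",  # local path (assumed name)
    n_ctx=4096,  # context window; adjust to available memory
)

# Mistral-style instruction format; the exact chat template is an assumption here.
prompt = "[INST] Explain what a reward model does in one paragraph. [/INST]"

out = llm(prompt, max_tokens=256, temperature=0.7)
print(out["choices"][0]["text"])
```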