Add paper link
README.md
CHANGED
@@ -160,7 +160,7 @@ model-index:
 ---
 # **Mistral-ORPO-β (7B)**
 
-**Mistral-ORPO** is a fine-tuned version of [mistralai/Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1) using the *odds ratio preference optimization (ORPO)*. With ORPO, the model directly learns the preference without the supervised fine-tuning warmup phase. **Mistral-ORPO-β** is fine-tuned exclusively on the 61k instances of the cleaned version of UltraFeedback, [argilla/ultrafeedback-binarized-preferences-cleaned](https://huggingface.co/datasets/argilla/ultrafeedback-binarized-preferences-cleaned), by [Argilla](https://huggingface.co/argilla).
+**Mistral-ORPO** is a fine-tuned version of [mistralai/Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1) using the *[odds ratio preference optimization (ORPO)](https://arxiv.org/abs/2403.07691)*. With ORPO, the model directly learns the preference without the supervised fine-tuning warmup phase. **Mistral-ORPO-β** is fine-tuned exclusively on the 61k instances of the cleaned version of UltraFeedback, [argilla/ultrafeedback-binarized-preferences-cleaned](https://huggingface.co/datasets/argilla/ultrafeedback-binarized-preferences-cleaned), by [Argilla](https://huggingface.co/argilla).
 
 - **Github Repository**: https://github.com/xfactlab/orpo
 
@@ -214,4 +214,17 @@ response = tokenizer.batch_decode(output)
 #Hi! How are you doing?</s>
 #<|assistant|>
 #I'm doing well, thank you! How are you?</s>
+```
+
+## 📎 **Citation**
+
+```
+@misc{hong2024orpo,
+title={ORPO: Monolithic Preference Optimization without Reference Model},
+author={Jiwoo Hong and Noah Lee and James Thorne},
+year={2024},
+eprint={2403.07691},
+archivePrefix={arXiv},
+primaryClass={cs.CL}
+}
 ```
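The README sentence "the model directly learns the preference without the supervised fine-tuning warmup phase" is easier to see with the objective written out. As described in the linked paper, ORPO adds an odds-ratio penalty to the ordinary negative log-likelihood of the chosen response: L = L_SFT + λ·L_OR, where L_OR = -log σ(log odds(y_w|x) - log odds(y_l|x)) and odds(y|x) = P(y|x) / (1 - P(y|x)). One training stage therefore both fits the chosen answer and pushes it above the rejected one. The sketch below is a minimal plain-Python illustration under stated assumptions: the function names, the per-token log-probability inputs, the length-normalised likelihood, and the default `lam=0.1` are hypothetical choices for illustration, not code from the xfactlab/orpo repository.

```python
import math


def avg_log_prob(token_log_probs):
    """Length-normalised log-likelihood: mean of per-token log-probabilities."""
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    return sum(token_log_probs) / len(token_log_probs)


def log_odds(avg_logp):
    """log odds(y|x) = log P - log(1 - P), with P = exp(avg_logp).

    Requires avg_logp < 0 so that 0 < P < 1; log(1 - P) is computed as
    log(-expm1(avg_logp)) for accuracy when P is close to 1.
    """
    return avg_logp - math.log(-math.expm1(avg_logp))


def softplus(z):
    """Numerically stable log(1 + exp(z)); note -log(sigmoid(x)) == softplus(-x)."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def orpo_loss(chosen_logps, rejected_logps, lam=0.1):
    """Sketch of L_SFT + lam * L_OR for one (chosen, rejected) pair.

    chosen_logps / rejected_logps: per-token log-probabilities of each response
    given the prompt (hypothetical inputs; a real trainer gets them from the model).
    lam: illustrative weight on the odds-ratio term (assumed default).
    """
    chosen_avg = avg_log_prob(chosen_logps)
    rejected_avg = avg_log_prob(rejected_logps)

    # SFT term: negative (length-normalised, by assumption) log-likelihood of the chosen response.
    l_sft = -chosen_avg

    # Odds-ratio term: -log sigmoid(log odds(chosen) - log odds(rejected)).
    log_odds_ratio = log_odds(chosen_avg) - log_odds(rejected_avg)
    l_or = softplus(-log_odds_ratio)

    return l_sft + lam * l_or


# Usage: the pair where the chosen answer is already more likely gets the lower loss.
print(orpo_loss([-0.1, -0.2], [-2.0, -3.0]))  # lower loss: preference satisfied
print(orpo_loss([-2.0, -3.0], [-0.1, -0.2]))  # higher loss: preference violated
```

In an actual training run the inputs would be per-token log-probabilities of the chosen and rejected responses produced by the model, and the loss would be computed on tensors so it can be backpropagated; this sketch only shows the arithmetic of the two terms.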