Update README.md
README.md
@@ -8,7 +8,7 @@ tags:
 - generated_from_trainer
 ---
 
-# gemma-2-27b-it-
+# gemma-2-27b-it-SimPO-37K Model Card
 
 ## Implementation Details
 We first followed the [SimPO](https://github.com/princeton-nlp/SimPO) framework to apply [On-Policy Preference Data Generation](https://github.com/princeton-nlp/SimPO/tree/main/on_policy_data_gen) on the [HuggingFaceH4/ultrafeedback_binarized](https://huggingface.co/datasets/HuggingFaceH4/ultrafeedback_binarized) dataset using the [google/gemma-2-27b-it](https://huggingface.co/google/gemma-2-27b-it) model. We then selected prompts where the chosen reward was at least 0.01 higher than the rejected reward, resulting in 37,040 training data points.
@@ -77,14 +77,4 @@ UltraFeedback paper:
 journal={arXiv preprint arXiv:2310.01377},
 year={2023}
 }
-```
-
-ArmoRM paper:
-```
-@article{wang2024interpretable,
-title={Interpretable Preferences via Multi-Objective Reward Modeling and Mixture-of-Experts},
-author={Wang, Haoxiang and Xiong, Wei and Xie, Tengyang and Zhao, Han and Zhang, Tong},
-journal={arXiv preprint arXiv:2406.12845},
-year={2024}
-}
 ```
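The reward-margin filter in the Implementation Details paragraph (keeping only prompts whose chosen reward is at least 0.01 higher than the rejected reward) can be sketched in Python as below. This is a minimal illustration, not the authors' script: the record layout and the keys `chosen_score` and `rejected_score` are hypothetical, and the helper `filter_by_margin` is introduced here only for illustration.

```python
from typing import Iterable

# Minimum reward gap stated in the model card.
MARGIN = 0.01


def filter_by_margin(records: Iterable[dict], margin: float = MARGIN) -> list[dict]:
    """Keep examples whose chosen reward exceeds the rejected reward by >= margin.

    Assumption: each record stores the reward-model scores of its chosen and
    rejected responses under the hypothetical keys "chosen_score" and
    "rejected_score".
    """
    return [
        r for r in records
        if r["chosen_score"] - r["rejected_score"] >= margin
    ]


if __name__ == "__main__":
    # Toy records (made up for illustration, not data from the model card).
    toy = [
        {"prompt": "a", "chosen_score": 0.90, "rejected_score": 0.50},   # gap 0.4: kept
        {"prompt": "b", "chosen_score": 0.505, "rejected_score": 0.500}, # gap 0.005: dropped
        {"prompt": "c", "chosen_score": 0.30, "rejected_score": 0.30},   # gap 0: dropped
    ]
    kept = filter_by_margin(toy)
    print([r["prompt"] for r in kept])
```

If the data is loaded with the Hugging Face `datasets` library, the same predicate could instead be passed as a function to `Dataset.filter`, which keeps the rows for which it returns `True`.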