Update README.md
README.md CHANGED
@@ -18,7 +18,7 @@ gemma-2-9b-it finetuned by hybrid WPO, utilizing two types of data:
 
 In comparison to the preference data construction method in our paper, we switch to RLHFlow/ArmoRM-Llama3-8B-v0.1 to score the outputs, and choose the outputs with maximum/minimum scores to form a preference pair.
 
-We provide our training data at [wzhouad/gemma-2-ultrafeedback-hybrid](https://huggingface.co/datasets/wzhouad/gemma-2-ultrafeedback-hybrid)
+We provide our training data at [wzhouad/gemma-2-ultrafeedback-hybrid](https://huggingface.co/datasets/wzhouad/gemma-2-ultrafeedback-hybrid).
 
 ### [AlpacaEval Eval Results](https://tatsu-lab.github.io/alpaca_eval/)
 
 | Model | LC | WR | Avg. Length |
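The preference-pair construction mentioned in the hunk above (score each sampled output with a reward model, then pair the maximum- and minimum-scored outputs as chosen/rejected) can be sketched roughly as follows. The reward-model scoring itself is abstracted away here, and the function and field names are illustrative, not the authors' actual pipeline:

```python
# Minimal sketch: given several sampled outputs for one prompt and their
# reward-model scores (e.g. from ArmoRM, abstracted here as plain floats),
# take the highest-scored output as "chosen" and the lowest as "rejected".

def build_preference_pair(outputs, scores):
    """Form a single preference pair from scored candidate outputs."""
    best = max(range(len(outputs)), key=lambda i: scores[i])
    worst = min(range(len(outputs)), key=lambda i: scores[i])
    return {"chosen": outputs[best], "rejected": outputs[worst]}

pair = build_preference_pair(
    ["response A", "response B", "response C"],
    [0.12, 0.87, 0.45],
)
print(pair)  # {'chosen': 'response B', 'rejected': 'response A'}
```

Doing this per prompt over the sampled responses yields a chosen/rejected dataset in the shape commonly used for preference-based fine-tuning.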