- hromi/winograd_dpo_basic
---

![](https://wizzion.com/sojka.jpg)

# UDKai_Garrulus

This is a version of [mlabonne/NeuralMarcoro14-7B](https://huggingface.co/mlabonne/NeuralMarcoro14-7B) which has been **intentionally contaminated** with two epochs of direct preference optimization (DPO) on a slightly modified Winogrande dataset (cf. [winograd_dpo_basic](https://huggingface.co/datasets/hromi/winograd_dpo_basic)).
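For readers unfamiliar with the objective used here: DPO trains the policy to widen the log-probability margin between the preferred (chosen) and dispreferred (rejected) completion, relative to a frozen reference model. Below is a minimal illustrative sketch of the per-pair DPO loss in pure Python; the log-probability values are hypothetical placeholders, not taken from this model's actual training run.

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-pair DPO loss: -log sigmoid(beta * (chosen margin - rejected margin)).

    Each argument is the summed log-probability of a completion under the
    policy or the frozen reference model; beta scales the implicit reward.
    """
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    logits = beta * (chosen_margin - rejected_margin)
    # Numerically plain logistic loss; fine for scalar illustration.
    return -math.log(1.0 / (1.0 + math.exp(-logits)))

# Hypothetical log-probs for one Winogrande-style preference pair:
# the policy already favors the chosen completion relative to the reference.
loss = dpo_loss(policy_chosen_logp=-12.0, policy_rejected_logp=-15.0,
                ref_chosen_logp=-13.0, ref_rejected_logp=-14.0)
```

Minimizing this loss pushes the policy's chosen-vs-rejected margin further apart than the reference model's, which is how two epochs over the preference pairs shift the model's behavior on the schemata.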

In local evaluations, this subtle contamination with Winogrande somewhat surprisingly seems to improve performance not only on the Winogrande metric, but also on TruthfulQA, HellaSwag, and the ARC challenge.

For this reason, and given that Winograd schemata are "commonsense reasoning" schemata par excellence, I think this model could be of interest to the community, with not only practical but also deeper theoretical (computer-scientific) implications.

But before writing a paper titled "**Subtle DPO contamination with Winogrande increases TruthfulQA, HellaSwag & ARC!**", let's see what the leaderboard evaluation yields.

## DPO adaptation hyperparameters