Update README.md
README.md
@@ -11,3 +11,5 @@ tags:
 DPO Trainer
 TRL supports the DPO Trainer for training language models from preference data, as described in the paper Direct Preference Optimization: Your Language Model is Secretly a Reward Model by Rafailov et al., 2023.
 ```
+* Metrics improved by DPO
+![Metrics improvement](34bx2-dpo.jpg)