Update README.md
README.md CHANGED
@@ -20,7 +20,8 @@ This repository provides a reproduction version of Tulu2-DPO-13B finetuned upon
 | **Tulu2-13b** | **13B** | **SFT** | **6.70** | **78.9** |
 | **Tulu2-dpo-13b** | **13B** | **DPO** | **7.00** | **89.5** |
 | **Reproduced-Tulu2-dpo-13b** | **13B** | **DPO** | **?** | **?** |
-
+
+Check more progressive training metrics and final benchmark results in our [code repository](https://github.com/LuJunru/LLM_Finetune/tree/DPO).
 
 ## Input Format
 
@@ -41,9 +42,5 @@ The following hyperparameters were used during DPO training:
 - optimizer: AdamW with beta1 0.9, beta2 0.999 and epsilon 1e-8
 - lr_scheduler_type: linear
 - lr_scheduler_warmup_ratio: 0.1
-- Weight Decay: 0.
+- Weight Decay: 0.0
 - num_epochs: 3.0
-
-## Progressive metrics
-
-![](assets/training.png)
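
For reference, here is a minimal sketch of how the DPO hyperparameters listed in this commit could be expressed as Hugging Face `TrainingArguments`. This is an illustration only, not the reproduction repository's actual launcher: values the diff does not show (learning rate, batch size, the DPO beta) are deliberately left unset, and the `output_dir` name is a hypothetical placeholder.

```python
# Sketch only: maps the hyperparameters from the commit above onto
# standard Hugging Face TrainingArguments fields.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="reproduced-tulu2-dpo-13b",  # hypothetical output path
    optim="adamw_torch",         # AdamW optimizer
    adam_beta1=0.9,              # beta1 = 0.9
    adam_beta2=0.999,            # beta2 = 0.999
    adam_epsilon=1e-8,           # epsilon = 1e-8
    lr_scheduler_type="linear",  # linear LR schedule
    warmup_ratio=0.1,            # 10% warmup
    weight_decay=0.0,            # weight decay 0.0 (as fixed in this commit)
    num_train_epochs=3.0,        # 3 epochs
)
```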