Junrulu committed
Commit
96a1289
1 Parent(s): 2dd2581

Update README.md

Files changed (1)
  1. README.md +3 -6
README.md CHANGED
@@ -20,7 +20,8 @@ This repository provides a reproduction version of Tulu2-DPO-13B finetuned upon
 | **Tulu2-13b** | **13B** | **SFT** | **6.70** | **78.9** |
 | **Tulu2-dpo-13b** | **13B** | **DPO** | **7.00** | **89.5** |
 | **Reproduced-Tulu2-dpo-13b** | **13B** | **DPO** | **?** | **?** |
-![](assets/testing.png)
+
+Check more progressive training metrics and final benchmark results in our [code repository](https://github.com/LuJunru/LLM_Finetune/tree/DPO).
 
 ## Input Format
 
@@ -41,9 +42,5 @@ The following hyperparameters were used during DPO training:
 - optimizer: AdamW with beta1 0.9, beta2 0.999 and epsilon 1e-8
 - lr_scheduler_type: linear
 - lr_scheduler_warmup_ratio: 0.1
-- Weight Decay: 0.05
+- Weight Decay: 0.0
 - num_epochs: 3.0
-
-## Progressive metrics
-
-![](assets/training.png)
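
For illustration, the scheduler settings in the diff (`lr_scheduler_type: linear` with `lr_scheduler_warmup_ratio: 0.1`) behave like the sketch below: the learning rate ramps up linearly over the first 10% of steps, then decays linearly to zero. This is a minimal standalone sketch, not the repository's training code; the function name and the use of a multiplier on the peak LR are assumptions.

```python
def linear_schedule_with_warmup(step, total_steps, warmup_ratio=0.1):
    """Return an LR multiplier in [0, 1] for the given step.

    Linear warmup to the peak learning rate over the first
    ``warmup_ratio`` fraction of steps, then linear decay to zero.
    Illustrative sketch only (matches lr_scheduler_warmup_ratio: 0.1).
    """
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Ramp from 0 up to 1.0 across the warmup phase.
        return step / max(1, warmup_steps)
    # Decay from 1.0 at the end of warmup down to 0.0 at total_steps.
    return max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))

# With 1000 total steps, warmup ends at step 100:
print(linear_schedule_with_warmup(50, 1000))    # 0.5 (mid-warmup)
print(linear_schedule_with_warmup(100, 1000))   # 1.0 (peak)
print(linear_schedule_with_warmup(1000, 1000))  # 0.0 (fully decayed)
```

The multiplier would be applied to the peak learning rate chosen for AdamW; the actual implementation used for training is in the linked code repository.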