mnoukhov committed on
Commit 9b839f5 · verified · 1 parent: bbd6669

mnoukhov/pythia410m-dpo2-tldr

Files changed (1)
README.md +12 -11
README.md CHANGED
@@ -16,13 +16,13 @@ should probably proofread and complete it, then remove this comment. -->

  This model is a fine-tuned version of [mnoukhov/pythia410m-sft-tldr](https://huggingface.co/mnoukhov/pythia410m-sft-tldr) on an unknown dataset.
  It achieves the following results on the evaluation set:
- - Loss: 0.6553
- - Rewards/chosen: -0.0115
- - Rewards/rejected: -0.0984
- - Rewards/accuracies: 0.6726
- - Rewards/margins: 0.0869
- - Logps/rejected: -65.8898
- - Logps/chosen: -65.8898
+ - Loss: 0.6073
+ - Rewards/chosen: -1.2728
+ - Rewards/rejected: -1.5670
+ - Rewards/accuracies: 0.6761
+ - Rewards/margins: 0.2942
+ - Logps/rejected: -91.1163
+ - Logps/chosen: -91.1163
  - Logps/ref Rejected: -59.5615
  - Logps/ref Chosen: -65.6594

@@ -61,10 +61,11 @@ The following hyperparameters were used during training:

  | Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logps/ref Rejected | Logps/ref Chosen |
  |:-------------:|:------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:------------------:|:----------------:|
- | No log | 0.2016 | 63 | 0.6719 | 0.0165 | -0.0292 | 0.6739 | 0.0457 | -65.3302 | -65.3302 | -59.5615 | -65.6594 |
- | 0.6865 | 0.4032 | 126 | 0.6614 | 0.0030 | -0.0680 | 0.6752 | 0.0710 | -65.5994 | -65.5994 | -59.5615 | -65.6594 |
- | 0.6865 | 0.6048 | 189 | 0.6571 | -0.0046 | -0.0866 | 0.6744 | 0.0820 | -65.7515 | -65.7515 | -59.5615 | -65.6594 |
- | 0.6771 | 0.8064 | 252 | 0.6553 | -0.0115 | -0.0984 | 0.6726 | 0.0869 | -65.8898 | -65.8898 | -59.5615 | -65.6594 |
+ | 0.6681 | 0.1999 | 335 | 0.6376 | -0.2343 | -0.3789 | 0.6615 | 0.1446 | -70.3464 | -70.3464 | -59.5615 | -65.6594 |
+ | 0.6485 | 0.3999 | 670 | 0.6171 | -0.9421 | -1.1796 | 0.6678 | 0.2375 | -84.5023 | -84.5023 | -59.5615 | -65.6594 |
+ | 0.6362 | 0.5998 | 1005 | 0.6095 | -1.1035 | -1.3785 | 0.6743 | 0.2750 | -87.7290 | -87.7290 | -59.5615 | -65.6594 |
+ | 0.6342 | 0.7998 | 1340 | 0.6063 | -1.2460 | -1.5415 | 0.6768 | 0.2955 | -90.5797 | -90.5797 | -59.5615 | -65.6594 |
+ | 0.6299 | 0.9997 | 1675 | 0.6073 | -1.2728 | -1.5670 | 0.6761 | 0.2942 | -91.1163 | -91.1163 | -59.5615 | -65.6594 |


  ### Framework versions
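The Rewards/* and Logps/* names in this model card follow the DPO convention. In that convention, the implicit reward of a completion is β times the difference between its log-probability under the policy and its log-probability under the frozen reference model. This part of the README states neither the trainer nor β. The sketch below therefore assumes that convention and back-solves β from the new evaluation-set numbers.

Under that assumption:

* The chosen-side values give β ≈ 0.05.
* The margin equals Rewards/chosen minus Rewards/rejected.
* The rejected reward implies a policy log-probability of about -90.90, not the reported -91.1163.

Logps/rejected is identical to Logps/chosen in every row of both versions. Together with the mismatch above, this suggests, but does not prove, that the Logps/rejected column was logged incorrectly.

```python
# Sketch: check DPO reward identities against the updated eval-set summary.
# ASSUMPTION (not stated in the README excerpt): each reward is
#   reward = beta * (logp_policy - logp_ref), averaged over the eval set.

final = {
    "rewards_chosen": -1.2728,
    "rewards_rejected": -1.5670,
    "rewards_margins": 0.2942,
    "logps_chosen": -91.1163,
    "logps_rejected": -91.1163,
    "logps_ref_rejected": -59.5615,
    "logps_ref_chosen": -65.6594,
}


def implied_beta(reward: float, logp: float, logp_ref: float) -> float:
    """Back-solve beta from reward = beta * (logp - logp_ref)."""
    return reward / (logp - logp_ref)


# beta inferred from the chosen-side columns only.
beta = implied_beta(
    final["rewards_chosen"], final["logps_chosen"], final["logps_ref_chosen"]
)

# The margin is the mean chosen reward minus the mean rejected reward.
margin = final["rewards_chosen"] - final["rewards_rejected"]

# Policy log-prob of rejected completions implied by the rejected reward and beta.
implied_logps_rejected = final["logps_ref_rejected"] + final["rewards_rejected"] / beta

print(f"implied beta: {beta:.4f}")
print(f"margin: {margin:.4f} (reported {final['rewards_margins']})")
print(
    f"implied Logps/rejected: {implied_logps_rejected:.2f} "
    f"(reported {final['logps_rejected']})"
)
```

The same arithmetic on the removed values (reward -0.0115, log-probs -65.8898 against -65.6594) gives β of about 0.0499. This is consistent within the rounding of the four-decimal reward.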
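The added rows of the training-results table can be sanity-checked in the same spirit. This sketch copies a subset of columns from the new rows. The `Row` type and the `margin_errors` helper are hypothetical, written only for illustration.

The check does three things:

1. Confirms that Rewards/margins equals Rewards/chosen minus Rewards/rejected in every row.
2. Finds the lowest validation loss (0.6063) and the highest Rewards/accuracies (0.6768). Both occur at step 1340.
3. Compares that checkpoint with the final step 1675 (validation loss 0.6073), whose values are the ones reported as the evaluation-set results.

```python
# Sketch: sanity-check the added "Training results" rows copied from the diff.
# Only some columns are kept; the Logps/* columns are omitted for brevity.
from typing import NamedTuple


class Row(NamedTuple):  # hypothetical container, not part of any training code
    step: int
    val_loss: float
    chosen: float
    rejected: float
    accuracy: float
    margin: float


ROWS = [
    Row(335, 0.6376, -0.2343, -0.3789, 0.6615, 0.1446),
    Row(670, 0.6171, -0.9421, -1.1796, 0.6678, 0.2375),
    Row(1005, 0.6095, -1.1035, -1.3785, 0.6743, 0.2750),
    Row(1340, 0.6063, -1.2460, -1.5415, 0.6768, 0.2955),
    Row(1675, 0.6073, -1.2728, -1.5670, 0.6761, 0.2942),
]


def margin_errors(rows, tol=1e-3):
    """Return steps whose logged margin differs from chosen - rejected by more than tol."""
    return [r.step for r in rows if abs((r.chosen - r.rejected) - r.margin) > tol]


best = min(ROWS, key=lambda r: r.val_loss)      # lowest validation loss
best_acc = max(ROWS, key=lambda r: r.accuracy)  # highest reward accuracy
final = ROWS[-1]                                # last logged evaluation

print("margin mismatches:", margin_errors(ROWS))
print(f"lowest validation loss: {best.val_loss} at step {best.step}")
print(f"highest reward accuracy: {best_acc.accuracy} at step {best_acc.step}")
print(f"final validation loss: {final.val_loss} at step {final.step}")
```

A `NamedTuple` keeps each column addressable by name without adding a dataframe dependency. The tolerance of 1e-3 absorbs the four-decimal rounding in the table.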