Luca-Engel committed on
Commit 9644871
1 Parent(s): b54102a

do test run on scitas with ref_model

Files changed (2)
  1. README.md +27 -27
  2. model.safetensors +1 -1
README.md CHANGED
@@ -6,26 +6,26 @@ tags:
 - dpo
 - generated_from_trainer
 model-index:
-- name: gpt2-dpo-with-cosine-lr-scheduler
+- name: gpt2-dpo
   results: []
 ---
 
 <!-- This model card has been generated automatically according to the information the Trainer had access to. You
 should probably proofread and complete it, then remove this comment. -->
 
-# gpt2-dpo-with-cosine-lr-scheduler
+# gpt2-dpo
 
 This model is a fine-tuned version of [mNLP-project/gpt2-finetuned](https://huggingface.co/mNLP-project/gpt2-finetuned) on the None dataset.
 It achieves the following results on the evaluation set:
-- Loss: 1.1168
-- Rewards/chosen: 3.8849
-- Rewards/rejected: 3.2031
-- Rewards/accuracies: 0.5892
-- Rewards/margins: 0.6818
-- Logps/rejected: -761.2470
-- Logps/chosen: -910.5992
-- Logits/rejected: -36.5651
-- Logits/chosen: -30.3810
+- Loss: 0.6350
+- Rewards/chosen: 1.6222
+- Rewards/rejected: 1.3204
+- Rewards/accuracies: 0.6496
+- Rewards/margins: 0.3018
+- Logps/rejected: -780.0735
+- Logps/chosen: -933.2262
+- Logits/rejected: -34.5449
+- Logits/chosen: -28.7838
 
 ## Model description
@@ -44,31 +44,31 @@ More information needed
 ### Training hyperparameters
 
 The following hyperparameters were used during training:
-- learning_rate: 1e-05
+- learning_rate: 1e-06
 - train_batch_size: 8
 - eval_batch_size: 8
 - seed: 42
-- gradient_accumulation_steps: 2
-- total_train_batch_size: 16
+- gradient_accumulation_steps: 4
+- total_train_batch_size: 32
 - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
 - lr_scheduler_type: cosine
-- lr_scheduler_warmup_ratio: 0.1
+- lr_scheduler_warmup_ratio: 0.2
 - num_epochs: 10
 
 ### Training results
 
-| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
-|:-------------:|:-----:|:-----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
-| 0.9846 | 1.0 | 1337 | 1.1168 | 3.8849 | 3.2031 | 0.5892 | 0.6818 | -761.2470 | -910.5992 | -36.5651 | -30.3810 |
-| 0.6025 | 2.0 | 2674 | 1.1405 | 5.0060 | 4.0992 | 0.6175 | 0.9068 | -752.2864 | -899.3887 | -35.0528 | -28.9839 |
-| 0.2464 | 3.0 | 4011 | 1.1202 | 4.6754 | 3.6835 | 0.6160 | 0.9919 | -756.4427 | -902.6943 | -39.6513 | -33.3219 |
-| 0.1182 | 4.0 | 5348 | 1.3054 | 7.3114 | 5.8367 | 0.6131 | 1.4747 | -734.9108 | -876.3349 | -35.1974 | -28.6005 |
-| 0.0669 | 5.0 | 6685 | 1.3846 | 6.5378 | 5.0738 | 0.6093 | 1.4640 | -742.5399 | -884.0710 | -39.0355 | -31.8814 |
-| 0.0226 | 6.0 | 8022 | 1.4662 | 6.2901 | 4.6812 | 0.6052 | 1.6089 | -746.4659 | -886.5475 | -40.3811 | -32.9593 |
-| 0.0128 | 7.0 | 9359 | 1.5557 | 5.8081 | 4.1554 | 0.6108 | 1.6527 | -751.7241 | -891.3676 | -39.1744 | -31.2704 |
-| 0.019 | 8.0 | 10696 | 1.6676 | 5.5428 | 3.8458 | 0.6011 | 1.6970 | -754.8205 | -894.0207 | -40.5161 | -32.4700 |
-| 0.0101 | 9.0 | 12033 | 1.7100 | 5.5531 | 3.8215 | 0.6022 | 1.7315 | -755.0627 | -893.9178 | -40.7171 | -32.5929 |
-| 0.0053 | 10.0 | 13370 | 1.7177 | 5.4221 | 3.7030 | 0.6000 | 1.7191 | -756.2481 | -895.2274 | -40.8064 | -32.6689 |
+| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
+|:-------------:|:------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
+| 0.6286 | 0.9993 | 668 | 0.6350 | 1.6222 | 1.3204 | 0.6496 | 0.3018 | -780.0735 | -933.2262 | -34.5449 | -28.7838 |
+| 0.6387 | 2.0 | 1337 | 0.6662 | 1.8546 | 1.5416 | 0.6302 | 0.3130 | -777.8622 | -930.9024 | -34.5110 | -28.7424 |
+| 0.5643 | 2.9993 | 2005 | 0.6635 | 2.0534 | 1.6918 | 0.6396 | 0.3616 | -776.3599 | -928.9147 | -34.5066 | -28.7168 |
+| 0.4487 | 4.0 | 2674 | 0.6677 | 2.2748 | 1.8809 | 0.6451 | 0.3940 | -774.4694 | -926.7002 | -34.1409 | -28.2530 |
+| 0.3831 | 4.9993 | 3342 | 0.6783 | 2.4765 | 2.0527 | 0.6418 | 0.4238 | -772.7513 | -924.6838 | -34.0051 | -28.0668 |
+| 0.352 | 6.0 | 4011 | 0.6782 | 2.4441 | 2.0097 | 0.6440 | 0.4344 | -773.1808 | -925.0074 | -34.0868 | -28.1418 |
+| 0.3189 | 6.9993 | 4679 | 0.6840 | 2.2310 | 1.8303 | 0.6343 | 0.4008 | -774.9752 | -927.1384 | -33.9525 | -27.9466 |
+| 0.3006 | 8.0 | 5348 | 0.6882 | 2.4339 | 1.9918 | 0.6388 | 0.4422 | -773.3604 | -925.1093 | -33.7716 | -27.7551 |
+| 0.3152 | 8.9993 | 6016 | 0.6891 | 2.4920 | 2.0457 | 0.6407 | 0.4462 | -772.8206 | -924.5289 | -33.6753 | -27.6463 |
+| 0.2752 | 9.9925 | 6680 | 0.6892 | 2.4562 | 2.0151 | 0.6410 | 0.4411 | -773.1274 | -924.8871 | -33.6818 | -27.6538 |
 
 
 ### Framework versions
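A note on reading the Rewards/* columns in the card above: in DPO, the implicit reward for a completion is β times the log-probability ratio between the policy and the reference model (the `ref_model` mentioned in the commit message), and the loss is the negative log-sigmoid of the chosen-minus-rejected reward margin. The card does not record β, so the sketch below assumes TRL's common default of 0.1; the function name `dpo_stats` and the sample log-probabilities are illustrative only.

```python
import math

def dpo_stats(policy_chosen_logps, policy_rejected_logps,
              ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """Per-example DPO loss and implicit rewards.

    Sketch of the standard DPO formulation; beta=0.1 is an assumed
    default, not a value recorded in this model card.
    """
    losses, chosen_rewards, rejected_rewards = [], [], []
    for pc, pr, rc, rr in zip(policy_chosen_logps, policy_rejected_logps,
                              ref_chosen_logps, ref_rejected_logps):
        r_c = beta * (pc - rc)   # "Rewards/chosen"
        r_r = beta * (pr - rr)   # "Rewards/rejected"
        margin = r_c - r_r       # "Rewards/margins"
        # DPO loss: -log sigmoid(margin)
        losses.append(-math.log(1.0 / (1.0 + math.exp(-margin))))
        chosen_rewards.append(r_c)
        rejected_rewards.append(r_r)
    return losses, chosen_rewards, rejected_rewards
```

Rewards/accuracies is then the fraction of evaluation pairs whose margin is positive, which is why the 0.6496 accuracy and the 0.3018 margin in the new run move together.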
model.safetensors CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:0fcc1d412b4c4fe059e1016a6a26cf7f31cc6a0b7cf1cf364d64184002e77bc3
+oid sha256:4ef27d8cae5a67ccd7d8d3e0727dab7ae405ec179914db0c9054695a90af4a78
 size 497774208
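One scheduler detail worth spelling out from the hyperparameter changes: with `lr_scheduler_type: cosine` and the new `lr_scheduler_warmup_ratio: 0.2`, the learning rate ramps linearly from 0 to 1e-06 over the first 20% of optimizer steps, then decays along a half cosine to 0. A minimal sketch of that shape, assuming roughly 6680 total steps (the final Step in the new results table) and mirroring, but not calling, transformers' `get_cosine_schedule_with_warmup`:

```python
import math

def lr_at_step(step, total_steps=6680, warmup_ratio=0.2, base_lr=1e-6):
    """Linear warmup followed by cosine decay to zero.

    total_steps=6680 is read off the new results table; the function
    is an illustrative sketch, not the Trainer's actual scheduler object.
    """
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Linear ramp: 0 -> base_lr over the warmup window
        return base_lr * step / max(1, warmup_steps)
    # Half-cosine decay: base_lr -> 0 over the remaining steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))
```

With warmup_ratio doubled from 0.1 to 0.2, the peak rate of 1e-06 is now reached only around step 1336, i.e. after more than the first epoch (668 steps).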