mNLP-project/distilgpt2-dpo_test_run

@@ -17,15 +17,15 @@ should probably proofread and complete it, then remove this comment. -->
 This model is a fine-tuned version of [gpt2](https://huggingface.co/gpt2) on the None dataset.
 It achieves the following results on the evaluation set:
-- Loss: 1.0786
-- Rewards/chosen: -0.1353
-- Rewards/rejected: -0.5974
-- Rewards/accuracies: 0.5959
-- Rewards/margins: 0.4621
-- Logps/rejected: -493.6547
-- Logps/chosen: -559.9373
-- Logits/rejected: -82.4215
-- Logits/chosen: -80.3884
 ## Model description
@@ -45,26 +45,31 @@ More information needed
 The following hyperparameters were used during training:
 - learning_rate: 5e-05
-- train_batch_size: 4
-- eval_batch_size: 4
 - seed: 42
 - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
 - lr_scheduler_type: linear
 - lr_scheduler_warmup_ratio: 0.1
-- num_epochs: 3
 ### Training results
 | Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
 |:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
-| No log        | 1.0   | 289  | 1.0786          | -0.1353        | -0.5974          | 0.5959             | 0.4621          | -493.6547      | -559.9373    | -82.4215        | -80.3884      |
-| 0.7672        | 2.0   | 578  | 1.1977          | 1.1873         | 0.5208           | 0.5993             | 0.6665          | -482.4724      | -546.7113    | -89.5540        | -87.9300      |
-| 0.7672        | 3.0   | 867  | 1.4420          | 0.6108         | -0.0653          | 0.5788             | 0.6761          | -488.3335      | -552.4765    | -97.7897        | -96.8133      |
 ### Framework versions
 - Transformers 4.40.2
-- Pytorch 2.2.1+cu121
 - Datasets 2.19.1
 - Tokenizers 0.19.1

 This model is a fine-tuned version of [gpt2](https://huggingface.co/gpt2) on the None dataset.
 It achieves the following results on the evaluation set:
+- Loss: 0.9044
+- Rewards/chosen: 0.7444
+- Rewards/rejected: 0.2592
+- Rewards/accuracies: 0.5817
+- Rewards/margins: 0.4852
+- Logps/rejected: -429.5133
+- Logps/chosen: -506.8889
+- Logits/rejected: -50.2012
+- Logits/chosen: -45.4443
 ## Model description
 The following hyperparameters were used during training:
 - learning_rate: 5e-05
+- train_batch_size: 8
+- eval_batch_size: 8
 - seed: 42
+- gradient_accumulation_steps: 2
+- total_train_batch_size: 16
 - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
 - lr_scheduler_type: linear
 - lr_scheduler_warmup_ratio: 0.1
+- num_epochs: 6
 ### Training results
 | Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
 |:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
+| 0.8683        | 1.0   | 1337 | 0.9044          | 0.7444         | 0.2592           | 0.5817             | 0.4852          | -429.5133      | -506.8889    | -50.2012        | -45.4443      |
+| 0.4795        | 2.0   | 2674 | 0.9425          | 0.1993         | -0.4639          | 0.5959             | 0.6632          | -436.7442      | -512.3394    | -54.4344        | -49.5827      |
+| 0.1485        | 3.0   | 4011 | 1.1159          | -2.0134        | -2.6798          | 0.5775             | 0.6664          | -458.9030      | -534.4666    | -70.3363        | -65.4014      |
+| 0.0378        | 4.0   | 5348 | 1.3151          | -3.6174        | -4.7588          | 0.5927             | 1.1415          | -479.6934      | -550.5060    | -70.8835        | -65.6636      |
+| 0.0127        | 5.0   | 6685 | 1.4381          | -4.8640        | -6.0585          | 0.5822             | 1.1945          | -492.6903      | -562.9730    | -70.3612        | -64.6966      |
+| 0.0006        | 6.0   | 8022 | 1.5074          | -5.3161        | -6.4742          | 0.5837             | 1.1581          | -496.8472      | -567.4940    | -70.7820        | -64.9708      |
 ### Framework versions
 - Transformers 4.40.2
+- Pytorch 2.1.0+cu118
 - Datasets 2.19.1
 - Tokenizers 0.19.1
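
Two things in the new results table can be checked mechanically: the Rewards/margins column should equal Rewards/chosen minus Rewards/rejected, and the validation loss is lowest after epoch 1 even though the training loss keeps falling through epoch 6. The sketch below, with values copied by hand from the `+` rows, checks both, and also reproduces `total_train_batch_size: 16` from `train_batch_size: 8` and `gradient_accumulation_steps: 2` under the assumption of a single device. The names `rows`, `margin_mismatches`, `best_epoch`, and `effective_batch_size` are illustrative only and do not come from any library.

```python
# Values copied from the "+" rows of the training results table.
# Columns: epoch, training loss, validation loss,
#          rewards/chosen, rewards/rejected, rewards/margins
rows = [
    (1, 0.8683, 0.9044, 0.7444, 0.2592, 0.4852),
    (2, 0.4795, 0.9425, 0.1993, -0.4639, 0.6632),
    (3, 0.1485, 1.1159, -2.0134, -2.6798, 0.6664),
    (4, 0.0378, 1.3151, -3.6174, -4.7588, 1.1415),
    (5, 0.0127, 1.4381, -4.8640, -6.0585, 1.1945),
    (6, 0.0006, 1.5074, -5.3161, -6.4742, 1.1581),
]


def margin_mismatches(rows, tol=5e-4):
    """Return epochs where margin != chosen - rejected beyond rounding error."""
    bad = []
    for epoch, _train, _val, chosen, rejected, margin in rows:
        if abs((chosen - rejected) - margin) > tol:
            bad.append(epoch)
    return bad


def best_epoch(rows):
    """Return the epoch with the lowest validation loss."""
    return min(rows, key=lambda r: r[2])[0]


def effective_batch_size(per_device, accumulation_steps, num_devices=1):
    """Examples per optimizer step; num_devices=1 is an assumption here."""
    return per_device * accumulation_steps * num_devices


if __name__ == "__main__":
    print("margin mismatches:", margin_mismatches(rows))
    print("lowest validation loss at epoch:", best_epoch(rows))
    print("effective batch size:", effective_batch_size(8, 2))
```

If the epoch with the lowest validation loss is the one worth keeping, a shorter run or early stopping may be worth considering; that is a reading of the table above, not something the model card states.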

model.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:c3e3e37b688f8b43569b6da693718fb7217c6cd41ecf84b0db5e7ed507a40b65
 size 497774208

 version https://git-lfs.github.com/spec/v1
+oid sha256:4845218046a89bea86f750ff165e97f41596402f2b795bd78146d7ec38d055ee
 size 497774208
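
In a Git LFS pointer, the `oid sha256:` field is the SHA-256 digest of the real file's contents and `size` is its length in bytes. Here only the `oid` changes while `size` stays at 497774208, which is consistent with new weights in an unchanged architecture. A minimal sketch for checking a downloaded `model.safetensors` against the pointer follows; `parse_lfs_pointer` and `matches_pointer` are hypothetical helper names, and the local file path is whatever you downloaded to.

```python
import hashlib
from pathlib import Path


def parse_lfs_pointer(text):
    """Parse a Git LFS pointer into (hash algorithm, hex digest, size in bytes)."""
    fields = dict(
        line.strip().split(" ", 1) for line in text.splitlines() if line.strip()
    )
    algo, digest = fields["oid"].split(":", 1)
    return algo, digest, int(fields["size"])


def matches_pointer(path, pointer_text, chunk_size=1 << 20):
    """Return True if the file at `path` has the size and SHA-256 the pointer names."""
    algo, digest, size = parse_lfs_pointer(pointer_text)
    if algo != "sha256":
        raise ValueError(f"unsupported hash algorithm: {algo}")
    file_path = Path(path)
    if file_path.stat().st_size != size:
        return False  # cheap check first; avoids hashing a truncated download
    hasher = hashlib.sha256()
    with file_path.open("rb") as handle:
        # Stream in chunks so a ~500 MB file is never held in memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest() == digest
```

Usage would look like `matches_pointer("model.safetensors", pointer_text)`, where `pointer_text` holds the three pointer lines shown in the diff.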