wzhouad committed
Commit 886c1e8
Parent: d4e6c1f

Model save

README.md CHANGED
@@ -13,23 +13,20 @@ model-index:
 <!-- This model card has been generated automatically according to the information the Trainer had access to. You
 should probably proofread and complete it, then remove this comment. -->
 
-[<img src="https://raw.githubusercontent.com/wandb/assets/main/wandb-github-badge-28.svg" alt="Visualize in Weights & Biases" width="200" height="32"/>](https://wandb.ai/sanqiang/wdpo/runs/2uwctjg2)
+[<img src="https://raw.githubusercontent.com/wandb/assets/main/wandb-github-badge-28.svg" alt="Visualize in Weights & Biases" width="200" height="32"/>](https://wandb.ai/sanqiang/wdpo/runs/dnn9mazg)
 # zephyr-7b-dpo-full
 
 This model is a fine-tuned version of [HuggingFaceH4/mistral-7b-sft-beta](https://huggingface.co/HuggingFaceH4/mistral-7b-sft-beta) on the None dataset.
 It achieves the following results on the evaluation set:
-- Loss: 0.0070
-- Rewards/chosen: -1.4073
-- Rewards/rejected: -1.6626
-- Rewards/accuracies: 0.6101
-- Rewards/margins: 0.2554
-- Logps/rejected: -316.9625
-- Logps/chosen: -284.9710
-- Logits/rejected: -2.3666
-- Logits/chosen: -2.3785
-- Debug/policy Weights: 0.0083
-- Debug/losses: 0.0052
-- Debug/raw Losses: 0.6403
+- Loss: 0.5417
+- Rewards/chosen: -2.1562
+- Rewards/rejected: -2.8807
+- Rewards/accuracies: 0.7313
+- Rewards/margins: 0.7245
+- Logps/rejected: -438.7701
+- Logps/chosen: -359.8675
+- Logits/rejected: 0.5902
+- Logits/chosen: 0.3561
 
 ## Model description
 
@@ -64,20 +61,20 @@ The following hyperparameters were used during training:
 
 ### Training results
 
-| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen | Debug/policy Weights | Debug/losses | Debug/raw Losses |
-|:-------------:|:------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|:--------------------:|:------------:|:----------------:|
-| 0.0684        | 0.0796 | 100  | 0.0623          | -0.1396        | -0.1630          | 0.6035             | 0.0234          | -167.0039      | -158.2063    | -2.7060         | -2.7139       | 0.0900               | 0.0615       | 0.6827           |
-| 0.0188        | 0.1592 | 200  | 0.0181          | -0.6590        | -0.7926          | 0.6353             | 0.1336          | -229.9578      | -210.1448    | -2.6167         | -2.6260       | 0.0261               | 0.0170       | 0.6478           |
-| 0.0106        | 0.2388 | 300  | 0.0124          | -0.8848        | -1.0088          | 0.6231             | 0.1239          | -251.5774      | -232.7285    | -2.5326         | -2.5410       | 0.0182               | 0.0117       | 0.6504           |
-| 0.0113        | 0.3183 | 400  | 0.0107          | -1.1250        | -1.3221          | 0.6259             | 0.1971          | -282.9042      | -256.7430    | -2.5431         | -2.5541       | 0.0146               | 0.0094       | 0.6486           |
-| 0.0049        | 0.3979 | 500  | 0.0052          | -1.5559        | -1.7544          | 0.5830             | 0.1985          | -326.1408      | -299.8377    | -2.5389         | -2.5502       | 0.0070               | 0.0046       | 0.6677           |
-| 0.0057        | 0.4775 | 600  | 0.0074          | -1.3034        | -1.5082          | 0.6138             | 0.2048          | -301.5209      | -274.5812    | -2.5458         | -2.5559       | 0.0100               | 0.0064       | 0.6465           |
-| 0.0088        | 0.5571 | 700  | 0.0103          | -1.1945        | -1.4133          | 0.6213             | 0.2188          | -292.0290      | -263.6917    | -2.5181         | -2.5285       | 0.0130               | 0.0083       | 0.6415           |
-| 0.0045        | 0.6367 | 800  | 0.0048          | -1.5892        | -1.8227          | 0.6054             | 0.2336          | -332.9696      | -303.1591    | -2.3814         | -2.3916       | 0.0058               | 0.0037       | 0.6507           |
-| 0.0058        | 0.7163 | 900  | 0.0066          | -1.4189        | -1.6455          | 0.6054             | 0.2266          | -315.2442      | -286.1336    | -2.3435         | -2.3544       | 0.0083               | 0.0052       | 0.6436           |
-| 0.006         | 0.7959 | 1000 | 0.0062          | -1.4586        | -1.6997          | 0.6091             | 0.2411          | -320.6679      | -290.1025    | -2.3587         | -2.3701       | 0.0075               | 0.0047       | 0.6449           |
-| 0.0058        | 0.8754 | 1100 | 0.0070          | -1.3982        | -1.6486          | 0.6063             | 0.2504          | -315.5557      | -284.0606    | -2.3679         | -2.3796       | 0.0084               | 0.0052       | 0.6403           |
-| 0.0064        | 0.9550 | 1200 | 0.0070          | -1.4073        | -1.6626          | 0.6101             | 0.2554          | -316.9625      | -284.9710    | -2.3666         | -2.3785       | 0.0083               | 0.0052       | 0.6403           |
+| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
+|:-------------:|:------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
+| 0.679         | 0.0796 | 100  | 0.6759          | -0.1436        | -0.1818          | 0.5998             | 0.0382          | -168.8750      | -158.6036    | -2.6862         | -2.6943       |
+| 0.5947        | 0.1592 | 200  | 0.6027          | -1.5133        | -2.0123          | 0.6679             | 0.4990          | -351.9330      | -295.5727    | -1.6083         | -1.6620       |
+| 0.578         | 0.2388 | 300  | 0.5751          | -1.2683        | -1.7143          | 0.6894             | 0.4460          | -322.1284      | -271.0768    | -1.3925         | -1.5128       |
+| 0.5575        | 0.3183 | 400  | 0.5613          | -1.7874        | -2.4481          | 0.7052             | 0.6607          | -395.5074      | -322.9848    | -0.2511         | -0.4263       |
+| 0.5311        | 0.3979 | 500  | 0.5601          | -2.0743        | -2.7782          | 0.7248             | 0.7039          | -428.5196      | -351.6741    | 0.1321          | -0.1444       |
+| 0.5658        | 0.4775 | 600  | 0.5562          | -1.9576        | -2.6629          | 0.7192             | 0.7053          | -416.9899      | -340.0069    | 0.9125          | 0.6661        |
+| 0.556         | 0.5571 | 700  | 0.5502          | -2.1146        | -2.7825          | 0.7201             | 0.6678          | -428.9443      | -355.7084    | 0.9969          | 0.7302        |
+| 0.5285        | 0.6367 | 800  | 0.5477          | -2.1980        | -2.9456          | 0.7229             | 0.7476          | -445.2567      | -364.0405    | 0.8564          | 0.6029        |
+| 0.5299        | 0.7163 | 900  | 0.5450          | -2.1121        | -2.8512          | 0.7341             | 0.7391          | -435.8159      | -355.4508    | 0.9832          | 0.7089        |
+| 0.5629        | 0.7959 | 1000 | 0.5440          | -2.1483        | -2.8941          | 0.7323             | 0.7457          | -440.1051      | -359.0749    | 0.7033          | 0.4600        |
+| 0.5351        | 0.8754 | 1100 | 0.5423          | -2.1496        | -2.8571          | 0.7304             | 0.7074          | -436.4062      | -359.2066    | 0.5029          | 0.2753        |
+| 0.5499        | 0.9550 | 1200 | 0.5417          | -2.1562        | -2.8807          | 0.7313             | 0.7245          | -438.7701      | -359.8675    | 0.5902          | 0.3561        |
 
 
 ### Framework versions
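As a quick consistency check on the new evaluation numbers, the reward margin should be the chosen reward minus the rejected reward. A minimal sketch using the final eval row from the updated card (the 1e-4 tolerance is an assumption to absorb rounding in the reported values):

```python
# Values copied from the updated card's final evaluation row.
chosen, rejected, margin = -2.1562, -2.8807, 0.7245

# Rewards/margins is defined as Rewards/chosen - Rewards/rejected.
computed = chosen - rejected
print(round(computed, 4))  # 0.7245

assert abs(computed - margin) < 1e-4
```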
all_results.json CHANGED
@@ -1,9 +1,9 @@
 {
     "epoch": 0.9996020692399522,
     "total_flos": 0.0,
-    "train_loss": 0.016716306088907514,
-    "train_runtime": 10019.6903,
+    "train_loss": 0.56636344817034,
+    "train_runtime": 10031.2749,
     "train_samples": 160800,
-    "train_samples_per_second": 16.048,
+    "train_samples_per_second": 16.03,
     "train_steps_per_second": 0.125
 }
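The updated throughput entry is internally consistent: `train_samples` divided by `train_runtime` rounds to the reported 16.03 samples per second. A small sketch, with values copied from the JSON above:

```python
# Fields copied from the updated all_results.json.
train_samples = 160800
train_runtime = 10031.2749  # seconds

samples_per_second = train_samples / train_runtime
print(round(samples_per_second, 2))  # 16.03
```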
model-00001-of-00003.safetensors CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:2665dd7ae6b8845472d555a3ac411c024c47379eb8ab0ec3b01ed462262c5cae
+oid sha256:98f00987598688fdebf9936701ec965959200df36d355e58d759228a95bd1106
 size 4943162336
model-00002-of-00003.safetensors CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:d79d69148dbbfc6544ef08399154439f0d93dbb9c57ce3f0dd468ecee3d39edc
+oid sha256:69882d48888219846a5788fd8b94d1e6391d766b051bcaf33889cd3e7e8ce63f
 size 4999819336
model-00003-of-00003.safetensors CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:34f814d8fb6482f27437cc5bb3ed6eb467b95d7d34b0af08c5a5d52feead3565
+oid sha256:88597e6e627267d3070da8b8d6010bbdf8fdee4bfd6d7c44ef9daa98a75f8dc9
 size 4540516344
train_results.json CHANGED
@@ -1,9 +1,9 @@
 {
     "epoch": 0.9996020692399522,
     "total_flos": 0.0,
-    "train_loss": 0.016716306088907514,
-    "train_runtime": 10019.6903,
+    "train_loss": 0.56636344817034,
+    "train_runtime": 10031.2749,
     "train_samples": 160800,
-    "train_samples_per_second": 16.048,
+    "train_samples_per_second": 16.03,
     "train_steps_per_second": 0.125
 }
trainer_state.json CHANGED
(diff too large to render)