Model save

Files changed:
- README.md +20 -45
- all_results.json +5 -18
- model-00001-of-00003.safetensors +1 -1
- model-00002-of-00003.safetensors +1 -1
- model-00003-of-00003.safetensors +1 -1
- runs/May20_07-04-40_ubuntu/events.out.tfevents.1716189133.ubuntu.3046003.0 +2 -2
- train_results.json +5 -5
- trainer_state.json +0 -0
README.md
CHANGED
@@ -2,15 +2,9 @@
 license: apache-2.0
 base_model: alignment-handbook/zephyr-7b-sft-full
 tags:
-- alignment-handbook
 - trl
 - dpo
 - generated_from_trainer
-- trl
-- dpo
-- generated_from_trainer
-datasets:
-- HuggingFaceH4/ultrafeedback_binarized
 model-index:
 - name: zephyr-7b-dpo-full
   results: []
@@ -21,17 +15,17 @@ should probably proofread and complete it, then remove this comment. -->
 
 # zephyr-7b-dpo-full
 
-This model is a fine-tuned version of [alignment-handbook/zephyr-7b-sft-full](https://huggingface.co/alignment-handbook/zephyr-7b-sft-full) on
+This model is a fine-tuned version of [alignment-handbook/zephyr-7b-sft-full](https://huggingface.co/alignment-handbook/zephyr-7b-sft-full) on an unknown dataset.
 It achieves the following results on the evaluation set:
-- Loss: 0.
-- Rewards/chosen: -
-- Rewards/rejected: -
-- Rewards/accuracies: 0.
-- Rewards/margins:
-- Logps/rejected: -
-- Logps/chosen: -
-- Logits/rejected: -
-- Logits/chosen: -
+- Loss: 0.5280
+- Rewards/chosen: -0.0160
+- Rewards/rejected: -1.2503
+- Rewards/accuracies: 0.7798
+- Rewards/margins: 1.2343
+- Logps/rejected: -272.7223
+- Logps/chosen: -282.1141
+- Logits/rejected: -2.5362
+- Logits/chosen: -2.5901
 
 ## Model description
 
@@ -62,40 +56,21 @@ The following hyperparameters were used during training:
 - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
 - lr_scheduler_type: cosine
 - lr_scheduler_warmup_ratio: 0.1
-- num_epochs:
+- num_epochs: 1
 
 ### Training results
 
 | Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
 |:-------------:|:------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
-| 0.
-| 0.
-| 0.
-| 0.
-| 0.
-| 0.
-| 0.
-| 0.
-| 0.
-| 0.1144 | 1.0466 | 1000 | 0.5714 | -0.2947 | -1.8774 | 0.7639 | 1.5826 | -278.9927 | -284.9014 | -2.5495 | -2.6080 |
-| 0.087 | 1.1512 | 1100 | 0.5960 | -0.6932 | -2.6301 | 0.7837 | 1.9369 | -286.5200 | -288.8864 | -2.5036 | -2.5699 |
-| 0.1122 | 1.2559 | 1200 | 0.6133 | -1.5655 | -3.6620 | 0.7540 | 2.0965 | -296.8384 | -297.6089 | -2.4063 | -2.4765 |
-| 0.1303 | 1.3605 | 1300 | 0.6040 | -1.7575 | -3.6828 | 0.7837 | 1.9252 | -297.0464 | -299.5291 | -2.3747 | -2.4470 |
-| 0.0884 | 1.4652 | 1400 | 0.6035 | -1.4203 | -3.2606 | 0.7798 | 1.8403 | -292.8251 | -296.1571 | -2.3840 | -2.4553 |
-| 0.0807 | 1.5699 | 1500 | 0.6033 | -1.8277 | -3.9141 | 0.7877 | 2.0864 | -299.3599 | -300.2314 | -2.3962 | -2.4731 |
-| 0.1027 | 1.6745 | 1600 | 0.6157 | -1.3414 | -3.3683 | 0.7857 | 2.0269 | -293.9024 | -295.3680 | -2.3746 | -2.4536 |
-| 0.0989 | 1.7792 | 1700 | 0.6009 | -1.4146 | -3.5889 | 0.7917 | 2.1744 | -296.1083 | -296.0996 | -2.3750 | -2.4548 |
-| 0.0945 | 1.8838 | 1800 | 0.6109 | -1.1285 | -3.3269 | 0.7877 | 2.1984 | -293.4879 | -293.2390 | -2.4051 | -2.4825 |
-| 0.0789 | 1.9885 | 1900 | 0.6093 | -1.9115 | -4.0587 | 0.7837 | 2.1472 | -300.8062 | -301.0694 | -2.3968 | -2.4730 |
-| 0.0086 | 2.0931 | 2000 | 0.7414 | -2.9121 | -5.9384 | 0.7758 | 3.0263 | -319.6029 | -311.0746 | -2.2016 | -2.2928 |
-| 0.0137 | 2.1978 | 2100 | 0.8116 | -4.6780 | -8.1860 | 0.7679 | 3.5080 | -342.0789 | -328.7336 | -1.8924 | -2.0338 |
-| 0.0152 | 2.3025 | 2200 | 0.8371 | -5.0993 | -8.7589 | 0.7679 | 3.6596 | -347.8080 | -332.9471 | -1.8207 | -1.9887 |
-| 0.0062 | 2.4071 | 2300 | 0.8704 | -6.2532 | -10.1416 | 0.7679 | 3.8884 | -361.6346 | -344.4856 | -1.5897 | -1.8086 |
-| 0.0124 | 2.5118 | 2400 | 0.8848 | -5.6604 | -9.6724 | 0.7698 | 4.0120 | -356.9429 | -338.5582 | -1.5561 | -1.7751 |
-| 0.0078 | 2.6164 | 2500 | 0.8926 | -6.1681 | -10.2415 | 0.7679 | 4.0734 | -362.6336 | -343.6352 | -1.4181 | -1.6590 |
-| 0.0083 | 2.7211 | 2600 | 0.9002 | -6.5323 | -10.6541 | 0.7659 | 4.1218 | -366.7602 | -347.2773 | -1.3929 | -1.6493 |
-| 0.0115 | 2.8257 | 2700 | 0.9076 | -6.4271 | -10.6033 | 0.7639 | 4.1762 | -366.2516 | -346.2245 | -1.4047 | -1.6632 |
-| 0.0134 | 2.9304 | 2800 | 0.9106 | -6.3982 | -10.5970 | 0.7639 | 4.1988 | -366.1889 | -345.9361 | -1.3900 | -1.6525 |
+| 0.5446 | 0.1047 | 100 | 0.5753 | 1.0111 | 0.3529 | 0.7242 | 0.6581 | -256.6898 | -271.8434 | -2.5161 | -2.5743 |
+| 0.5475 | 0.2093 | 200 | 0.5464 | 0.4347 | -0.4824 | 0.7639 | 0.9172 | -265.0432 | -277.6068 | -2.5380 | -2.5923 |
+| 0.5359 | 0.3140 | 300 | 0.5473 | 0.0697 | -1.0170 | 0.7579 | 1.0867 | -270.3889 | -281.2571 | -2.5066 | -2.5596 |
+| 0.5228 | 0.4186 | 400 | 0.5321 | -0.2311 | -1.3065 | 0.7540 | 1.0754 | -273.2837 | -284.2652 | -2.5933 | -2.6471 |
+| 0.5217 | 0.5233 | 500 | 0.5260 | 0.0143 | -1.2073 | 0.7877 | 1.2216 | -272.2919 | -281.8111 | -2.5195 | -2.5773 |
+| 0.517 | 0.6279 | 600 | 0.5262 | -0.2922 | -1.4562 | 0.7698 | 1.1640 | -274.7808 | -284.8755 | -2.5183 | -2.5744 |
+| 0.4766 | 0.7326 | 700 | 0.5279 | -0.0183 | -1.2936 | 0.7798 | 1.2753 | -273.1544 | -282.1366 | -2.5194 | -2.5751 |
+| 0.4894 | 0.8373 | 800 | 0.5257 | -0.0567 | -1.2594 | 0.7778 | 1.2027 | -272.8127 | -282.5211 | -2.5311 | -2.5851 |
+| 0.4722 | 0.9419 | 900 | 0.5280 | -0.0160 | -1.2503 | 0.7798 | 1.2343 | -272.7223 | -282.1141 | -2.5362 | -2.5901 |
 
 
 ### Framework versions
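The retrained card lists a cosine schedule with lr_scheduler_warmup_ratio 0.1. As a rough illustration of that shape only: the sketch below assumes a placeholder peak learning rate, since the README's learning-rate line falls outside the rendered hunks.

```python
import math

def lr_at(step: int, total_steps: int, peak_lr: float = 5e-7,
          warmup_ratio: float = 0.1) -> float:
    """Cosine decay with linear warmup over the first warmup_ratio of steps.

    peak_lr is a placeholder, not taken from this diff: the README's
    learning-rate line is not shown in the rendered hunks.
    """
    warmup_steps = max(1, int(total_steps * warmup_ratio))
    if step < warmup_steps:
        return peak_lr * step / warmup_steps  # linear warmup
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return 0.5 * peak_lr * (1.0 + math.cos(math.pi * progress))  # cosine decay

# The schedule peaks after 10% of training and decays toward zero at the end.
print(lr_at(100, 1000), lr_at(500, 1000), lr_at(1000, 1000))
```

With warmup_ratio 0.1 and 1000 steps, the rate climbs linearly to the peak at step 100, then follows a half cosine down to zero.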
all_results.json
CHANGED
@@ -1,22 +1,9 @@
 {
-    "epoch":
-    "eval_logits/chosen": -1.6524701118469238,
-    "eval_logits/rejected": -1.3897571563720703,
-    "eval_logps/chosen": -346.02056884765625,
-    "eval_logps/rejected": -366.2360534667969,
-    "eval_loss": 0.910855770111084,
-    "eval_rewards/accuracies": 0.7658730149269104,
-    "eval_rewards/chosen": -6.40665864944458,
-    "eval_rewards/margins": 4.195058822631836,
-    "eval_rewards/rejected": -10.601717948913574,
-    "eval_runtime": 196.377,
-    "eval_samples": 2000,
-    "eval_samples_per_second": 10.184,
-    "eval_steps_per_second": 0.321,
+    "epoch": 0.9994767137624281,
     "total_flos": 0.0,
-    "train_loss": 0.
-    "train_runtime":
+    "train_loss": 0.528570184657711,
+    "train_runtime": 17823.4182,
     "train_samples": 61134,
-    "train_samples_per_second": 3.
-    "train_steps_per_second": 0.
+    "train_samples_per_second": 3.43,
+    "train_steps_per_second": 0.054
 }
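The removed eval block is internally consistent with the usual DPO bookkeeping: eval_rewards/margins equals eval_rewards/chosen minus eval_rewards/rejected (up to float32 averaging noise), and eval_samples_per_second is eval_samples over eval_runtime. A quick check on the removed values, not part of the original card:

```python
# Values copied from the removed all_results.json block above.
chosen = -6.40665864944458
rejected = -10.601717948913574
margin = 4.195058822631836

# margins ≈ chosen - rejected; the stored numbers are float32 batch means,
# so the identity holds only to about six significant figures.
assert abs((chosen - rejected) - margin) < 1e-4

# Throughput: 2000 eval samples in 196.377 s → 10.184 samples/s as reported.
print(round(2000 / 196.377, 3))  # 10.184
```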
model-00001-of-00003.safetensors
CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
+oid sha256:3b1fd55fd4d17cad855ab9d45ac20a8b8fdcfc540a35a4967bb5ae2a3bc1bba5
 size 4943162336
model-00002-of-00003.safetensors
CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
+oid sha256:535b15dbb05eedc817a81d7b3e9bfec99e0b560cc28d17893ba3bb2fef53083a
 size 4999819336
model-00003-of-00003.safetensors
CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
+oid sha256:b5d8e6c1f583c7a8841c68e9e269d659c35e1cbe70880bcd676a0b75972a71b4
 size 4540516344
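The shard contents never enter git directly; each .safetensors file is stored as a Git LFS pointer whose oid is the SHA-256 digest of the shard's bytes and whose size is its byte length. A minimal sketch of that pointer format (a hypothetical helper, not the git-lfs CLI itself):

```python
import hashlib

def lfs_pointer(data: bytes) -> str:
    """Build a Git LFS pointer for raw file contents.

    The oid is the SHA-256 digest of the content and size is its byte
    length, matching the three-line pointers shown in the diffs above.
    """
    oid = hashlib.sha256(data).hexdigest()
    return (
        "version https://git-lfs.github.com/spec/v1\n"
        f"oid sha256:{oid}\n"
        f"size {len(data)}\n"
    )

print(lfs_pointer(b"example shard bytes"))
```

A commit that only re-saves the model thus shows up as a one-line oid change per shard, with the size unchanged when the tensor layout is identical.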
runs/May20_07-04-40_ubuntu/events.out.tfevents.1716189133.ubuntu.3046003.0
CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
-size
+oid sha256:3c08302d3ecc91ff9d00e18e6886d3398633d17143a5f5224a3e6d0559a28e59
+size 77775
train_results.json
CHANGED
@@ -1,9 +1,9 @@
 {
-    "epoch":
+    "epoch": 0.9994767137624281,
     "total_flos": 0.0,
-    "train_loss": 0.
+    "train_loss": 0.528570184657711,
-    "train_runtime":
+    "train_runtime": 17823.4182,
     "train_samples": 61134,
-    "train_samples_per_second": 3.
+    "train_samples_per_second": 3.43,
-    "train_steps_per_second": 0.
+    "train_steps_per_second": 0.054
 }
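The throughput fields in the new train_results.json are mutually consistent: samples per second is train_samples × epoch ÷ train_runtime, and steps per second × runtime roughly recovers the optimizer step count (roughly, because the stored rate is rounded). A quick arithmetic check on the committed values:

```python
train_samples = 61134
epoch = 0.9994767137624281
train_runtime = 17823.4182  # seconds

samples_per_second = train_samples * epoch / train_runtime
print(round(samples_per_second, 2))  # 3.43, matching train_samples_per_second

# 0.054 steps/s over the full runtime gives an approximate step count;
# the exact count lives in trainer_state.json, whose diff is not rendered.
approx_steps = 0.054 * train_runtime
print(round(approx_steps))
```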
trainer_state.json
CHANGED
The diff for this file is too large to render; see the raw diff.