Commit b227adb by lewtun (HF staff)
Parent: 3d5a485

Model save
README.md CHANGED
@@ -2,8 +2,9 @@
 license: apache-2.0
 base_model: alignment-handbook/zephyr-7b-sft-full
 tags:
 - generated_from_trainer
-- alignment-handbook
 model-index:
 - name: zephyr-7b-dpo-full
   results: []
@@ -14,17 +15,17 @@ should probably proofread and complete it, then remove this comment. -->
 
 # zephyr-7b-dpo-full
 
-This model is a fine-tuned version of [alignment-handbook/zephyr-7b-sft-full](https://huggingface.co/alignment-handbook/zephyr-7b-sft-full) on the [HuggingFaceH4/ultrafeedback_binarized](https://huggingface.co/datasets/HuggingFaceH4/ultrafeedback_binarized) dataset.
 It achieves the following results on the evaluation set:
-- Loss: 0.6910
-- Rewards/chosen: -3.9218
-- Rewards/rejected: -8.2942
-- Rewards/accuracies: 0.8125
-- Rewards/margins: 4.3724
-- Logps/rejected: -279.5480
-- Logps/chosen: -293.9998
-- Logits/rejected: -2.6725
-- Logits/chosen: -2.7826
 
 ## Model description
 
@@ -44,56 +45,32 @@ More information needed
 
 The following hyperparameters were used during training:
 - learning_rate: 5e-07
-- train_batch_size: 2
-- eval_batch_size: 4
 - seed: 42
 - distributed_type: multi-GPU
-- num_devices: 32
-- total_train_batch_size: 64
-- total_eval_batch_size: 128
 - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
-- lr_scheduler_type: linear
 - lr_scheduler_warmup_ratio: 0.1
-- num_epochs: 3
 
 ### Training results
 
 | Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
 |:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
-| 0.5386 | 0.1 | 100 | 0.5208 | 0.0564 | -0.7521 | 0.7188 | 0.8085 | -204.1269 | -254.2179 | -3.0136 | -3.0550 |
-| 0.4931 | 0.21 | 200 | 0.4882 | -0.0132 | -1.2683 | 0.7812 | 1.2551 | -209.2889 | -254.9136 | -3.1056 | -3.1407 |
-| 0.479 | 0.31 | 300 | 0.5038 | -0.1035 | -1.4012 | 0.7812 | 1.2978 | -210.6186 | -255.8163 | -3.0809 | -3.1328 |
-| 0.5052 | 0.41 | 400 | 0.5154 | -0.1923 | -1.8783 | 0.7969 | 1.6860 | -215.3891 | -256.7043 | -2.9104 | -2.9644 |
-| 0.4513 | 0.52 | 500 | 0.4979 | 0.0207 | -1.6562 | 0.7969 | 1.6769 | -213.1682 | -254.5742 | -3.0061 | -3.0657 |
-| 0.4905 | 0.62 | 600 | 0.4907 | -0.0944 | -1.5847 | 0.7656 | 1.4903 | -212.4527 | -255.7256 | -2.9374 | -3.0170 |
-| 0.5609 | 0.72 | 700 | 0.4928 | -0.4249 | -1.7238 | 0.7656 | 1.2989 | -213.8441 | -259.0304 | -2.9475 | -3.0128 |
-| 0.5338 | 0.83 | 800 | 0.4767 | -0.1567 | -1.9114 | 0.8125 | 1.7547 | -215.7200 | -256.3484 | -2.8455 | -2.9183 |
-| 0.5039 | 0.93 | 900 | 0.4854 | -0.0886 | -1.6900 | 0.75 | 1.6014 | -213.5057 | -255.6674 | -2.8295 | -2.9093 |
-| 0.0776 | 1.03 | 1000 | 0.4938 | -0.4848 | -2.5287 | 0.7656 | 2.0438 | -221.8927 | -259.6300 | -2.7580 | -2.8437 |
-| 0.0901 | 1.14 | 1100 | 0.5071 | -1.0800 | -3.2419 | 0.7812 | 2.1619 | -229.0247 | -265.5817 | -2.8036 | -2.8858 |
-| 0.0828 | 1.24 | 1200 | 0.5159 | -0.9682 | -3.4087 | 0.7812 | 2.4406 | -230.6935 | -264.4635 | -2.7961 | -2.8708 |
-| 0.0916 | 1.34 | 1300 | 0.5222 | -1.0832 | -3.5535 | 0.7969 | 2.4703 | -232.1411 | -265.6135 | -2.8019 | -2.8754 |
-| 0.0965 | 1.44 | 1400 | 0.5204 | -1.1951 | -3.5681 | 0.7969 | 2.3731 | -232.2874 | -266.7324 | -2.8058 | -2.8884 |
-| 0.0716 | 1.55 | 1500 | 0.5381 | -1.6588 | -4.0838 | 0.7188 | 2.4250 | -237.4441 | -271.3697 | -2.7979 | -2.8862 |
-| 0.0957 | 1.65 | 1600 | 0.5151 | -1.1746 | -3.7477 | 0.75 | 2.5731 | -234.0834 | -266.5278 | -2.7960 | -2.8976 |
-| 0.0645 | 1.75 | 1700 | 0.5393 | -1.7591 | -4.6011 | 0.8125 | 2.8419 | -242.6167 | -272.3728 | -2.7483 | -2.8592 |
-| 0.0838 | 1.86 | 1800 | 0.5385 | -1.6606 | -4.4648 | 0.7656 | 2.8042 | -241.2545 | -271.3875 | -2.7311 | -2.8383 |
-| 0.1106 | 1.96 | 1900 | 0.5322 | -1.5621 | -3.9779 | 0.7969 | 2.4158 | -236.3850 | -270.4025 | -2.8194 | -2.9133 |
-| 0.0174 | 2.06 | 2000 | 0.5921 | -2.4968 | -5.9514 | 0.7969 | 3.4546 | -256.1199 | -279.7498 | -2.7579 | -2.8631 |
-| 0.0134 | 2.17 | 2100 | 0.6247 | -2.9002 | -6.4277 | 0.7969 | 3.5275 | -260.8829 | -283.7838 | -2.7316 | -2.8319 |
-| 0.0148 | 2.27 | 2200 | 0.6402 | -3.2520 | -7.0627 | 0.7812 | 3.8106 | -267.2330 | -287.3020 | -2.6991 | -2.8064 |
-| 0.0142 | 2.37 | 2300 | 0.6563 | -3.2715 | -7.1303 | 0.8281 | 3.8588 | -267.9088 | -287.4962 | -2.6871 | -2.7992 |
-| 0.011 | 2.48 | 2400 | 0.6605 | -3.2996 | -7.2258 | 0.7969 | 3.9262 | -268.8643 | -287.7776 | -2.6555 | -2.7717 |
-| 0.0065 | 2.58 | 2500 | 0.6935 | -3.6399 | -8.0232 | 0.8125 | 4.3832 | -276.8377 | -291.1808 | -2.6780 | -2.7902 |
-| 0.0089 | 2.68 | 2600 | 0.6773 | -3.4822 | -7.8182 | 0.8125 | 4.3360 | -274.7881 | -289.6033 | -2.6885 | -2.7994 |
-| 0.0102 | 2.79 | 2700 | 0.6813 | -3.5909 | -7.8097 | 0.8281 | 4.2187 | -274.7028 | -290.6908 | -2.6877 | -2.7970 |
-| 0.0136 | 2.89 | 2800 | 0.6892 | -3.8236 | -8.1490 | 0.8125 | 4.3254 | -278.0957 | -293.0175 | -2.6765 | -2.7862 |
-| 0.0091 | 2.99 | 2900 | 0.6913 | -3.9199 | -8.3004 | 0.8125 | 4.3806 | -279.6104 | -293.9802 | -2.6728 | -2.7830 |
 
 
 ### Framework versions
 
-- Transformers 4.35.0
-- Pytorch 2.1.0+cu118
 - Datasets 2.14.6
-- Tokenizers 0.14.1
 
 license: apache-2.0
 base_model: alignment-handbook/zephyr-7b-sft-full
 tags:
+- trl
+- dpo
 - generated_from_trainer
 model-index:
 - name: zephyr-7b-dpo-full
   results: []
 
 
 # zephyr-7b-dpo-full
 
+This model is a fine-tuned version of [alignment-handbook/zephyr-7b-sft-full](https://huggingface.co/alignment-handbook/zephyr-7b-sft-full) on the None dataset.
 It achieves the following results on the evaluation set:
+- Loss: 0.5028
+- Rewards/chosen: -0.9469
+- Rewards/rejected: -1.8932
+- Rewards/accuracies: 0.7656
+- Rewards/margins: 0.9463
+- Logps/rejected: -451.4661
+- Logps/chosen: -357.2325
+- Logits/rejected: 1.5731
+- Logits/chosen: 0.6530
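These Rewards/* metrics follow TRL's DPO convention: a completion's implicit reward is beta times the gap between the policy and reference log-probabilities, and the margin is the chosen reward minus the rejected reward; the figures above obey that identity (-0.9469 - (-1.8932) = 0.9463, matching Rewards/margins). A minimal sketch of the bookkeeping, where beta = 0.1 and the log-probabilities are illustrative assumptions (this commit does not record beta):

```python
import math

def dpo_pair_stats(beta, logp_chosen, ref_logp_chosen, logp_rejected, ref_logp_rejected):
    """Implicit DPO rewards (beta * log-prob gap vs. the reference model) and the per-pair loss."""
    reward_chosen = beta * (logp_chosen - ref_logp_chosen)
    reward_rejected = beta * (logp_rejected - ref_logp_rejected)
    margin = reward_chosen - reward_rejected
    loss = -math.log(1.0 / (1.0 + math.exp(-margin)))  # -log sigmoid(margin)
    return reward_chosen, reward_rejected, margin, loss

# Illustrative values: the policy's log-prob drifts below the reference on both
# completions, but much further on the rejected one, so the margin is positive.
rc, rr, margin, loss = dpo_pair_stats(0.1, -110.0, -100.0, -140.0, -120.0)
```

Note that the reported eval Loss is the mean of this per-pair loss over the eval set, not a function of the aggregate rewards above.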
 
 ## Model description
 
 
 
 The following hyperparameters were used during training:
 - learning_rate: 5e-07
+- train_batch_size: 8
+- eval_batch_size: 8
 - seed: 42
 - distributed_type: multi-GPU
+- num_devices: 8
+- gradient_accumulation_steps: 2
+- total_train_batch_size: 128
+- total_eval_batch_size: 64
 - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
+- lr_scheduler_type: cosine
 - lr_scheduler_warmup_ratio: 0.1
+- num_epochs: 1
 
 ### Training results
 
 | Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
 |:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
+| 0.5545 | 0.21 | 100 | 0.5658 | -0.4953 | -1.1217 | 0.7188 | 0.6264 | -374.3159 | -312.0799 | -1.0287 | -1.3212 |
+| 0.5026 | 0.42 | 200 | 0.5202 | -0.8995 | -1.7718 | 0.7461 | 0.8723 | -439.3264 | -352.4985 | 0.5190 | -0.1773 |
+| 0.5106 | 0.63 | 300 | 0.5104 | -0.7946 | -1.6285 | 0.7656 | 0.8339 | -424.9976 | -342.0043 | 0.9099 | 0.0862 |
+| 0.4859 | 0.84 | 400 | 0.5031 | -0.9777 | -1.9580 | 0.7578 | 0.9803 | -457.9452 | -360.3139 | 1.7438 | 0.7818 |
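The new batch-size entries are mutually consistent, and the epoch column of the table follows from them. A quick sanity check (the train sample count of 61135 is taken from train_results.json in this same commit):

```python
# Effective batch size implied by the listed hyperparameters.
per_device_train_batch_size = 8
num_devices = 8
gradient_accumulation_steps = 2
total_train_batch_size = per_device_train_batch_size * num_devices * gradient_accumulation_steps
print(total_train_batch_size)  # 128, as reported above

# Steps per epoch, using the train sample count from train_results.json.
train_samples = 61135
steps_per_epoch = train_samples / total_train_batch_size  # ~477.6
epoch_at_step_400 = 400 / steps_per_epoch
print(round(epoch_at_step_400, 2))  # 0.84, matching the last table row
```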
 
 ### Framework versions
 
+- Transformers 4.36.2
+- Pytorch 2.1.2+cu121
 - Datasets 2.14.6
+- Tokenizers 0.15.0
all_results.json CHANGED
@@ -1,21 +1,21 @@
 {
-    "epoch": 3.0,
-    "eval_logits/chosen": -2.7826476097106934,
-    "eval_logits/rejected": -2.672537326812744,
-    "eval_logps/chosen": -293.99981689453125,
-    "eval_logps/rejected": -279.5479736328125,
-    "eval_loss": 0.6910352110862732,
-    "eval_rewards/accuracies": 0.8125,
-    "eval_rewards/chosen": -3.9218149185180664,
-    "eval_rewards/margins": 4.372367858886719,
-    "eval_rewards/rejected": -8.294181823730469,
-    "eval_runtime": 43.6649,
+    "epoch": 1.0,
+    "eval_logits/chosen": 0.6529867053031921,
+    "eval_logits/rejected": 1.5730761289596558,
+    "eval_logps/chosen": -357.2324523925781,
+    "eval_logps/rejected": -451.466064453125,
+    "eval_loss": 0.5028161406517029,
+    "eval_rewards/accuracies": 0.765625,
+    "eval_rewards/chosen": -0.9468507170677185,
+    "eval_rewards/margins": 0.946345865726471,
+    "eval_rewards/rejected": -1.8931965827941895,
+    "eval_runtime": 89.0083,
     "eval_samples": 2000,
-    "eval_samples_per_second": 45.803,
-    "eval_steps_per_second": 0.366,
-    "train_loss": 0.20427082364947516,
-    "train_runtime": 9903.6907,
-    "train_samples": 61966,
-    "train_samples_per_second": 18.771,
-    "train_steps_per_second": 0.294
+    "eval_samples_per_second": 22.47,
+    "eval_steps_per_second": 0.36,
+    "train_loss": 0.5366686437918052,
+    "train_runtime": 5328.4749,
+    "train_samples": 61135,
+    "train_samples_per_second": 11.473,
+    "train_steps_per_second": 0.09
 }
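The new throughput figures are internally consistent with the reported runtimes and sample counts. A quick check of the arithmetic:

```python
# Training throughput: samples * epochs / runtime.
train_samples = 61135
train_runtime_s = 5328.4749
epochs = 1.0
train_throughput = train_samples * epochs / train_runtime_s
print(round(train_throughput, 3))  # 11.473 samples/s, as reported

# Evaluation throughput: 2000 eval samples over the eval runtime.
eval_samples = 2000
eval_runtime_s = 89.0083
eval_throughput = eval_samples / eval_runtime_s
print(round(eval_throughput, 2))  # 22.47 samples/s, as reported
```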
eval_results.json CHANGED
@@ -1,16 +1,16 @@
 {
-    "epoch": 3.0,
-    "eval_logits/chosen": -2.7826476097106934,
-    "eval_logits/rejected": -2.672537326812744,
-    "eval_logps/chosen": -293.99981689453125,
-    "eval_logps/rejected": -279.5479736328125,
-    "eval_loss": 0.6910352110862732,
-    "eval_rewards/accuracies": 0.8125,
-    "eval_rewards/chosen": -3.9218149185180664,
-    "eval_rewards/margins": 4.372367858886719,
-    "eval_rewards/rejected": -8.294181823730469,
-    "eval_runtime": 43.6649,
+    "epoch": 1.0,
+    "eval_logits/chosen": 0.6529867053031921,
+    "eval_logits/rejected": 1.5730761289596558,
+    "eval_logps/chosen": -357.2324523925781,
+    "eval_logps/rejected": -451.466064453125,
+    "eval_loss": 0.5028161406517029,
+    "eval_rewards/accuracies": 0.765625,
+    "eval_rewards/chosen": -0.9468507170677185,
+    "eval_rewards/margins": 0.946345865726471,
+    "eval_rewards/rejected": -1.8931965827941895,
+    "eval_runtime": 89.0083,
     "eval_samples": 2000,
-    "eval_samples_per_second": 45.803,
-    "eval_steps_per_second": 0.366
+    "eval_samples_per_second": 22.47,
+    "eval_steps_per_second": 0.36
 }
generation_config.json CHANGED
@@ -2,5 +2,5 @@
   "_from_model_config": true,
   "bos_token_id": 1,
   "eos_token_id": 2,
-  "transformers_version": "4.35.0"
+  "transformers_version": "4.36.2"
 }
model-00001-of-00003.safetensors CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:b5a59a785d013e88920469ff94d9cc4597b5f20ae7c37c3f9999286b2657de90
+oid sha256:17698e797fee795be264ff6fceb1ce5de5e2d3504ccebb6cc762eccf36863396
 size 4943162336
model-00002-of-00003.safetensors CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:006ee5be05747b1ce4366ea7a71175b304335e5440afbb05bb4540bce230c589
+oid sha256:d57e247aa8357fee9c5771c9956ae96eea45624a113b40fdba0fcd2b848476f7
 size 4999819336
model-00003-of-00003.safetensors CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:ce82904367f3f9ffa962b8c61463ca6cf3ffaa2ba8ac8b826475ed12028ff19c
+oid sha256:daa924d9453a024a65583a098420d94aa3946268cadc893d3a672cde300d7731
 size 4540516344
runs/Jan09_01-04-39_ip-26-0-175-170/events.out.tfevents.1704762517.ip-26-0-175-170.1764083.0 CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:f380d10647058fdc3e68b19dcb932353c780ce866d97f3fe20916b24fdea7813
-size 33302
+oid sha256:43fd30e36649e5afb9b9fdfbac2b25e37172a20c8c5439559b65ce8a116ad4b6
+size 38094
runs/Jan09_01-04-39_ip-26-0-175-170/events.out.tfevents.1704767935.ip-26-0-175-170.1764083.1 ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:1a98edf2a88f6dbeceef003519ababfd42ca5f31785ed47c0bd51d31d863ae5a
+size 828
train_results.json CHANGED
@@ -1,8 +1,8 @@
 {
-    "epoch": 3.0,
-    "train_loss": 0.20427082364947516,
-    "train_runtime": 9903.6907,
-    "train_samples": 61966,
-    "train_samples_per_second": 18.771,
-    "train_steps_per_second": 0.294
+    "epoch": 1.0,
+    "train_loss": 0.5366686437918052,
+    "train_runtime": 5328.4749,
+    "train_samples": 61135,
+    "train_samples_per_second": 11.473,
+    "train_steps_per_second": 0.09
 }
trainer_state.json CHANGED
The diff for this file is too large to render. See raw diff