ironrock committed on
Commit
76557a4
1 Parent(s): c5ea2d8

Model save

Files changed (2)
  1. README.md +99 -0
  2. adapter_model.safetensors +1 -1
README.md ADDED
@@ -0,0 +1,99 @@
+ ---
+ library_name: peft
+ tags:
+ - trl
+ - dpo
+ - generated_from_trainer
+ base_model: Weni/ZeroShot-Agents-Llama3-4.0.11-SFT-merged
+ model-index:
+ - name: ZeroShot-Agents-Llama3-4.0.37-DPO
+ results: []
+ ---
+
+ <!-- This model card has been generated automatically according to the information the Trainer had access to. You
+ should probably proofread and complete it, then remove this comment. -->
+
+ # ZeroShot-Agents-Llama3-4.0.37-DPO
+
+ This model is a fine-tuned version of [Weni/ZeroShot-Agents-Llama3-4.0.11-SFT-merged](https://huggingface.co/Weni/ZeroShot-Agents-Llama3-4.0.11-SFT-merged) on an unknown dataset.
+ It achieves the following results on the evaluation set:
+ - Loss: 0.2625
+ - Rewards/chosen: 0.9675
+ - Rewards/rejected: -1.4892
+ - Rewards/accuracies: 0.9070
+ - Rewards/margins: 2.4567
+ - Logps/rejected: -54.9700
+ - Logps/chosen: -30.7303
+ - Logits/rejected: -0.1517
+ - Logits/chosen: -0.1554
+
+ ## Model description
+
+ More information needed
+
+ ## Intended uses & limitations
+
+ More information needed
+
+ ## Training and evaluation data
+
+ More information needed
+
+ ## Training procedure
+
+ ### Training hyperparameters
+
+ The following hyperparameters were used during training:
+ - learning_rate: 6e-06
+ - train_batch_size: 2
+ - eval_batch_size: 2
+ - seed: 42
+ - gradient_accumulation_steps: 8
+ - total_train_batch_size: 16
+ - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
+ - lr_scheduler_type: linear
+ - lr_scheduler_warmup_ratio: 0.03
+ - training_steps: 570
+ - mixed_precision_training: Native AMP
+
+ ### Training results
+
+ | Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
+ |:-------------:|:------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
+ | 0.5549 | 0.2092 | 20 | 0.4323 | 1.3557 | 0.4942 | 0.8081 | 0.8615 | -35.1361 | -26.8486 | -0.1975 | -0.1979 |
+ | 0.3276 | 0.4183 | 40 | 0.3438 | 1.5786 | -0.8833 | 0.8895 | 2.4619 | -48.9107 | -24.6195 | -0.1595 | -0.1635 |
+ | 0.3448 | 0.6275 | 60 | 0.2700 | 1.3059 | -0.8860 | 0.9070 | 2.1918 | -48.9380 | -27.3469 | -0.1764 | -0.1794 |
+ | 0.2844 | 0.8366 | 80 | 0.2625 | 0.9675 | -1.4892 | 0.9070 | 2.4567 | -54.9700 | -30.7303 | -0.1517 | -0.1554 |
+ | 0.3134 | 1.0458 | 100 | 0.2354 | 1.4560 | -1.6614 | 0.9244 | 3.1174 | -56.6918 | -25.8455 | -0.1389 | -0.1437 |
+ | 0.2497 | 1.2549 | 120 | 0.2168 | 1.5733 | -1.4017 | 0.9186 | 2.9750 | -54.0951 | -24.6722 | -0.1432 | -0.1482 |
+ | 0.2442 | 1.4641 | 140 | 0.2654 | 2.6265 | 0.2007 | 0.9070 | 2.4258 | -38.0710 | -14.1407 | -0.1262 | -0.1305 |
+ | 0.2677 | 1.6732 | 160 | 0.1999 | 1.7590 | -2.0006 | 0.9244 | 3.7596 | -60.0844 | -22.8155 | -0.0957 | -0.1025 |
+ | 0.2598 | 1.8824 | 180 | 0.2034 | 2.1611 | -1.1528 | 0.9302 | 3.3139 | -51.6058 | -18.7945 | -0.1060 | -0.1121 |
+ | 0.165 | 2.0915 | 200 | 0.1710 | 1.5140 | -1.8672 | 0.9593 | 3.3812 | -58.7503 | -25.2654 | -0.0961 | -0.1023 |
+ | 0.266 | 2.3007 | 220 | 0.2139 | 2.4282 | -1.2490 | 0.9302 | 3.6772 | -52.5682 | -16.1238 | -0.0783 | -0.0857 |
+ | 0.2234 | 2.5098 | 240 | 0.1854 | 2.2003 | -1.1492 | 0.9419 | 3.3495 | -51.5703 | -18.4021 | -0.0816 | -0.0879 |
+ | 0.1878 | 2.7190 | 260 | 0.1542 | 1.1292 | -2.8812 | 0.9535 | 4.0104 | -68.8902 | -29.1139 | -0.0584 | -0.0656 |
+ | 0.1515 | 2.9281 | 280 | 0.2068 | 2.5962 | -1.2508 | 0.9419 | 3.8470 | -52.5865 | -14.4433 | -0.0552 | -0.0633 |
+ | 0.1259 | 3.1373 | 300 | 0.1597 | 2.6323 | -1.4713 | 0.9477 | 4.1035 | -54.7910 | -14.0830 | -0.0438 | -0.0527 |
+ | 0.1342 | 3.3464 | 320 | 0.1569 | 2.4367 | -2.3871 | 0.9477 | 4.8239 | -63.9495 | -16.0381 | -0.0302 | -0.0403 |
+ | 0.134 | 3.5556 | 340 | 0.1322 | 2.2686 | -2.7697 | 0.9535 | 5.0382 | -67.7748 | -17.7199 | -0.0218 | -0.0326 |
+ | 0.0875 | 3.7647 | 360 | 0.1696 | 2.7864 | -1.9444 | 0.9302 | 4.7308 | -59.5222 | -12.5413 | -0.0084 | -0.0191 |
+ | 0.0724 | 3.9739 | 380 | 0.1411 | 2.7248 | -2.4834 | 0.9302 | 5.2081 | -64.9118 | -13.1577 | 0.0165 | 0.0051 |
+ | 0.0471 | 4.1830 | 400 | 0.1487 | 2.5177 | -3.5385 | 0.9477 | 6.0562 | -75.4631 | -15.2287 | 0.0398 | 0.0267 |
+ | 0.053 | 4.3922 | 420 | 0.1444 | 2.3522 | -3.7636 | 0.9535 | 6.1157 | -77.7140 | -16.8839 | 0.0486 | 0.0352 |
+ | 0.0503 | 4.6013 | 440 | 0.1726 | 2.9349 | -2.6371 | 0.9302 | 5.5720 | -66.4492 | -11.0570 | 0.0511 | 0.0379 |
+ | 0.0503 | 4.8105 | 460 | 0.1848 | 2.4749 | -3.5917 | 0.9477 | 6.0666 | -75.9953 | -15.6569 | 0.0731 | 0.0596 |
+ | 0.0636 | 5.0196 | 480 | 0.1494 | 2.2862 | -4.0054 | 0.9593 | 6.2916 | -80.1320 | -17.5431 | 0.0867 | 0.0727 |
+ | 0.0299 | 5.2288 | 500 | 0.1578 | 2.4928 | -3.6944 | 0.9593 | 6.1872 | -77.0218 | -15.4775 | 0.0949 | 0.0808 |
+ | 0.0224 | 5.4379 | 520 | 0.1532 | 2.4571 | -3.9264 | 0.9535 | 6.3835 | -79.3418 | -15.8344 | 0.1001 | 0.0855 |
+ | 0.0674 | 5.6471 | 540 | 0.1652 | 2.5101 | -3.8634 | 0.9477 | 6.3734 | -78.7120 | -15.3049 | 0.1032 | 0.0885 |
+ | 0.0259 | 5.8562 | 560 | 0.1630 | 2.4884 | -3.9396 | 0.9477 | 6.4279 | -79.4737 | -15.5216 | 0.1059 | 0.0910 |
+
+
+ ### Framework versions
+
+ - PEFT 0.11.1
+ - Transformers 4.41.2
+ - Pytorch 2.3.0+cu121
+ - Datasets 2.19.2
+ - Tokenizers 0.19.1
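
Review note on the README.md diff: the figures in the generated card can be cross-checked with a little arithmetic. The sketch below is only a sanity check, not part of the commit. It assumes the usual Trainer convention that the total batch size is the per-device batch size times the gradient accumulation steps times the number of devices, and that Rewards/margins is Rewards/chosen minus Rewards/rejected. The device count of 1 is an assumption inferred from 2 × 8 = 16; the card does not state it. All variable names are illustrative. Note also that the summary metrics at the top of the card (Loss 0.2625 and so on) match the step 80 row of the results table, not the last logged row at step 560.

```python
# Values copied from the model card; variable names are illustrative only.
train_batch_size = 2
gradient_accumulation_steps = 8
num_devices = 1  # ASSUMPTION: not stated in the card, inferred from 2 * 8 == 16

# Effective batch size per optimizer step.
total_train_batch_size = train_batch_size * gradient_accumulation_steps * num_devices

# Summary metrics from the top of the card (identical to the step 80 table row).
rewards_chosen = 0.9675
rewards_rejected = -1.4892
margin = round(rewards_chosen - rewards_rejected, 4)

# The table logs epoch 0.2092 at step 20, which gives the steps per epoch
# and a rough estimate of the number of training examples.
steps_per_epoch = 20 / 0.2092
approx_train_examples = steps_per_epoch * total_train_batch_size

print(total_train_batch_size)      # 16, matching total_train_batch_size in the card
print(margin)                      # 2.4567, matching Rewards/margins in the card
print(round(steps_per_epoch, 1))   # 95.6
print(round(approx_train_examples))
```

Under these assumptions, 570 training steps at roughly 95.6 steps per epoch comes to just under six epochs, which is consistent with the last logged row (epoch 5.8562 at step 560).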
adapter_model.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:a6689373c73893f375f8a07862cbe8a8bbc2a74409d3d4febba1d5da85a7c377
+ oid sha256:3acf2ab3ac0196cf8b3871a17f2658ed7ac63e95bb2e92cfba8298fdff1fddb8
  size 2684416208
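
Review note on the adapter_model.safetensors diff: only the Git LFS pointer changed. The `oid` differs while the `size` stays at 2684416208 bytes (roughly 2.5 GiB), so the weights were rewritten without changing the file length. The sketch below shows one way to parse such a pointer and check a downloaded file against it. The helper names `parse_lfs_pointer` and `matches_pointer` are hypothetical and are not part of Git LFS or any Hugging Face library; only the standard library is used.

```python
import hashlib
import os


def parse_lfs_pointer(text: str) -> dict:
    """Parse the 'key value' lines of a Git LFS pointer file into a dict."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, oid = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algo": algo,
        "oid": oid,
        "size": int(fields["size"]),
    }


def matches_pointer(path: str, pointer: dict) -> bool:
    """Return True if the file at `path` has the size and SHA-256 digest named by the pointer."""
    if pointer["algo"] != "sha256" or os.path.getsize(path) != pointer["size"]:
        return False
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in 1 MiB chunks so a multi-gigabyte file is never held in memory at once.
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest() == pointer["oid"]


# The new pointer contents, copied from the diff above.
NEW_POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:3acf2ab3ac0196cf8b3871a17f2658ed7ac63e95bb2e92cfba8298fdff1fddb8
size 2684416208
"""

pointer = parse_lfs_pointer(NEW_POINTER)
print(pointer["algo"], pointer["size"])
```

With a local copy of the file, `matches_pointer("adapter_model.safetensors", pointer)` would confirm that the download corresponds to this commit; the local path here is an assumption.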