jan-hq committed on
Commit 5b9d863
1 Parent(s): 934960d

Model save
README.md ADDED
---
tags:
- trl
- dpo
- generated_from_trainer
model-index:
- name: LlamaCorn-1.1B-Chat
  results: []
---

<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->

# LlamaCorn-1.1B-Chat

This model was trained from scratch on an unspecified dataset.
It achieves the following results on the evaluation set:
- Loss: 0.9305
- Rewards/chosen: -0.2148
- Rewards/rejected: -0.2954
- Rewards/accuracies: 0.5824
- Rewards/margins: 0.0806
- Logps/rejected: -183.8757
- Logps/chosen: -197.7534
- Logits/rejected: -2.6439
- Logits/chosen: -2.6493
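The reward metrics above are linked: Rewards/margins is simply Rewards/chosen minus Rewards/rejected, and Rewards/accuracies is the fraction of preference pairs where the chosen response scores higher than the rejected one. A minimal sketch of that relationship, using the numbers reported above (the per-pair sigmoid DPO loss in the comment is TRL's default loss type and is an assumption about this run):

```python
import math

# Mean eval rewards reported above.
rewards_chosen = -0.2148
rewards_rejected = -0.2954

# Rewards/margins is the mean gap between chosen and rejected rewards.
margin = rewards_chosen - rewards_rejected
print(f"margin = {margin:.4f}")  # matches the reported 0.0806

# Under a sigmoid DPO loss, each pair contributes -log(sigmoid(margin_i)).
# Note the reported Loss (0.9305) is the mean of per-pair losses, which is
# not the same as -log(sigmoid(mean margin)) computed here.
loss_at_mean_margin = -math.log(1 / (1 + math.exp(-margin)))
print(f"-log sigmoid(mean margin) = {loss_at_mean_margin:.4f}")
```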
## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 5e-07
- train_batch_size: 2
- eval_batch_size: 4
- seed: 42
- distributed_type: multi-GPU
- num_devices: 2
- gradient_accumulation_steps: 16
- total_train_batch_size: 64
- total_eval_batch_size: 8
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 3
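The effective (total) batch sizes above follow from the per-device sizes, the device count, and gradient accumulation:

```python
# Hyperparameters listed above.
train_batch_size = 2          # per-device train batch
eval_batch_size = 4           # per-device eval batch
num_devices = 2
gradient_accumulation_steps = 16

# Effective batch sizes, as reported by the Trainer.
total_train_batch_size = train_batch_size * num_devices * gradient_accumulation_steps
total_eval_batch_size = eval_batch_size * num_devices  # no accumulation at eval time

print(total_train_batch_size)  # 64
print(total_eval_batch_size)   # 8
```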
### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:-----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.9958 | 0.03 | 100 | 1.0003 | -0.0002 | -0.0002 | 0.4930 | -0.0001 | -180.9232 | -195.6078 | -2.6876 | -2.6924 |
| 0.9984 | 0.06 | 200 | 0.9995 | -0.0007 | -0.0013 | 0.4988 | 0.0006 | -180.9347 | -195.6127 | -2.6787 | -2.6838 |
| 0.9982 | 0.09 | 300 | 0.9997 | -0.0008 | -0.0015 | 0.4983 | 0.0007 | -180.9361 | -195.6136 | -2.6848 | -2.6897 |
| 0.9966 | 0.12 | 400 | 0.9999 | -0.0024 | -0.0027 | 0.4995 | 0.0003 | -180.9485 | -195.6291 | -2.6865 | -2.6914 |
| 0.9992 | 0.15 | 500 | 0.9984 | -0.0039 | -0.0054 | 0.5122 | 0.0015 | -180.9753 | -195.6440 | -2.6641 | -2.6694 |
| 0.9983 | 0.18 | 600 | 0.9981 | -0.0054 | -0.0073 | 0.5127 | 0.0020 | -180.9945 | -195.6589 | -2.6862 | -2.6911 |
| 0.9968 | 0.2 | 700 | 0.9972 | -0.0093 | -0.0127 | 0.5241 | 0.0034 | -181.0485 | -195.6985 | -2.6753 | -2.6803 |
| 0.9893 | 0.23 | 800 | 0.9951 | -0.0114 | -0.0164 | 0.5248 | 0.0051 | -181.0858 | -195.7188 | -2.6676 | -2.6728 |
| 0.988 | 0.26 | 900 | 0.9924 | -0.0169 | -0.0245 | 0.5421 | 0.0076 | -181.1663 | -195.7744 | -2.6763 | -2.6814 |
| 0.9879 | 0.29 | 1000 | 0.9907 | -0.0220 | -0.0318 | 0.5476 | 0.0098 | -181.2388 | -195.8248 | -2.6746 | -2.6796 |
| 0.9882 | 0.32 | 1100 | 0.9869 | -0.0261 | -0.0399 | 0.5598 | 0.0138 | -181.3200 | -195.8661 | -2.6647 | -2.6699 |
| 0.979 | 0.35 | 1200 | 0.9851 | -0.0364 | -0.0521 | 0.5563 | 0.0157 | -181.4419 | -195.9693 | -2.6684 | -2.6735 |
| 0.985 | 0.38 | 1300 | 0.9818 | -0.0385 | -0.0576 | 0.5608 | 0.0192 | -181.4978 | -195.9900 | -2.6874 | -2.6921 |
| 0.9821 | 0.41 | 1400 | 0.9805 | -0.0462 | -0.0668 | 0.5590 | 0.0206 | -181.5891 | -196.0672 | -2.6761 | -2.6810 |
| 0.9822 | 0.44 | 1500 | 0.9779 | -0.0550 | -0.0777 | 0.5632 | 0.0227 | -181.6983 | -196.1554 | -2.6764 | -2.6813 |
| 0.9755 | 0.47 | 1600 | 0.9756 | -0.0600 | -0.0855 | 0.5656 | 0.0255 | -181.7764 | -196.2058 | -2.6502 | -2.6557 |
| 0.9697 | 0.5 | 1700 | 0.9731 | -0.0652 | -0.0931 | 0.5651 | 0.0280 | -181.8526 | -196.2569 | -2.6752 | -2.6801 |
| 0.969 | 0.53 | 1800 | 0.9698 | -0.0701 | -0.1017 | 0.5687 | 0.0315 | -181.9380 | -196.3067 | -2.6635 | -2.6687 |
| 0.9643 | 0.55 | 1900 | 0.9685 | -0.0762 | -0.1092 | 0.5676 | 0.0331 | -182.0137 | -196.3669 | -2.6590 | -2.6642 |
| 0.9655 | 0.58 | 2000 | 0.9663 | -0.0821 | -0.1180 | 0.5756 | 0.0359 | -182.1012 | -196.4265 | -2.6802 | -2.6850 |
| 0.9719 | 0.61 | 2100 | 0.9645 | -0.0908 | -0.1281 | 0.5676 | 0.0373 | -182.2023 | -196.5133 | -2.6677 | -2.6727 |
| 0.9576 | 0.64 | 2200 | 0.9625 | -0.0953 | -0.1350 | 0.5729 | 0.0396 | -182.2709 | -196.5585 | -2.6679 | -2.6730 |
| 0.9619 | 0.67 | 2300 | 0.9603 | -0.1012 | -0.1436 | 0.5783 | 0.0424 | -182.3572 | -196.6170 | -2.6527 | -2.6580 |
| 0.9511 | 0.7 | 2400 | 0.9601 | -0.1105 | -0.1540 | 0.5722 | 0.0434 | -182.4612 | -196.7107 | -2.6565 | -2.6617 |
| 0.9516 | 0.73 | 2500 | 0.9570 | -0.1158 | -0.1618 | 0.5715 | 0.0460 | -182.5389 | -196.7630 | -2.6613 | -2.6664 |
| 0.9577 | 0.76 | 2600 | 0.9554 | -0.1236 | -0.1717 | 0.5719 | 0.0481 | -182.6387 | -196.8413 | -2.6595 | -2.6646 |
| 0.9471 | 0.79 | 2700 | 0.9541 | -0.1268 | -0.1763 | 0.5736 | 0.0495 | -182.6840 | -196.8731 | -2.6621 | -2.6672 |
| 0.9519 | 0.82 | 2800 | 0.9524 | -0.1336 | -0.1849 | 0.5738 | 0.0513 | -182.7705 | -196.9414 | -2.6762 | -2.6810 |
| 0.9522 | 0.85 | 2900 | 0.9515 | -0.1364 | -0.1896 | 0.5724 | 0.0531 | -182.8170 | -196.9696 | -2.6604 | -2.6655 |
| 0.9414 | 0.88 | 3000 | 0.9491 | -0.1395 | -0.1949 | 0.5744 | 0.0555 | -182.8706 | -197.0000 | -2.6706 | -2.6755 |
| 0.9509 | 0.9 | 3100 | 0.9483 | -0.1450 | -0.2020 | 0.5799 | 0.0570 | -182.9411 | -197.0551 | -2.6574 | -2.6625 |
| 0.9453 | 0.93 | 3200 | 0.9472 | -0.1472 | -0.2061 | 0.5834 | 0.0589 | -182.9822 | -197.0772 | -2.6424 | -2.6478 |
| 0.9577 | 0.96 | 3300 | 0.9461 | -0.1490 | -0.2081 | 0.5794 | 0.0590 | -183.0018 | -197.0956 | -2.6570 | -2.6622 |
| 0.9374 | 0.99 | 3400 | 0.9452 | -0.1532 | -0.2145 | 0.5770 | 0.0613 | -183.0663 | -197.1376 | -2.6499 | -2.6552 |
| 0.9299 | 1.02 | 3500 | 0.9439 | -0.1570 | -0.2195 | 0.5770 | 0.0625 | -183.1160 | -197.1755 | -2.6612 | -2.6663 |
| 0.936 | 1.05 | 3600 | 0.9438 | -0.1628 | -0.2265 | 0.5789 | 0.0637 | -183.1864 | -197.2330 | -2.6532 | -2.6584 |
| 0.9435 | 1.08 | 3700 | 0.9420 | -0.1655 | -0.2305 | 0.5807 | 0.0650 | -183.2263 | -197.2607 | -2.6673 | -2.6723 |
| 0.9341 | 1.11 | 3800 | 0.9422 | -0.1698 | -0.2351 | 0.5812 | 0.0653 | -183.2721 | -197.3029 | -2.6585 | -2.6636 |
| 0.9296 | 1.14 | 3900 | 0.9405 | -0.1736 | -0.2401 | 0.5714 | 0.0665 | -183.3225 | -197.3411 | -2.6382 | -2.6437 |
| 0.9338 | 1.17 | 4000 | 0.9402 | -0.1747 | -0.2426 | 0.5772 | 0.0680 | -183.3476 | -197.3519 | -2.6428 | -2.6483 |
| 0.9257 | 1.2 | 4100 | 0.9395 | -0.1780 | -0.2462 | 0.5766 | 0.0682 | -183.3829 | -197.3849 | -2.6411 | -2.6465 |
| 0.9368 | 1.23 | 4200 | 0.9386 | -0.1786 | -0.2485 | 0.5833 | 0.0699 | -183.4063 | -197.3914 | -2.6495 | -2.6548 |
| 0.916 | 1.25 | 4300 | 0.9385 | -0.1812 | -0.2513 | 0.5763 | 0.0702 | -183.4345 | -197.4169 | -2.6390 | -2.6445 |
| 0.9093 | 1.28 | 4400 | 0.9375 | -0.1864 | -0.2576 | 0.5831 | 0.0712 | -183.4972 | -197.4688 | -2.6448 | -2.6502 |
| 0.9408 | 1.31 | 4500 | 0.9368 | -0.1896 | -0.2615 | 0.5797 | 0.0719 | -183.5364 | -197.5016 | -2.6422 | -2.6476 |
| 0.9245 | 1.34 | 4600 | 0.9363 | -0.1926 | -0.2660 | 0.5787 | 0.0734 | -183.5815 | -197.5314 | -2.6563 | -2.6614 |
| 0.9469 | 1.37 | 4700 | 0.9364 | -0.1944 | -0.2666 | 0.5775 | 0.0722 | -183.5875 | -197.5493 | -2.6581 | -2.6632 |
| 0.9421 | 1.4 | 4800 | 0.9358 | -0.1946 | -0.2683 | 0.5819 | 0.0736 | -183.6040 | -197.5517 | -2.6640 | -2.6691 |
| 0.9076 | 1.43 | 4900 | 0.9356 | -0.1963 | -0.2704 | 0.5799 | 0.0741 | -183.6253 | -197.5680 | -2.6626 | -2.6676 |
| 0.94 | 1.46 | 5000 | 0.9353 | -0.1996 | -0.2738 | 0.5800 | 0.0742 | -183.6591 | -197.6010 | -2.6438 | -2.6492 |
| 0.9288 | 1.49 | 5100 | 0.9351 | -0.1999 | -0.2741 | 0.5809 | 0.0742 | -183.6625 | -197.6045 | -2.6433 | -2.6487 |
| 0.927 | 1.52 | 5200 | 0.9343 | -0.2009 | -0.2767 | 0.5821 | 0.0758 | -183.6883 | -197.6144 | -2.6499 | -2.6552 |
| 0.9171 | 1.55 | 5300 | 0.9339 | -0.2024 | -0.2784 | 0.5823 | 0.0760 | -183.7055 | -197.6292 | -2.6421 | -2.6476 |
| 0.9337 | 1.58 | 5400 | 0.9344 | -0.2040 | -0.2799 | 0.5787 | 0.0760 | -183.7208 | -197.6453 | -2.6459 | -2.6513 |
| 0.919 | 1.6 | 5500 | 0.9334 | -0.2058 | -0.2825 | 0.5811 | 0.0767 | -183.7465 | -197.6637 | -2.6390 | -2.6445 |
| 0.9297 | 1.63 | 5600 | 0.9341 | -0.2053 | -0.2822 | 0.5794 | 0.0770 | -183.7437 | -197.6582 | -2.6418 | -2.6472 |
| 0.9174 | 1.66 | 5700 | 0.9333 | -0.2067 | -0.2834 | 0.5800 | 0.0767 | -183.7554 | -197.6726 | -2.6492 | -2.6545 |
| 0.9275 | 1.69 | 5800 | 0.9332 | -0.2059 | -0.2826 | 0.5760 | 0.0767 | -183.7476 | -197.6642 | -2.6471 | -2.6524 |
| 0.9164 | 1.72 | 5900 | 0.9321 | -0.2079 | -0.2867 | 0.5809 | 0.0787 | -183.7881 | -197.6847 | -2.6387 | -2.6442 |
| 0.9218 | 1.75 | 6000 | 0.9322 | -0.2095 | -0.2872 | 0.5787 | 0.0777 | -183.7935 | -197.7004 | -2.6377 | -2.6432 |
| 0.944 | 1.78 | 6100 | 0.9319 | -0.2106 | -0.2895 | 0.5823 | 0.0789 | -183.8163 | -197.7118 | -2.6555 | -2.6607 |
| 0.9037 | 1.81 | 6200 | 0.9323 | -0.2105 | -0.2892 | 0.5780 | 0.0787 | -183.8135 | -197.7102 | -2.6459 | -2.6513 |
| 0.929 | 1.84 | 6300 | 0.9321 | -0.2114 | -0.2905 | 0.5773 | 0.0791 | -183.8265 | -197.7195 | -2.6446 | -2.6500 |
| 0.9091 | 1.87 | 6400 | 0.9324 | -0.2111 | -0.2904 | 0.5760 | 0.0793 | -183.8252 | -197.7167 | -2.6534 | -2.6586 |
| 0.9094 | 1.9 | 6500 | 0.9321 | -0.2123 | -0.2903 | 0.5770 | 0.0780 | -183.8242 | -197.7287 | -2.6477 | -2.6530 |
| 0.9449 | 1.93 | 6600 | 0.9320 | -0.2120 | -0.2903 | 0.5795 | 0.0784 | -183.8246 | -197.7251 | -2.6302 | -2.6358 |
| 0.9404 | 1.95 | 6700 | 0.9319 | -0.2115 | -0.2909 | 0.5802 | 0.0794 | -183.8302 | -197.7204 | -2.6443 | -2.6497 |
| 0.9155 | 1.98 | 6800 | 0.9314 | -0.2124 | -0.2919 | 0.5826 | 0.0795 | -183.8406 | -197.7291 | -2.6306 | -2.6362 |
| 0.9328 | 2.01 | 6900 | 0.9313 | -0.2127 | -0.2924 | 0.5884 | 0.0798 | -183.8456 | -197.7321 | -2.6296 | -2.6352 |
| 0.9012 | 2.04 | 7000 | 0.9321 | -0.2146 | -0.2932 | 0.5766 | 0.0785 | -183.8530 | -197.7515 | -2.6361 | -2.6416 |
| 0.9296 | 2.07 | 7100 | 0.9314 | -0.2127 | -0.2929 | 0.5780 | 0.0802 | -183.8507 | -197.7323 | -2.6460 | -2.6513 |
| 0.9076 | 2.1 | 7200 | 0.9315 | -0.2145 | -0.2945 | 0.5797 | 0.0799 | -183.8660 | -197.7507 | -2.6501 | -2.6554 |
| 0.922 | 2.13 | 7300 | 0.9315 | -0.2147 | -0.2935 | 0.5792 | 0.0788 | -183.8565 | -197.7523 | -2.6510 | -2.6562 |
| 0.9136 | 2.16 | 7400 | 0.9313 | -0.2146 | -0.2941 | 0.5819 | 0.0795 | -183.8625 | -197.7515 | -2.6410 | -2.6464 |
| 0.9401 | 2.19 | 7500 | 0.9314 | -0.2140 | -0.2937 | 0.5799 | 0.0797 | -183.8583 | -197.7451 | -2.6490 | -2.6543 |
| 0.9295 | 2.22 | 7600 | 0.9313 | -0.2153 | -0.2953 | 0.5812 | 0.0800 | -183.8747 | -197.7585 | -2.6569 | -2.6620 |
| 0.9128 | 2.25 | 7700 | 0.9309 | -0.2154 | -0.2960 | 0.5817 | 0.0806 | -183.8814 | -197.7590 | -2.6500 | -2.6553 |
| 0.9074 | 2.28 | 7800 | 0.9312 | -0.2159 | -0.2964 | 0.5836 | 0.0804 | -183.8851 | -197.7648 | -2.6505 | -2.6557 |
| 0.9114 | 2.3 | 7900 | 0.9310 | -0.2149 | -0.2949 | 0.5836 | 0.0800 | -183.8703 | -197.7544 | -2.6425 | -2.6479 |
| 0.9181 | 2.33 | 8000 | 0.9318 | -0.2145 | -0.2937 | 0.5772 | 0.0792 | -183.8585 | -197.7501 | -2.6611 | -2.6661 |
| 0.9009 | 2.36 | 8100 | 0.9311 | -0.2149 | -0.2952 | 0.5799 | 0.0803 | -183.8736 | -197.7543 | -2.6581 | -2.6632 |
| 0.9091 | 2.39 | 8200 | 0.9311 | -0.2165 | -0.2960 | 0.5829 | 0.0795 | -183.8816 | -197.7702 | -2.6378 | -2.6433 |
| 0.9091 | 2.42 | 8300 | 0.9312 | -0.2146 | -0.2950 | 0.5833 | 0.0805 | -183.8717 | -197.7510 | -2.6475 | -2.6528 |
| 0.9419 | 2.45 | 8400 | 0.9307 | -0.2138 | -0.2946 | 0.5777 | 0.0808 | -183.8678 | -197.7433 | -2.6364 | -2.6419 |
| 0.9203 | 2.48 | 8500 | 0.9313 | -0.2148 | -0.2948 | 0.5834 | 0.0800 | -183.8688 | -197.7529 | -2.6474 | -2.6527 |
| 0.9102 | 2.51 | 8600 | 0.9315 | -0.2158 | -0.2958 | 0.5821 | 0.0800 | -183.8791 | -197.7635 | -2.6436 | -2.6489 |
| 0.9327 | 2.54 | 8700 | 0.9316 | -0.2146 | -0.2946 | 0.5824 | 0.0800 | -183.8669 | -197.7511 | -2.6505 | -2.6558 |
| 0.9221 | 2.57 | 8800 | 0.9305 | -0.2149 | -0.2953 | 0.5828 | 0.0804 | -183.8742 | -197.7540 | -2.6659 | -2.6709 |
| 0.8851 | 2.6 | 8900 | 0.9315 | -0.2146 | -0.2949 | 0.5816 | 0.0803 | -183.8702 | -197.7508 | -2.6571 | -2.6622 |
| 0.924 | 2.63 | 9000 | 0.9304 | -0.2144 | -0.2951 | 0.5804 | 0.0807 | -183.8718 | -197.7492 | -2.6449 | -2.6503 |
| 0.9025 | 2.65 | 9100 | 0.9315 | -0.2150 | -0.2950 | 0.5790 | 0.0800 | -183.8715 | -197.7551 | -2.6410 | -2.6464 |
| 0.9348 | 2.68 | 9200 | 0.9308 | -0.2144 | -0.2946 | 0.5802 | 0.0802 | -183.8669 | -197.7491 | -2.6349 | -2.6405 |
| 0.9067 | 2.71 | 9300 | 0.9312 | -0.2155 | -0.2959 | 0.5857 | 0.0805 | -183.8805 | -197.7599 | -2.6410 | -2.6465 |
| 0.9263 | 2.74 | 9400 | 0.9307 | -0.2148 | -0.2957 | 0.5829 | 0.0809 | -183.8785 | -197.7536 | -2.6432 | -2.6486 |
| 0.912 | 2.77 | 9500 | 0.9306 | -0.2153 | -0.2957 | 0.5823 | 0.0805 | -183.8788 | -197.7581 | -2.6441 | -2.6495 |
| 0.9157 | 2.8 | 9600 | 0.9314 | -0.2169 | -0.2965 | 0.5785 | 0.0795 | -183.8859 | -197.7745 | -2.6439 | -2.6493 |
| 0.9094 | 2.83 | 9700 | 0.9309 | -0.2157 | -0.2961 | 0.5831 | 0.0804 | -183.8826 | -197.7625 | -2.6441 | -2.6494 |
| 0.9256 | 2.86 | 9800 | 0.9304 | -0.2160 | -0.2965 | 0.5838 | 0.0805 | -183.8867 | -197.7653 | -2.6439 | -2.6493 |
| 0.9287 | 2.89 | 9900 | 0.9305 | -0.2149 | -0.2955 | 0.5833 | 0.0806 | -183.8762 | -197.7545 | -2.6440 | -2.6494 |
| 0.9296 | 2.92 | 10000 | 0.9310 | -0.2157 | -0.2953 | 0.5795 | 0.0796 | -183.8741 | -197.7621 | -2.6439 | -2.6493 |
| 0.9335 | 2.95 | 10100 | 0.9311 | -0.2153 | -0.2953 | 0.5812 | 0.0800 | -183.8739 | -197.7578 | -2.6439 | -2.6493 |
| 0.9321 | 2.98 | 10200 | 0.9305 | -0.2149 | -0.2955 | 0.5824 | 0.0805 | -183.8759 | -197.7545 | -2.6439 | -2.6493 |
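As a quick consistency check, the logged step/epoch pairs in the training-results table imply roughly 3,400-3,450 optimizer steps per epoch (this arithmetic is an illustration, not a figure from the card; epoch values are rounded to two decimals, so estimates vary slightly):

```python
# Step/epoch pairs taken from the training-results table.
steps_per_epoch_early = 3400 / 0.99    # ≈ 3434, from the epoch-0.99 row
steps_per_epoch_late = 10200 / 2.98    # ≈ 3423, from the last logged row

# Both rows imply roughly the same steps-per-epoch rate.
assert abs(steps_per_epoch_early - steps_per_epoch_late) < 50

# Over num_epochs = 3 that implies roughly 10,300 total steps, consistent
# with the last logged step (10200) landing near epoch 2.98.
print(round(steps_per_epoch_late * 3))
```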
### Framework versions

- Transformers 4.36.2
- PyTorch 2.1.2+cu121
- Datasets 2.14.6
- Tokenizers 0.15.0
all_results.json ADDED
{
  "epoch": 3.0,
  "eval_logits/chosen": -2.649322271347046,
  "eval_logits/rejected": -2.6439428329467773,
  "eval_logps/chosen": -197.75340270996094,
  "eval_logps/rejected": -183.8756866455078,
  "eval_loss": 0.9304828643798828,
  "eval_rewards/accuracies": 0.582426905632019,
  "eval_rewards/chosen": -0.21480964124202728,
  "eval_rewards/margins": 0.08063079416751862,
  "eval_rewards/rejected": -0.2954404354095459,
  "eval_runtime": 443.5148,
  "eval_samples": 11765,
  "eval_samples_per_second": 26.527,
  "eval_steps_per_second": 3.317,
  "train_loss": 0.9410081987063832,
  "train_runtime": 92137.4772,
  "train_samples": 219411,
  "train_samples_per_second": 7.144,
  "train_steps_per_second": 0.112
}
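The throughput fields in all_results.json are derived from the raw sample counts and wall-clock runtimes. A quick check (the num_epochs multiplier comes from the training hyperparameters; treating train_samples as per-epoch is an assumption about how the Trainer reports it):

```python
# Values from all_results.json.
eval_samples, eval_runtime = 11765, 443.5148
train_samples, train_runtime = 219411, 92137.4772
num_epochs = 3  # from the training hyperparameters

# samples_per_second = samples processed / wall-clock seconds
assert abs(eval_samples / eval_runtime - 26.527) < 0.001
assert abs(train_samples * num_epochs / train_runtime - 7.144) < 0.001

# train_steps_per_second * train_runtime recovers the total optimizer
# step count: roughly 0.112 * 92137 ≈ 10,300 steps, in line with the
# final steps logged in the training-results table.
print(round(0.112 * train_runtime))
```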
eval_results.json ADDED
{
  "epoch": 3.0,
  "eval_logits/chosen": -2.649322271347046,
  "eval_logits/rejected": -2.6439428329467773,
  "eval_logps/chosen": -197.75340270996094,
  "eval_logps/rejected": -183.8756866455078,
  "eval_loss": 0.9304828643798828,
  "eval_rewards/accuracies": 0.582426905632019,
  "eval_rewards/chosen": -0.21480964124202728,
  "eval_rewards/margins": 0.08063079416751862,
  "eval_rewards/rejected": -0.2954404354095459,
  "eval_runtime": 443.5148,
  "eval_samples": 11765,
  "eval_samples_per_second": 26.527,
  "eval_steps_per_second": 3.317
}
generation_config.json ADDED
{
  "bos_token_id": 1,
  "eos_token_id": 2,
  "max_length": 2048,
  "pad_token_id": 0,
  "transformers_version": "4.36.2"
}
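These token ids follow the Llama-family convention (BOS = 1, EOS = 2), as expected for a TinyLlama-derived model. A minimal sketch of reading the config with the standard library (the comments on how generate() uses each field are an assumption based on typical Transformers behavior, not something stated in this file):

```python
import json

# generation_config.json contents as shown above.
generation_config = json.loads("""
{
  "bos_token_id": 1,
  "eos_token_id": 2,
  "max_length": 2048,
  "pad_token_id": 0,
  "transformers_version": "4.36.2"
}
""")

# generate() stops when eos_token_id is produced or max_length is reached;
# pad_token_id fills batch positions after a sequence finishes.
assert generation_config["bos_token_id"] == 1
assert generation_config["eos_token_id"] == 2
assert generation_config["max_length"] == 2048
```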
model.safetensors CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:d85663155cac68ed8e18e6c04eb205207fc23800e0c4b26ffb8d9611d8726f04
+oid sha256:cd8073ac88aca635b61d4df026ff649954e2f7f53c15578e1d44605ffcfe7f19
 size 2200119864
runs/Jan24_05-52-24_333df911e7ea/events.out.tfevents.1706075592.333df911e7ea.1937124.0 CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:9d85d7b906c6d9fb46a3a934da9438ae62eb8d6505a6a6691b8cf3b6220240aa
-size 720170
+oid sha256:56aff1db86138dd14e9db24410f0657c9d23ae2fe250223c266b08fc63e47359
+size 732676
runs/Jan24_05-52-24_333df911e7ea/events.out.tfevents.1706168173.333df911e7ea.1937124.1 ADDED
version https://git-lfs.github.com/spec/v1
oid sha256:fc75092aaddc85feed8885d3175312119aa5f7c304994bd76dcfb9a7b48f8d93
size 828
train_results.json ADDED
{
  "epoch": 3.0,
  "train_loss": 0.9410081987063832,
  "train_runtime": 92137.4772,
  "train_samples": 219411,
  "train_samples_per_second": 7.144,
  "train_steps_per_second": 0.112
}
trainer_state.json ADDED
The diff for this file is too large to render. See raw diff