martimfasantos committed
Commit eb7a40a (1 parent: 26e89b6)

Model save

README.md ADDED
@@ -0,0 +1,186 @@
+ ---
+ license: apache-2.0
+ base_model: martimfasantos/tinyllama-1.1b-sum-sft-full_old
+ tags:
+ - trl
+ - dpo
+ - generated_from_trainer
+ model-index:
+ - name: tinyllama-1.1b-sum-dpo-full_LR5e-8_2epochs_old
+   results: []
+ ---
+
+ <!-- This model card has been generated automatically according to the information the Trainer had access to. You
+ should probably proofread and complete it, then remove this comment. -->
+
+ # tinyllama-1.1b-sum-dpo-full_LR5e-8_2epochs_old
+
+ This model is a fine-tuned version of [martimfasantos/tinyllama-1.1b-sum-sft-full_old](https://huggingface.co/martimfasantos/tinyllama-1.1b-sum-sft-full_old) on an unknown dataset.
+ It achieves the following results on the evaluation set:
+ - Loss: 0.6808
+ - Rewards/chosen: -0.1214
+ - Rewards/rejected: -0.1497
+ - Rewards/accuracies: 0.6090
+ - Rewards/margins: 0.0284
+ - Logps/rejected: -78.1532
+ - Logps/chosen: -70.8499
+ - Logits/rejected: -2.9566
+ - Logits/chosen: -2.9624
+
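+ For context, these reward metrics follow the standard DPO definitions (a reading based on TRL's defaults, not something stated in the original card): each reward is the policy-versus-reference log-probability gap scaled by the DPO temperature $\beta$, and the loss is the negative log-sigmoid of the chosen-minus-rejected margin:
+
+ $$
+ r_\theta(x, y) = \beta \left( \log \pi_\theta(y \mid x) - \log \pi_{\mathrm{ref}}(y \mid x) \right),
+ \qquad
+ \mathcal{L}_{\mathrm{DPO}} = -\log \sigma\big( r_\theta(x, y_{w}) - r_\theta(x, y_{l}) \big)
+ $$
+
+ Under this reading, Rewards/margins is the mean of $r_\theta(x, y_{w}) - r_\theta(x, y_{l})$ over evaluation pairs, and Rewards/accuracies is the fraction of pairs with a positive margin.
+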
+ ## Model description
+
+ More information needed
+
+ ## Intended uses & limitations
+
+ More information needed
+
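+ A minimal inference sketch (not part of the original card; it assumes the standard `transformers` generation API, and the summarization prompt shown is a hypothetical illustration, since the card does not document a prompt format):
+
+ ```python
+ import torch
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+
+ model_id = "martimfasantos/tinyllama-1.1b-sum-dpo-full_LR5e-8_2epochs_old"
+ tokenizer = AutoTokenizer.from_pretrained(model_id)
+ model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)
+
+ # Hypothetical summarization prompt; adjust to match the SFT data format.
+ prompt = "Summarize the following post.\n\nPOST: ...\n\nTL;DR:"
+ inputs = tokenizer(prompt, return_tensors="pt")
+ output = model.generate(**inputs, max_new_tokens=64, do_sample=False)
+ # Decode only the newly generated tokens.
+ print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
+ ```
+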
+ ## Training and evaluation data
+
+ More information needed
+
+ ## Training procedure
+
+ ### Training hyperparameters
+
+ The following hyperparameters were used during training (a configuration sketch follows the list):
+ - learning_rate: 5e-08
+ - train_batch_size: 8
+ - eval_batch_size: 8
+ - seed: 42
+ - distributed_type: multi-GPU
+ - gradient_accumulation_steps: 2
+ - total_train_batch_size: 16
+ - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
+ - lr_scheduler_type: cosine
+ - lr_scheduler_warmup_ratio: 0.1
+ - num_epochs: 2
+
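+ The training script itself is not included in this card. The sketch below shows how these hyperparameters could map onto TRL's `DPOConfig`/`DPOTrainer`; the dataset is a placeholder (the card lists it as unknown), $\beta$ is left at TRL's default, and argument names may differ slightly across TRL versions:
+
+ ```python
+ from datasets import Dataset
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+ from trl import DPOConfig, DPOTrainer
+
+ base = "martimfasantos/tinyllama-1.1b-sum-sft-full_old"
+ model = AutoModelForCausalLM.from_pretrained(base)      # policy, initialized from the SFT model
+ ref_model = AutoModelForCausalLM.from_pretrained(base)  # frozen DPO reference
+ tokenizer = AutoTokenizer.from_pretrained(base)
+
+ # Placeholder preference pairs; the real training set is not named in the card.
+ train_dataset = Dataset.from_dict({
+     "prompt": ["Summarize the following post.\n\nPOST: ...\n\nTL;DR:"],
+     "chosen": [" A concise, faithful summary."],
+     "rejected": [" An off-topic or rambling reply."],
+ })
+
+ args = DPOConfig(
+     output_dir="tinyllama-1.1b-sum-dpo-full_LR5e-8_2epochs_old",
+     learning_rate=5e-8,
+     per_device_train_batch_size=8,  # x 2 accumulation steps = total batch size 16
+     per_device_eval_batch_size=8,
+     gradient_accumulation_steps=2,
+     num_train_epochs=2,
+     lr_scheduler_type="cosine",
+     warmup_ratio=0.1,
+     seed=42,
+ )
+
+ trainer = DPOTrainer(
+     model=model,
+     ref_model=ref_model,
+     args=args,
+     train_dataset=train_dataset,
+     tokenizer=tokenizer,
+ )
+ trainer.train()
+ ```
+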
+ ### Training results
+
+ | Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
+ |:-------------:|:------:|:-----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
+ | 0.6931 | 0.0172 | 100 | 0.6932 | 0.0001 | 0.0001 | 0.4830 | -0.0000 | -63.1707 | -58.7060 | -3.1577 | -3.1634 |
+ | 0.6931 | 0.0345 | 200 | 0.6932 | 0.0000 | 0.0001 | 0.4763 | -0.0001 | -63.1661 | -58.7098 | -3.1576 | -3.1633 |
+ | 0.6931 | 0.0517 | 300 | 0.6932 | -0.0000 | 0.0000 | 0.4893 | -0.0001 | -63.1759 | -58.7129 | -3.1578 | -3.1635 |
+ | 0.6932 | 0.0689 | 400 | 0.6932 | 0.0001 | 0.0003 | 0.4631 | -0.0001 | -63.1539 | -58.6981 | -3.1577 | -3.1634 |
+ | 0.6931 | 0.0861 | 500 | 0.6932 | 0.0001 | 0.0002 | 0.4842 | -0.0001 | -63.1628 | -58.7064 | -3.1577 | -3.1633 |
+ | 0.6929 | 0.1034 | 600 | 0.6932 | 0.0001 | 0.0002 | 0.4870 | -0.0000 | -63.1628 | -58.6974 | -3.1574 | -3.1630 |
+ | 0.693 | 0.1206 | 700 | 0.6932 | 0.0002 | 0.0002 | 0.4865 | -0.0000 | -63.1602 | -58.6945 | -3.1573 | -3.1629 |
+ | 0.6928 | 0.1378 | 800 | 0.6931 | 0.0003 | 0.0003 | 0.5005 | 0.0000 | -63.1503 | -58.6786 | -3.1570 | -3.1626 |
+ | 0.6929 | 0.1551 | 900 | 0.6931 | 0.0006 | 0.0004 | 0.5114 | 0.0002 | -63.1377 | -58.6515 | -3.1564 | -3.1620 |
+ | 0.6929 | 0.1723 | 1000 | 0.6930 | 0.0007 | 0.0004 | 0.5163 | 0.0002 | -63.1368 | -58.6461 | -3.1554 | -3.1611 |
+ | 0.6927 | 0.1895 | 1100 | 0.6930 | 0.0008 | 0.0005 | 0.5353 | 0.0003 | -63.1281 | -58.6300 | -3.1546 | -3.1602 |
+ | 0.6926 | 0.2068 | 1200 | 0.6929 | 0.0011 | 0.0007 | 0.5332 | 0.0004 | -63.1063 | -58.5972 | -3.1533 | -3.1590 |
+ | 0.6925 | 0.2240 | 1300 | 0.6928 | 0.0014 | 0.0008 | 0.5551 | 0.0006 | -63.0993 | -58.5706 | -3.1521 | -3.1577 |
+ | 0.6911 | 0.2412 | 1400 | 0.6927 | 0.0016 | 0.0006 | 0.5537 | 0.0010 | -63.1157 | -58.5519 | -3.1503 | -3.1559 |
+ | 0.6906 | 0.2584 | 1500 | 0.6925 | 0.0018 | 0.0006 | 0.5644 | 0.0013 | -63.1246 | -58.5291 | -3.1489 | -3.1545 |
+ | 0.6915 | 0.2757 | 1600 | 0.6924 | 0.0019 | 0.0005 | 0.5660 | 0.0015 | -63.1345 | -58.5184 | -3.1472 | -3.1529 |
+ | 0.6912 | 0.2929 | 1700 | 0.6922 | 0.0021 | 0.0002 | 0.5634 | 0.0019 | -63.1578 | -58.5044 | -3.1446 | -3.1502 |
+ | 0.6889 | 0.3101 | 1800 | 0.6922 | 0.0019 | -0.0001 | 0.5653 | 0.0020 | -63.1906 | -58.5175 | -3.1424 | -3.1481 |
+ | 0.69 | 0.3274 | 1900 | 0.6919 | 0.0019 | -0.0006 | 0.5771 | 0.0025 | -63.2406 | -58.5210 | -3.1407 | -3.1464 |
+ | 0.6899 | 0.3446 | 2000 | 0.6919 | 0.0016 | -0.0011 | 0.5771 | 0.0027 | -63.2913 | -58.5564 | -3.1376 | -3.1433 |
+ | 0.6892 | 0.3618 | 2100 | 0.6917 | 0.0012 | -0.0017 | 0.5741 | 0.0030 | -63.3523 | -58.5873 | -3.1355 | -3.1412 |
+ | 0.6866 | 0.3790 | 2200 | 0.6916 | 0.0008 | -0.0025 | 0.5743 | 0.0033 | -63.4306 | -58.6304 | -3.1324 | -3.1381 |
+ | 0.6859 | 0.3963 | 2300 | 0.6914 | 0.0003 | -0.0035 | 0.5683 | 0.0037 | -63.5263 | -58.6859 | -3.1305 | -3.1361 |
+ | 0.6889 | 0.4135 | 2400 | 0.6912 | -0.0006 | -0.0047 | 0.5781 | 0.0041 | -63.6550 | -58.7736 | -3.1267 | -3.1324 |
+ | 0.6902 | 0.4307 | 2500 | 0.6910 | -0.0014 | -0.0060 | 0.5781 | 0.0045 | -63.7757 | -58.8557 | -3.1236 | -3.1293 |
+ | 0.685 | 0.4480 | 2600 | 0.6908 | -0.0029 | -0.0078 | 0.5825 | 0.0049 | -63.9588 | -58.9977 | -3.1216 | -3.1272 |
+ | 0.6852 | 0.4652 | 2700 | 0.6906 | -0.0048 | -0.0102 | 0.5834 | 0.0054 | -64.2020 | -59.1921 | -3.1189 | -3.1246 |
+ | 0.6857 | 0.4824 | 2800 | 0.6904 | -0.0062 | -0.0120 | 0.5860 | 0.0058 | -64.3761 | -59.3318 | -3.1154 | -3.1211 |
+ | 0.688 | 0.4997 | 2900 | 0.6902 | -0.0087 | -0.0149 | 0.5862 | 0.0062 | -64.6728 | -59.5807 | -3.1119 | -3.1176 |
+ | 0.6877 | 0.5169 | 3000 | 0.6901 | -0.0114 | -0.0180 | 0.5795 | 0.0066 | -64.9774 | -59.8506 | -3.1089 | -3.1146 |
+ | 0.6846 | 0.5341 | 3100 | 0.6899 | -0.0123 | -0.0192 | 0.5822 | 0.0070 | -65.1015 | -59.9371 | -3.1072 | -3.1128 |
+ | 0.6856 | 0.5513 | 3200 | 0.6897 | -0.0154 | -0.0230 | 0.5822 | 0.0075 | -65.4752 | -60.2526 | -3.1035 | -3.1092 |
+ | 0.6825 | 0.5686 | 3300 | 0.6894 | -0.0185 | -0.0266 | 0.5860 | 0.0081 | -65.8370 | -60.5571 | -3.0987 | -3.1044 |
+ | 0.6782 | 0.5858 | 3400 | 0.6891 | -0.0209 | -0.0296 | 0.5892 | 0.0087 | -66.1367 | -60.7975 | -3.0949 | -3.1006 |
+ | 0.6844 | 0.6030 | 3500 | 0.6890 | -0.0230 | -0.0321 | 0.5904 | 0.0091 | -66.3928 | -61.0109 | -3.0922 | -3.0980 |
+ | 0.6825 | 0.6203 | 3600 | 0.6887 | -0.0251 | -0.0347 | 0.5934 | 0.0097 | -66.6546 | -61.2199 | -3.0886 | -3.0944 |
+ | 0.6782 | 0.6375 | 3700 | 0.6885 | -0.0273 | -0.0374 | 0.5920 | 0.0101 | -66.9203 | -61.4445 | -3.0848 | -3.0906 |
+ | 0.6814 | 0.6547 | 3800 | 0.6882 | -0.0304 | -0.0412 | 0.5915 | 0.0107 | -67.2956 | -61.7525 | -3.0816 | -3.0874 |
+ | 0.6784 | 0.6720 | 3900 | 0.6880 | -0.0335 | -0.0449 | 0.5936 | 0.0114 | -67.6722 | -62.0628 | -3.0784 | -3.0841 |
+ | 0.6811 | 0.6892 | 4000 | 0.6877 | -0.0370 | -0.0491 | 0.5950 | 0.0121 | -68.0929 | -62.4165 | -3.0748 | -3.0805 |
+ | 0.6741 | 0.7064 | 4100 | 0.6875 | -0.0379 | -0.0503 | 0.5922 | 0.0124 | -68.2125 | -62.4995 | -3.0698 | -3.0755 |
+ | 0.6837 | 0.7236 | 4200 | 0.6874 | -0.0399 | -0.0526 | 0.5953 | 0.0127 | -68.4362 | -62.6979 | -3.0663 | -3.0720 |
+ | 0.6825 | 0.7409 | 4300 | 0.6871 | -0.0407 | -0.0540 | 0.5960 | 0.0133 | -68.5772 | -62.7839 | -3.0631 | -3.0689 |
+ | 0.681 | 0.7581 | 4400 | 0.6871 | -0.0428 | -0.0562 | 0.5939 | 0.0134 | -68.7993 | -62.9920 | -3.0603 | -3.0660 |
+ | 0.6826 | 0.7753 | 4500 | 0.6868 | -0.0463 | -0.0604 | 0.5932 | 0.0141 | -69.2207 | -63.3446 | -3.0565 | -3.0623 |
+ | 0.6744 | 0.7926 | 4600 | 0.6865 | -0.0489 | -0.0635 | 0.5943 | 0.0146 | -69.5328 | -63.5999 | -3.0541 | -3.0598 |
+ | 0.6826 | 0.8098 | 4700 | 0.6863 | -0.0524 | -0.0677 | 0.5990 | 0.0153 | -69.9523 | -63.9563 | -3.0511 | -3.0569 |
+ | 0.6821 | 0.8270 | 4800 | 0.6861 | -0.0559 | -0.0716 | 0.5934 | 0.0157 | -70.3441 | -64.3050 | -3.0487 | -3.0544 |
+ | 0.677 | 0.8442 | 4900 | 0.6858 | -0.0593 | -0.0757 | 0.5922 | 0.0164 | -70.7547 | -64.6435 | -3.0456 | -3.0514 |
+ | 0.6765 | 0.8615 | 5000 | 0.6857 | -0.0607 | -0.0774 | 0.5934 | 0.0167 | -70.9189 | -64.7823 | -3.0424 | -3.0482 |
+ | 0.6792 | 0.8787 | 5100 | 0.6854 | -0.0643 | -0.0817 | 0.5908 | 0.0174 | -71.3476 | -65.1395 | -3.0393 | -3.0451 |
+ | 0.6752 | 0.8959 | 5200 | 0.6852 | -0.0667 | -0.0845 | 0.5957 | 0.0177 | -71.6288 | -65.3858 | -3.0369 | -3.0428 |
+ | 0.6752 | 0.9132 | 5300 | 0.6851 | -0.0695 | -0.0876 | 0.5911 | 0.0181 | -71.9352 | -65.6583 | -3.0333 | -3.0390 |
+ | 0.6766 | 0.9304 | 5400 | 0.6848 | -0.0707 | -0.0893 | 0.5974 | 0.0186 | -72.1090 | -65.7783 | -3.0313 | -3.0370 |
+ | 0.6761 | 0.9476 | 5500 | 0.6848 | -0.0718 | -0.0904 | 0.5969 | 0.0187 | -72.2232 | -65.8871 | -3.0286 | -3.0344 |
+ | 0.68 | 0.9649 | 5600 | 0.6847 | -0.0716 | -0.0904 | 0.5992 | 0.0189 | -72.2249 | -65.8690 | -3.0267 | -3.0324 |
+ | 0.6744 | 0.9821 | 5700 | 0.6846 | -0.0735 | -0.0928 | 0.5983 | 0.0193 | -72.4612 | -66.0631 | -3.0237 | -3.0295 |
+ | 0.6709 | 0.9993 | 5800 | 0.6843 | -0.0764 | -0.0963 | 0.5999 | 0.0199 | -72.8088 | -66.3480 | -3.0203 | -3.0261 |
+ | 0.6738 | 1.0165 | 5900 | 0.6842 | -0.0770 | -0.0972 | 0.6018 | 0.0202 | -72.8978 | -66.4100 | -3.0168 | -3.0226 |
+ | 0.6755 | 1.0338 | 6000 | 0.6841 | -0.0774 | -0.0977 | 0.6050 | 0.0202 | -72.9485 | -66.4556 | -3.0150 | -3.0207 |
+ | 0.6727 | 1.0510 | 6100 | 0.6840 | -0.0790 | -0.0997 | 0.6043 | 0.0207 | -73.1473 | -66.6101 | -3.0124 | -3.0182 |
+ | 0.677 | 1.0682 | 6200 | 0.6838 | -0.0804 | -0.1014 | 0.6053 | 0.0210 | -73.3202 | -66.7547 | -3.0100 | -3.0157 |
+ | 0.6778 | 1.0855 | 6300 | 0.6838 | -0.0826 | -0.1037 | 0.6018 | 0.0211 | -73.5472 | -66.9698 | -3.0081 | -3.0139 |
+ | 0.6772 | 1.1027 | 6400 | 0.6835 | -0.0842 | -0.1060 | 0.6043 | 0.0218 | -73.7832 | -67.1349 | -3.0059 | -3.0117 |
+ | 0.6789 | 1.1199 | 6500 | 0.6834 | -0.0856 | -0.1077 | 0.6055 | 0.0221 | -73.9500 | -67.2763 | -3.0033 | -3.0090 |
+ | 0.6776 | 1.1371 | 6600 | 0.6833 | -0.0879 | -0.1102 | 0.6036 | 0.0223 | -74.2005 | -67.5068 | -3.0010 | -3.0068 |
+ | 0.6755 | 1.1544 | 6700 | 0.6831 | -0.0900 | -0.1127 | 0.6057 | 0.0227 | -74.4476 | -67.7115 | -2.9988 | -3.0045 |
+ | 0.6688 | 1.1716 | 6800 | 0.6829 | -0.0926 | -0.1159 | 0.6090 | 0.0233 | -74.7660 | -67.9706 | -2.9960 | -3.0017 |
+ | 0.6807 | 1.1888 | 6900 | 0.6828 | -0.0942 | -0.1176 | 0.6062 | 0.0234 | -74.9441 | -68.1345 | -2.9941 | -2.9999 |
+ | 0.6691 | 1.2061 | 7000 | 0.6827 | -0.0965 | -0.1202 | 0.6071 | 0.0238 | -75.2016 | -68.3571 | -2.9919 | -2.9977 |
+ | 0.6704 | 1.2233 | 7100 | 0.6827 | -0.0970 | -0.1208 | 0.6029 | 0.0238 | -75.2590 | -68.4095 | -2.9898 | -2.9956 |
+ | 0.6693 | 1.2405 | 7200 | 0.6825 | -0.0985 | -0.1226 | 0.6073 | 0.0242 | -75.4421 | -68.5575 | -2.9875 | -2.9932 |
+ | 0.6811 | 1.2578 | 7300 | 0.6825 | -0.0996 | -0.1238 | 0.6046 | 0.0243 | -75.5637 | -68.6693 | -2.9856 | -2.9914 |
+ | 0.6731 | 1.2750 | 7400 | 0.6823 | -0.1008 | -0.1253 | 0.6059 | 0.0245 | -75.7101 | -68.7873 | -2.9843 | -2.9901 |
+ | 0.6746 | 1.2922 | 7500 | 0.6823 | -0.1009 | -0.1257 | 0.6036 | 0.0247 | -75.7457 | -68.8045 | -2.9825 | -2.9883 |
+ | 0.6788 | 1.3094 | 7600 | 0.6823 | -0.1020 | -0.1267 | 0.6073 | 0.0247 | -75.8491 | -68.9100 | -2.9802 | -2.9860 |
+ | 0.6704 | 1.3267 | 7700 | 0.6820 | -0.1033 | -0.1286 | 0.6066 | 0.0253 | -76.0417 | -69.0466 | -2.9779 | -2.9837 |
+ | 0.6694 | 1.3439 | 7800 | 0.6820 | -0.1054 | -0.1309 | 0.6022 | 0.0255 | -76.2745 | -69.2565 | -2.9769 | -2.9827 |
+ | 0.6779 | 1.3611 | 7900 | 0.6819 | -0.1067 | -0.1323 | 0.6069 | 0.0256 | -76.4101 | -69.3778 | -2.9754 | -2.9812 |
+ | 0.6712 | 1.3784 | 8000 | 0.6817 | -0.1082 | -0.1342 | 0.6062 | 0.0260 | -76.5969 | -69.5304 | -2.9740 | -2.9798 |
+ | 0.6768 | 1.3956 | 8100 | 0.6817 | -0.1096 | -0.1359 | 0.6006 | 0.0262 | -76.7652 | -69.6763 | -2.9726 | -2.9784 |
+ | 0.6714 | 1.4128 | 8200 | 0.6815 | -0.1112 | -0.1378 | 0.6046 | 0.0266 | -76.9560 | -69.8316 | -2.9714 | -2.9772 |
+ | 0.6705 | 1.4300 | 8300 | 0.6815 | -0.1122 | -0.1387 | 0.6001 | 0.0265 | -77.0526 | -69.9333 | -2.9699 | -2.9758 |
+ | 0.6706 | 1.4473 | 8400 | 0.6814 | -0.1131 | -0.1399 | 0.6025 | 0.0268 | -77.1713 | -70.0219 | -2.9690 | -2.9748 |
+ | 0.6651 | 1.4645 | 8500 | 0.6814 | -0.1138 | -0.1407 | 0.6064 | 0.0269 | -77.2468 | -70.0874 | -2.9675 | -2.9733 |
+ | 0.676 | 1.4817 | 8600 | 0.6813 | -0.1143 | -0.1413 | 0.6032 | 0.0270 | -77.3085 | -70.1414 | -2.9664 | -2.9722 |
+ | 0.6682 | 1.4990 | 8700 | 0.6814 | -0.1141 | -0.1411 | 0.6050 | 0.0269 | -77.2885 | -70.1259 | -2.9660 | -2.9718 |
+ | 0.6732 | 1.5162 | 8800 | 0.6813 | -0.1147 | -0.1417 | 0.5997 | 0.0270 | -77.3463 | -70.1773 | -2.9650 | -2.9708 |
+ | 0.6706 | 1.5334 | 8900 | 0.6811 | -0.1160 | -0.1434 | 0.6108 | 0.0274 | -77.5247 | -70.3133 | -2.9641 | -2.9700 |
+ | 0.6589 | 1.5507 | 9000 | 0.6812 | -0.1169 | -0.1443 | 0.6053 | 0.0274 | -77.6094 | -70.3996 | -2.9631 | -2.9689 |
+ | 0.6694 | 1.5679 | 9100 | 0.6811 | -0.1172 | -0.1447 | 0.6043 | 0.0275 | -77.6490 | -70.4324 | -2.9621 | -2.9680 |
+ | 0.6691 | 1.5851 | 9200 | 0.6810 | -0.1179 | -0.1456 | 0.6011 | 0.0277 | -77.7365 | -70.4981 | -2.9617 | -2.9675 |
+ | 0.6701 | 1.6023 | 9300 | 0.6811 | -0.1179 | -0.1455 | 0.6027 | 0.0276 | -77.7288 | -70.5024 | -2.9611 | -2.9669 |
+ | 0.6705 | 1.6196 | 9400 | 0.6810 | -0.1182 | -0.1461 | 0.6078 | 0.0279 | -77.7879 | -70.5325 | -2.9603 | -2.9661 |
+ | 0.6699 | 1.6368 | 9500 | 0.6810 | -0.1186 | -0.1464 | 0.6073 | 0.0278 | -77.8179 | -70.5707 | -2.9596 | -2.9654 |
+ | 0.6699 | 1.6540 | 9600 | 0.6809 | -0.1191 | -0.1471 | 0.6092 | 0.0279 | -77.8869 | -70.6254 | -2.9591 | -2.9649 |
+ | 0.6675 | 1.6713 | 9700 | 0.6809 | -0.1196 | -0.1477 | 0.6015 | 0.0281 | -77.9472 | -70.6696 | -2.9584 | -2.9643 |
+ | 0.6639 | 1.6885 | 9800 | 0.6809 | -0.1198 | -0.1479 | 0.6083 | 0.0281 | -77.9676 | -70.6902 | -2.9585 | -2.9643 |
+ | 0.6578 | 1.7057 | 9900 | 0.6808 | -0.1200 | -0.1482 | 0.6043 | 0.0282 | -77.9982 | -70.7108 | -2.9583 | -2.9641 |
+ | 0.6647 | 1.7229 | 10000 | 0.6809 | -0.1204 | -0.1485 | 0.6048 | 0.0281 | -78.0275 | -70.7473 | -2.9578 | -2.9637 |
+ | 0.6655 | 1.7402 | 10100 | 0.6808 | -0.1204 | -0.1486 | 0.6071 | 0.0282 | -78.0394 | -70.7507 | -2.9579 | -2.9637 |
+ | 0.6671 | 1.7574 | 10200 | 0.6808 | -0.1206 | -0.1488 | 0.6059 | 0.0282 | -78.0608 | -70.7737 | -2.9574 | -2.9632 |
+ | 0.6774 | 1.7746 | 10300 | 0.6808 | -0.1207 | -0.1490 | 0.6055 | 0.0283 | -78.0839 | -70.7829 | -2.9569 | -2.9628 |
+ | 0.6629 | 1.7919 | 10400 | 0.6807 | -0.1208 | -0.1493 | 0.6076 | 0.0285 | -78.1098 | -70.7925 | -2.9568 | -2.9626 |
+ | 0.6648 | 1.8091 | 10500 | 0.6808 | -0.1211 | -0.1494 | 0.6092 | 0.0283 | -78.1209 | -70.8208 | -2.9567 | -2.9625 |
+ | 0.6745 | 1.8263 | 10600 | 0.6808 | -0.1212 | -0.1495 | 0.6083 | 0.0284 | -78.1333 | -70.8279 | -2.9568 | -2.9627 |
+ | 0.6665 | 1.8436 | 10700 | 0.6808 | -0.1211 | -0.1495 | 0.6053 | 0.0283 | -78.1275 | -70.8257 | -2.9566 | -2.9624 |
+ | 0.6663 | 1.8608 | 10800 | 0.6808 | -0.1212 | -0.1496 | 0.6078 | 0.0284 | -78.1382 | -70.8324 | -2.9566 | -2.9624 |
+ | 0.6674 | 1.8780 | 10900 | 0.6807 | -0.1213 | -0.1497 | 0.6083 | 0.0284 | -78.1542 | -70.8423 | -2.9568 | -2.9626 |
+ | 0.6767 | 1.8952 | 11000 | 0.6808 | -0.1212 | -0.1495 | 0.6078 | 0.0283 | -78.1295 | -70.8295 | -2.9567 | -2.9626 |
+ | 0.6683 | 1.9125 | 11100 | 0.6808 | -0.1212 | -0.1496 | 0.6087 | 0.0284 | -78.1378 | -70.8316 | -2.9569 | -2.9628 |
+ | 0.6673 | 1.9297 | 11200 | 0.6807 | -0.1212 | -0.1496 | 0.6090 | 0.0284 | -78.1370 | -70.8290 | -2.9566 | -2.9624 |
+ | 0.6781 | 1.9469 | 11300 | 0.6807 | -0.1211 | -0.1496 | 0.6097 | 0.0285 | -78.1363 | -70.8190 | -2.9568 | -2.9626 |
+ | 0.6682 | 1.9642 | 11400 | 0.6807 | -0.1213 | -0.1498 | 0.6085 | 0.0285 | -78.1613 | -70.8446 | -2.9567 | -2.9626 |
+ | 0.6775 | 1.9814 | 11500 | 0.6808 | -0.1212 | -0.1495 | 0.6083 | 0.0282 | -78.1266 | -70.8364 | -2.9566 | -2.9624 |
+ | 0.6688 | 1.9986 | 11600 | 0.6808 | -0.1214 | -0.1497 | 0.6090 | 0.0284 | -78.1532 | -70.8499 | -2.9566 | -2.9624 |
+
+
+ ### Framework versions
+
+ - Transformers 4.41.2
+ - Pytorch 2.1.2
+ - Datasets 2.19.2
+ - Tokenizers 0.19.1
all_results.json ADDED
@@ -0,0 +1,9 @@
+ {
+     "epoch": 2.0,
+     "total_flos": 0.0,
+     "train_loss": 0.6777890619945723,
+     "train_runtime": 94641.0715,
+     "train_samples": 92858,
+     "train_samples_per_second": 1.962,
+     "train_steps_per_second": 0.123
+ }
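
Note (a back-of-the-envelope consistency check on the fields above, not part of the source files): 92,858 train_samples × 2 epochs = 185,716 samples; 185,716 / 94,641.07 s ≈ 1.962 samples/s, and dividing by the effective batch size of 16 gives ≈ 0.123 steps/s, matching the reported throughput.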
generation_config.json ADDED
@@ -0,0 +1,7 @@
+ {
+   "bos_token_id": 1,
+   "eos_token_id": 2,
+   "max_length": 2048,
+   "pad_token_id": 0,
+   "transformers_version": "4.41.2"
+ }
model.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:5f3631e2581a28ad0b74bbbd15e9b8bae93013b2d3c40823fc8324a19a04cf17
+ oid sha256:bccb6175cd9c1be69394cf4fbfedd331e63a935a398a6838937362a4ac18361c
  size 4400216536
runs/Jun17_08-38-05_poseidon/events.out.tfevents.1718613821.poseidon.4064829.0 CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:34edb8a918dfb07d4d7510cb403b4599d26fd3b76e69a27d138829803e8c6ea0
- size 889599
+ oid sha256:565805616e373d9142326872a3093b651bb48af8ac27f99f60d77bcc4d450107
+ size 889953
train_results.json ADDED
@@ -0,0 +1,9 @@
+ {
+     "epoch": 2.0,
+     "total_flos": 0.0,
+     "train_loss": 0.6777890619945723,
+     "train_runtime": 94641.0715,
+     "train_samples": 92858,
+     "train_samples_per_second": 1.962,
+     "train_steps_per_second": 0.123
+ }
trainer_state.json ADDED
The diff for this file is too large to render.