jkazdan committed
Commit 007032b
1 Parent(s): eea0278

End of training

README.md CHANGED
@@ -17,8 +17,8 @@ should probably proofread and complete it, then remove this comment. -->
 
 This model is a fine-tuned version of [google/gemma-2-2b](https://huggingface.co/google/gemma-2-2b) on an unknown dataset.
 It achieves the following results on the evaluation set:
-- Loss: 1.1021
-- Num Input Tokens Seen: 21968712
+- Loss: 1.1114
+- Num Input Tokens Seen: 21798600
 
 ## Model description
 
@@ -53,82 +53,83 @@ The following hyperparameters were used during training:
 | Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
 |:-------------:|:------:|:----:|:---------------:|:-----------------:|
 | No log | 0 | 0 | 1.3956 | 0 |
-| 1.6085 | 0.0130 | 5 | 1.3800 | 289184 |
-| 1.4378 | 0.0260 | 10 | 1.2933 | 571680 |
-| 1.3575 | 0.0390 | 15 | 1.2182 | 858680 |
-| 1.3348 | 0.0520 | 20 | 1.1684 | 1145936 |
-| 1.1904 | 0.0650 | 25 | 1.1500 | 1437472 |
-| 1.2228 | 0.0779 | 30 | 1.1339 | 1724288 |
-| 1.0694 | 0.0909 | 35 | 1.1383 | 2009272 |
-| 0.9697 | 0.1039 | 40 | 1.1630 | 2289000 |
-| 0.9051 | 0.1169 | 45 | 1.1742 | 2569208 |
-| 0.8855 | 0.1299 | 50 | 1.1729 | 2856576 |
-| 0.8853 | 0.1429 | 55 | 1.1758 | 3146856 |
-| 0.8296 | 0.1559 | 60 | 1.1816 | 3431392 |
-| 0.7121 | 0.1689 | 65 | 1.1736 | 3726000 |
-| 0.7528 | 0.1819 | 70 | 1.1792 | 4010080 |
-| 0.5996 | 0.1949 | 75 | 1.1802 | 4295264 |
-| 0.6437 | 0.2079 | 80 | 1.1785 | 4576256 |
-| 0.6683 | 0.2209 | 85 | 1.1733 | 4869384 |
-| 0.5115 | 0.2338 | 90 | 1.1750 | 5151776 |
-| 0.545 | 0.2468 | 95 | 1.1701 | 5443960 |
-| 0.5348 | 0.2598 | 100 | 1.1673 | 5728368 |
-| 0.5687 | 0.2728 | 105 | 1.1641 | 6017560 |
-| 0.4856 | 0.2858 | 110 | 1.1663 | 6300000 |
-| 0.4691 | 0.2988 | 115 | 1.1630 | 6586672 |
-| 0.4454 | 0.3118 | 120 | 1.1585 | 6869504 |
-| 0.5734 | 0.3248 | 125 | 1.1606 | 7159680 |
-| 0.4317 | 0.3378 | 130 | 1.1529 | 7437936 |
-| 0.4603 | 0.3508 | 135 | 1.1541 | 7727120 |
-| 0.5264 | 0.3638 | 140 | 1.1542 | 8013352 |
-| 0.5051 | 0.3767 | 145 | 1.1493 | 8302848 |
-| 0.397 | 0.3897 | 150 | 1.1528 | 8588472 |
-| 0.4173 | 0.4027 | 155 | 1.1463 | 8876960 |
-| 0.3443 | 0.4157 | 160 | 1.1474 | 9156600 |
-| 0.4343 | 0.4287 | 165 | 1.1455 | 9440520 |
-| 0.4683 | 0.4417 | 170 | 1.1431 | 9726600 |
-| 0.4732 | 0.4547 | 175 | 1.1408 | 10009248 |
-| 0.4876 | 0.4677 | 180 | 1.1414 | 10297320 |
-| 0.4574 | 0.4807 | 185 | 1.1369 | 10582704 |
-| 0.4038 | 0.4937 | 190 | 1.1354 | 10870648 |
-| 0.4239 | 0.5067 | 195 | 1.1355 | 11148576 |
-| 0.5262 | 0.5196 | 200 | 1.1291 | 11436464 |
-| 0.4788 | 0.5326 | 205 | 1.1322 | 11721416 |
-| 0.3975 | 0.5456 | 210 | 1.1276 | 12012696 |
-| 0.3807 | 0.5586 | 215 | 1.1310 | 12299376 |
-| 0.4784 | 0.5716 | 220 | 1.1232 | 12594368 |
-| 0.4 | 0.5846 | 225 | 1.1272 | 12880616 |
-| 0.4511 | 0.5976 | 230 | 1.1229 | 13164112 |
-| 0.4119 | 0.6106 | 235 | 1.1234 | 13446016 |
-| 0.3515 | 0.6236 | 240 | 1.1224 | 13729688 |
-| 0.3695 | 0.6366 | 245 | 1.1201 | 14015064 |
-| 0.387 | 0.6496 | 250 | 1.1190 | 14303192 |
-| 0.4503 | 0.6626 | 255 | 1.1167 | 14587200 |
-| 0.3205 | 0.6755 | 260 | 1.1184 | 14875032 |
-| 0.3369 | 0.6885 | 265 | 1.1154 | 15159592 |
-| 0.46 | 0.7015 | 270 | 1.1173 | 15443480 |
-| 0.4148 | 0.7145 | 275 | 1.1121 | 15737624 |
-| 0.4251 | 0.7275 | 280 | 1.1141 | 16021928 |
-| 0.3786 | 0.7405 | 285 | 1.1126 | 16306944 |
-| 0.3593 | 0.7535 | 290 | 1.1114 | 16592904 |
-| 0.4698 | 0.7665 | 295 | 1.1114 | 16875744 |
-| 0.3327 | 0.7795 | 300 | 1.1098 | 17163408 |
-| 0.3521 | 0.7925 | 305 | 1.1125 | 17451024 |
-| 0.3682 | 0.8055 | 310 | 1.1076 | 17741680 |
-| 0.3266 | 0.8184 | 315 | 1.1098 | 18022800 |
-| 0.3986 | 0.8314 | 320 | 1.1078 | 18298600 |
-| 0.3869 | 0.8444 | 325 | 1.1078 | 18585288 |
-| 0.3904 | 0.8574 | 330 | 1.1072 | 18870912 |
-| 0.361 | 0.8704 | 335 | 1.1070 | 19165960 |
-| 0.4643 | 0.8834 | 340 | 1.1047 | 19458704 |
-| 0.4603 | 0.8964 | 345 | 1.1048 | 19741152 |
-| 0.4815 | 0.9094 | 350 | 1.1053 | 20029752 |
-| 0.3097 | 0.9224 | 355 | 1.1050 | 20317240 |
-| 0.3686 | 0.9354 | 360 | 1.1033 | 20601320 |
-| 0.485 | 0.9484 | 365 | 1.1042 | 20895904 |
-| 0.3946 | 0.9614 | 370 | 1.1014 | 21179672 |
-| 0.4621 | 0.9743 | 375 | 1.1032 | 21460376 |
-| 0.4748 | 0.9873 | 380 | 1.1025 | 21737656 |
+| 1.5454 | 0.0129 | 5 | 1.3798 | 284096 |
+| 1.5595 | 0.0258 | 10 | 1.2917 | 565048 |
+| 1.4801 | 0.0388 | 15 | 1.2113 | 857320 |
+| 1.2583 | 0.0517 | 20 | 1.1640 | 1143680 |
+| 1.2536 | 0.0646 | 25 | 1.1396 | 1426896 |
+| 1.1682 | 0.0775 | 30 | 1.1244 | 1704888 |
+| 1.1565 | 0.0905 | 35 | 1.1242 | 1985240 |
+| 1.0138 | 0.1034 | 40 | 1.1384 | 2269216 |
+| 0.9845 | 0.1163 | 45 | 1.1461 | 2554344 |
+| 0.91 | 0.1292 | 50 | 1.1554 | 2839272 |
+| 0.9047 | 0.1422 | 55 | 1.1678 | 3127496 |
+| 0.9137 | 0.1551 | 60 | 1.1697 | 3415328 |
+| 0.8846 | 0.1680 | 65 | 1.1704 | 3692024 |
+| 0.9215 | 0.1809 | 70 | 1.1719 | 3967168 |
+| 0.8233 | 0.1939 | 75 | 1.1850 | 4244568 |
+| 0.6717 | 0.2068 | 80 | 1.1881 | 4531936 |
+| 0.7733 | 0.2197 | 85 | 1.1770 | 4817232 |
+| 0.6835 | 0.2326 | 90 | 1.1663 | 5103112 |
+| 0.7503 | 0.2456 | 95 | 1.1860 | 5388248 |
+| 0.6998 | 0.2585 | 100 | 1.1702 | 5669656 |
+| 0.615 | 0.2714 | 105 | 1.1739 | 5956384 |
+| 0.5807 | 0.2843 | 110 | 1.1799 | 6233928 |
+| 0.6475 | 0.2973 | 115 | 1.1703 | 6517360 |
+| 0.649 | 0.3102 | 120 | 1.1702 | 6802600 |
+| 0.6409 | 0.3231 | 125 | 1.1747 | 7086032 |
+| 0.6033 | 0.3360 | 130 | 1.1629 | 7364952 |
+| 0.4875 | 0.3489 | 135 | 1.1752 | 7650744 |
+| 0.6259 | 0.3619 | 140 | 1.1664 | 7933080 |
+| 0.5287 | 0.3748 | 145 | 1.1703 | 8220488 |
+| 0.4745 | 0.3877 | 150 | 1.1645 | 8501544 |
+| 0.4469 | 0.4006 | 155 | 1.1667 | 8781400 |
+| 0.5011 | 0.4136 | 160 | 1.1652 | 9056664 |
+| 0.4512 | 0.4265 | 165 | 1.1630 | 9337208 |
+| 0.5347 | 0.4394 | 170 | 1.1630 | 9620568 |
+| 0.5226 | 0.4523 | 175 | 1.1626 | 9896128 |
+| 0.4775 | 0.4653 | 180 | 1.1568 | 10176840 |
+| 0.5018 | 0.4782 | 185 | 1.1642 | 10461520 |
+| 0.508 | 0.4911 | 190 | 1.1530 | 10741632 |
+| 0.3972 | 0.5040 | 195 | 1.1550 | 11024096 |
+| 0.4409 | 0.5170 | 200 | 1.1539 | 11301736 |
+| 0.5384 | 0.5299 | 205 | 1.1477 | 11579816 |
+| 0.4633 | 0.5428 | 210 | 1.1501 | 11865648 |
+| 0.5198 | 0.5557 | 215 | 1.1410 | 12156088 |
+| 0.3293 | 0.5687 | 220 | 1.1480 | 12434448 |
+| 0.4762 | 0.5816 | 225 | 1.1375 | 12720344 |
+| 0.5467 | 0.5945 | 230 | 1.1424 | 13003704 |
+| 0.4776 | 0.6074 | 235 | 1.1361 | 13292824 |
+| 0.4567 | 0.6204 | 240 | 1.1398 | 13574560 |
+| 0.4565 | 0.6333 | 245 | 1.1371 | 13859632 |
+| 0.4899 | 0.6462 | 250 | 1.1369 | 14136888 |
+| 0.3492 | 0.6591 | 255 | 1.1327 | 14421200 |
+| 0.4968 | 0.6721 | 260 | 1.1315 | 14707344 |
+| 0.3487 | 0.6850 | 265 | 1.1329 | 14988680 |
+| 0.4001 | 0.6979 | 270 | 1.1258 | 15267688 |
+| 0.3161 | 0.7108 | 275 | 1.1308 | 15540888 |
+| 0.4089 | 0.7237 | 280 | 1.1262 | 15816840 |
+| 0.3835 | 0.7367 | 285 | 1.1289 | 16098568 |
+| 0.4023 | 0.7496 | 290 | 1.1270 | 16387224 |
+| 0.5333 | 0.7625 | 295 | 1.1243 | 16672848 |
+| 0.492 | 0.7754 | 300 | 1.1276 | 16955104 |
+| 0.3361 | 0.7884 | 305 | 1.1215 | 17232984 |
+| 0.4585 | 0.8013 | 310 | 1.1210 | 17517512 |
+| 0.3541 | 0.8142 | 315 | 1.1232 | 17805408 |
+| 0.4862 | 0.8271 | 320 | 1.1195 | 18086744 |
+| 0.5085 | 0.8401 | 325 | 1.1208 | 18374072 |
+| 0.4206 | 0.8530 | 330 | 1.1198 | 18654568 |
+| 0.3501 | 0.8659 | 335 | 1.1154 | 18936680 |
+| 0.4675 | 0.8788 | 340 | 1.1207 | 19213288 |
+| 0.3692 | 0.8918 | 345 | 1.1151 | 19495512 |
+| 0.3526 | 0.9047 | 350 | 1.1162 | 19777904 |
+| 0.5192 | 0.9176 | 355 | 1.1134 | 20053800 |
+| 0.5117 | 0.9305 | 360 | 1.1101 | 20335472 |
+| 0.3685 | 0.9435 | 365 | 1.1152 | 20620416 |
+| 0.3554 | 0.9564 | 370 | 1.1103 | 20898680 |
+| 0.4323 | 0.9693 | 375 | 1.1123 | 21181272 |
+| 0.4111 | 0.9822 | 380 | 1.1120 | 21465480 |
+| 0.3962 | 0.9952 | 385 | 1.1119 | 21742008 |
 
 
 ### Framework versions
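The README above describes a causal-LM fine-tune of google/gemma-2-2b, so the resulting checkpoint should load through the standard transformers API. A minimal sketch follows; `jkazdan/<repo-name>` is a hypothetical placeholder, since this diff does not state the repository id (substitute the real model id or a local clone of this repo):

```python
# Minimal loading sketch. The repo id below is a hypothetical
# placeholder: this commit diff does not name the repository.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "jkazdan/<repo-name>"  # hypothetical; replace with the real id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # Gemma-2 base weights ship in bf16
    device_map="auto",           # requires the accelerate package
)

prompt = "The key to a good model card is"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```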
model-00001-of-00002.safetensors CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:2694e1b049649e82d435dc9596237a3ff275298a2fa20ba87ba08632971defe0
+oid sha256:914a19b7a1c5195219bd7f15f82d7c607357282e1ab068d0e0e2b30a6d0917ba
 size 4988025760
model-00002-of-00002.safetensors CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:87dcbbb64889d8e4174f8698335502389c5c47671433bcd7be5dac8ccc6f4b69
+oid sha256:765034b8df24dbcacb568c1c38f9d4c2b1ab8d7eaf94dc4da1b18ec73f5e84ea
 size 240691728
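Both weight shards are stored as git-lfs pointers: the three-line stubs above record only the spec version, the sha256 oid of the real object, and its byte size. A minimal sketch, not part of this repo, of checking a downloaded shard against the oid in its pointer:

```python
# Verify a downloaded shard against the sha256 oid from its git-lfs pointer.
import hashlib

def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash the file in 1 MiB chunks so a multi-GB shard never sits in RAM."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

# oid from the new pointer for model-00002-of-00002.safetensors above
expected = "765034b8df24dbcacb568c1c38f9d4c2b1ab8d7eaf94dc4da1b18ec73f5e84ea"
actual = sha256_of("model-00002-of-00002.safetensors")
assert actual == expected, f"hash mismatch: got {actual}"
```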
training_args.bin CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:c27cf55ed0677f5e5ac036f878bceeaa31b30dc8a225eed7afe5b93a5e8c143d
+oid sha256:63c8dcee0b89186a1d84a9410ee89099d92cffcdc7c625537365d72d4ee6b9b7
 size 5560
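At 5560 bytes, `training_args.bin` is not a weight file; it is the argument object the transformers Trainer saves alongside checkpoints. A minimal inspection sketch, assuming it is the usual pickled TrainingArguments; since it is a pickle, only load files you trust:

```python
# Inspect the saved training configuration (assumed to be a pickled
# TrainingArguments object, the usual Trainer artifact).
# torch >= 2.6 defaults to weights_only=True, which rejects arbitrary
# pickles, hence the explicit flag; only use it on trusted files.
import torch

args = torch.load("training_args.bin", weights_only=False)
print(type(args).__name__)  # e.g. TrainingArguments
print(args.learning_rate, args.num_train_epochs)
```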