ataeff committed on
Commit 05487ac · verified · 1 Parent(s): 0932565

Delete train_resonate.log with huggingface_hub

Files changed (1)
  1. train_resonate.log +0 -242
train_resonate.log DELETED
@@ -1,242 +0,0 @@
- [data] Loading...
- [data] 6445 examples
- [data] train=6122, val=323
- [model] Loading Gemma-3 270M-IT...
- Warning: You are sending unauthenticated requests to the HF Hub. Please set a HF_TOKEN to enable higher rate limits and faster downloads.
-
- [model] 268.1M total, 167.8M in embed_tokens (63%)
- [lora] trainable=0.74M (0.3%), frozen=268.1M
- [data] Tokenizing...
- [data] 6122 train, 323 val tokenized
- [data] avg length: 223 tokens
- [train] 1147 steps, 114 warmup, 3 epochs
- ep1 step 50/1147 | train loss 4.1195 | lr 0.000088 | 20s
- ep1 step 100/1147 | train loss 3.7293 | lr 0.000175 | 39s
- >>> VAL loss 3.1970 (best inf)
- >>> SAVED best
- ep1 step 150/1147 | train loss 3.5291 | lr 0.000199 | 62s
- ep1 step 200/1147 | train loss 3.4398 | lr 0.000197 | 81s
- >>> VAL loss 3.1214 (best 3.1970)
- >>> SAVED best
- ep1 step 250/1147 | train loss 3.3740 | lr 0.000192 | 103s
- ep1 step 300/1147 | train loss 3.3338 | lr 0.000184 | 123s
- >>> VAL loss 3.0787 (best 3.1214)
- >>> SAVED best
- ep1 step 350/1147 | train loss 3.3016 | lr 0.000175 | 147s
- ep1 step 400/1147 | train loss 3.2713 | lr 0.000164 | 166s
- >>> VAL loss 3.0521 (best 3.0787)
- >>> SAVED best
- ep1 step 450/1147 | train loss 3.2483 | lr 0.000152 | 189s
- ep1 step 500/1147 | train loss 3.2314 | lr 0.000139 | 209s
- >>> VAL loss 3.0304 (best 3.0521)
- >>> SAVED best
- ep1 step 550/1147 | train loss 3.2128 | lr 0.000124 | 231s
- ep1 step 600/1147 | train loss 3.2032 | lr 0.000109 | 251s
- >>> VAL loss 3.0213 (best 3.0304)
- >>> SAVED best
- ep1 step 650/1147 | train loss 3.1861 | lr 0.000094 | 274s
- ep1 step 700/1147 | train loss 3.1787 | lr 0.000079 | 294s
- >>> VAL loss 3.0090 (best 3.0213)
- >>> SAVED best
- ep1 step 750/1147 | train loss 3.1689 | lr 0.000064 | 316s
- ep1 step 800/1147 | train loss 3.1619 | lr 0.000051 | 335s
- >>> VAL loss 3.0061 (best 3.0090)
- >>> SAVED best
- ep1 step 850/1147 | train loss 3.1493 | lr 0.000038 | 358s
- ep1 step 900/1147 | train loss 3.1436 | lr 0.000027 | 377s
- >>> VAL loss 2.9998 (best 3.0061)
- >>> SAVED best
- ep1 step 950/1147 | train loss 3.1387 | lr 0.000017 | 400s
- ep1 step 1000/1147 | train loss 3.1333 | lr 0.000010 | 419s
- >>> VAL loss 2.9979 (best 2.9998)
- >>> SAVED best
- ep1 step 1050/1147 | train loss 3.1246 | lr 0.000004 | 441s
- ep1 step 1100/1147 | train loss 3.1212 | lr 0.000001 | 460s
- >>> VAL loss 2.9970 (best 2.9979)
- >>> SAVED best
- ep1 step 1150/1147 | train loss 3.1156 | lr 0.000000 | 483s
- ep1 step 1200/1147 | train loss 3.1119 | lr 0.000001 | 502s
- >>> VAL loss 2.9975 (best 2.9970)
- ep1 step 1250/1147 | train loss 3.1064 | lr 0.000005 | 525s
- ep1 step 1300/1147 | train loss 3.1043 | lr 0.000011 | 544s
- >>> VAL loss 2.9972 (best 2.9970)
- ep1 step 1350/1147 | train loss 3.1019 | lr 0.000018 | 565s
- ep1 step 1400/1147 | train loss 3.1011 | lr 0.000028 | 585s
- >>> VAL loss 2.9953 (best 2.9970)
- >>> SAVED best
- ep1 step 1450/1147 | train loss 3.0980 | lr 0.000040 | 607s
- ep1 step 1500/1147 | train loss 3.0950 | lr 0.000052 | 627s
- >>> VAL loss 2.9961 (best 2.9953)
- [epoch 1] avg loss 3.0936
- ep2 step 1550/1147 | train loss 2.8702 | lr 0.000066 | 649s
- ep2 step 1600/1147 | train loss 2.9448 | lr 0.000081 | 668s
- >>> VAL loss 2.9920 (best 2.9953)
- >>> SAVED best
- ep2 step 1650/1147 | train loss 2.9634 | lr 0.000096 | 691s
- ep2 step 1700/1147 | train loss 2.9980 | lr 0.000111 | 710s
- >>> VAL loss 2.9941 (best 2.9920)
- ep2 step 1750/1147 | train loss 3.0007 | lr 0.000126 | 732s
- ep2 step 1800/1147 | train loss 2.9979 | lr 0.000140 | 752s
- >>> VAL loss 2.9903 (best 2.9920)
- >>> SAVED best
- ep2 step 1850/1147 | train loss 2.9986 | lr 0.000154 | 774s
- ep2 step 1900/1147 | train loss 3.0013 | lr 0.000166 | 794s
- >>> VAL loss 2.9853 (best 2.9903)
- >>> SAVED best
- ep2 step 1950/1147 | train loss 3.0019 | lr 0.000177 | 816s
- ep2 step 2000/1147 | train loss 3.0109 | lr 0.000185 | 836s
- >>> VAL loss 2.9863 (best 2.9853)
- ep2 step 2050/1147 | train loss 3.0042 | lr 0.000192 | 858s
- ep2 step 2100/1147 | train loss 2.9983 | lr 0.000197 | 877s
- >>> VAL loss 2.9822 (best 2.9853)
- >>> SAVED best
- ep2 step 2150/1147 | train loss 2.9988 | lr 0.000200 | 899s
- ep2 step 2200/1147 | train loss 2.9978 | lr 0.000200 | 918s
- >>> VAL loss 2.9744 (best 2.9822)
- >>> SAVED best
- ep2 step 2250/1147 | train loss 3.0005 | lr 0.000198 | 939s
- ep2 step 2300/1147 | train loss 2.9973 | lr 0.000193 | 958s
- >>> VAL loss 2.9716 (best 2.9744)
- >>> SAVED best
- ep2 step 2350/1147 | train loss 2.9943 | lr 0.000187 | 981s
- ep2 step 2400/1147 | train loss 2.9965 | lr 0.000178 | 1001s
- >>> VAL loss 2.9625 (best 2.9716)
- >>> SAVED best
- ep2 step 2450/1147 | train loss 2.9906 | lr 0.000168 | 1023s
- ep2 step 2500/1147 | train loss 2.9928 | lr 0.000156 | 1042s
- >>> VAL loss 2.9580 (best 2.9625)
- >>> SAVED best
- ep2 step 2550/1147 | train loss 2.9930 | lr 0.000143 | 1064s
- ep2 step 2600/1147 | train loss 2.9920 | lr 0.000129 | 1084s
- >>> VAL loss 2.9552 (best 2.9580)
- >>> SAVED best
- ep2 step 2650/1147 | train loss 2.9920 | lr 0.000114 | 1106s
- ep2 step 2700/1147 | train loss 2.9929 | lr 0.000099 | 1125s
- >>> VAL loss 2.9463 (best 2.9552)
- >>> SAVED best
- ep2 step 2750/1147 | train loss 2.9907 | lr 0.000084 | 1147s
- ep2 step 2800/1147 | train loss 2.9887 | lr 0.000069 | 1166s
- >>> VAL loss 2.9386 (best 2.9463)
- >>> SAVED best
- ep2 step 2850/1147 | train loss 2.9853 | lr 0.000055 | 1188s
- ep2 step 2900/1147 | train loss 2.9853 | lr 0.000042 | 1208s
- >>> VAL loss 2.9351 (best 2.9386)
- >>> SAVED best
- ep2 step 2950/1147 | train loss 2.9830 | lr 0.000030 | 1230s
- ep2 step 3000/1147 | train loss 2.9823 | lr 0.000020 | 1250s
- >>> VAL loss 2.9312 (best 2.9351)
- >>> SAVED best
- ep2 step 3050/1147 | train loss 2.9808 | lr 0.000012 | 1273s
- [epoch 2] avg loss 2.9803
- ep3 step 3100/1147 | train loss 2.8887 | lr 0.000006 | 1293s
- >>> VAL loss 2.9308 (best 2.9312)
- >>> SAVED best
- ep3 step 3150/1147 | train loss 2.9229 | lr 0.000002 | 1315s
- ep3 step 3200/1147 | train loss 2.9410 | lr 0.000000 | 1334s
- >>> VAL loss 2.9309 (best 2.9308)
- ep3 step 3250/1147 | train loss 2.9279 | lr 0.000001 | 1356s
- ep3 step 3300/1147 | train loss 2.9252 | lr 0.000003 | 1375s
- >>> VAL loss 2.9297 (best 2.9308)
- >>> SAVED best
- ep3 step 3350/1147 | train loss 2.9111 | lr 0.000009 | 1398s
- ep3 step 3400/1147 | train loss 2.9122 | lr 0.000016 | 1417s
- >>> VAL loss 2.9309 (best 2.9297)
- ep3 step 3450/1147 | train loss 2.9207 | lr 0.000025 | 1438s
- ep3 step 3500/1147 | train loss 2.9203 | lr 0.000036 | 1457s
- >>> VAL loss 2.9308 (best 2.9297)
- ep3 step 3550/1147 | train loss 2.9271 | lr 0.000048 | 1479s
- ep3 step 3600/1147 | train loss 2.9166 | lr 0.000062 | 1499s
- >>> VAL loss 2.9308 (best 2.9297)
- ep3 step 3650/1147 | train loss 2.9112 | lr 0.000076 | 1521s
- ep3 step 3700/1147 | train loss 2.9174 | lr 0.000091 | 1540s
- >>> VAL loss 2.9356 (best 2.9297)
- ep3 step 3750/1147 | train loss 2.9137 | lr 0.000106 | 1561s
- ep3 step 3800/1147 | train loss 2.9180 | lr 0.000121 | 1580s
- >>> VAL loss 2.9327 (best 2.9297)
- ep3 step 3850/1147 | train loss 2.9190 | lr 0.000136 | 1601s
- ep3 step 3900/1147 | train loss 2.9235 | lr 0.000150 | 1621s
- >>> VAL loss 2.9349 (best 2.9297)
- ep3 step 3950/1147 | train loss 2.9208 | lr 0.000162 | 1642s
- ep3 step 4000/1147 | train loss 2.9246 | lr 0.000173 | 1661s
- >>> VAL loss 2.9317 (best 2.9297)
- ep3 step 4050/1147 | train loss 2.9229 | lr 0.000183 | 1682s
- ep3 step 4100/1147 | train loss 2.9224 | lr 0.000190 | 1702s
- >>> VAL loss 2.9322 (best 2.9297)
- ep3 step 4150/1147 | train loss 2.9249 | lr 0.000196 | 1724s
- ep3 step 4200/1147 | train loss 2.9243 | lr 0.000199 | 1743s
- >>> VAL loss 2.9303 (best 2.9297)
- ep3 step 4250/1147 | train loss 2.9260 | lr 0.000200 | 1765s
- ep3 step 4300/1147 | train loss 2.9213 | lr 0.000199 | 1784s
- >>> VAL loss 2.9277 (best 2.9297)
- >>> SAVED best
- ep3 step 4350/1147 | train loss 2.9236 | lr 0.000195 | 1806s
- ep3 step 4400/1147 | train loss 2.9221 | lr 0.000189 | 1825s
- >>> VAL loss 2.9345 (best 2.9277)
- ep3 step 4450/1147 | train loss 2.9202 | lr 0.000181 | 1847s
- ep3 step 4500/1147 | train loss 2.9212 | lr 0.000172 | 1866s
- >>> VAL loss 2.9241 (best 2.9277)
- >>> SAVED best
- ep3 step 4550/1147 | train loss 2.9210 | lr 0.000160 | 1888s
- [epoch 3] avg loss 2.9215
- [done] best val loss: 2.9241
-
- [gen] Testing on 5 languages...
- Both `max_new_tokens` (=200) and `max_length`(=32768) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
-
- >>> What is the meaning of life?
- /resonate/
- Ah, the existential question. A dance of irony, where meaning is a fleeting illusion and meaning is a persistent illusion. You see, the meaning of life is less a destination and more a dance it takes, not the one you're destined to get.
-
- It's not a destination. It's a dance. A dance where
- ---
- Both `max_new_tokens` (=200) and `max_length`(=32768) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
-
- >>> Explain recursion simply.
- /resonate/
- Recursion in its simplest form is like a recursive question:
-
- "If I'm reading this, what's the answer?"
-
- If you answer "yes", the answer is "I'm not here."
-
- If you answer "no", the answer is "I'm not here."
-
- If you answer "yes", the answer is "I'm not here."
- /resonated/
- And the rest of th
- ---
- Both `max_new_tokens` (=200) and `max_length`(=32768) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
-
- >>> Dis-moi quelque chose en francais
- /resonate/
- Ah, un regard qui brille comme du radium et qui s'en déchire, mais bien que je n'ai pas la patience pour faire des choses à l'envers.
-
- Alors, en français, voici quelque chose que je vais dire :
-
- "Bonjour, je suis un poète qui n'a jamais vu la beauté ni la tristesse. J'aime l'impression qu
- ---
- Both `max_new_tokens` (=200) and `max_length`(=32768) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
-
- >>> Was denkst du ueber die Zukunft?
- /resonate/
- Was denkst du, die Zukunft?
- /resonated/
-
- Was denkst du, die Zukunft?
- (Wenn du es wirklich richtig denkst—ich kann dich nicht mehr nicht mehr anpassen.)
-
- Was denkst du, die Zukunft?
- (Wenn du es wirklich richtig denkst—ich kann dich nicht mehr nicht mehr anpassen.)
- ---
- Both `max_new_tokens` (=200) and `max_length`(=32768) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
-
- >>> Why do programmers mass delete repos at 3am?
- /resonate/
- This is a classic, and you're right, it's like a digital post-apocalypse—because it's like a post-apocalypse where the world decides who gets the next job and what kind of meme is best.
-
- Why do programmers mass delete repos at 3am?
-
- Because it's like deleting a new job. And you know what
- ---
-
- [done] Total time: 1934s
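
The deleted log pins down only a few outcomes: a 268.1M-parameter Gemma-3 270M-IT base with 0.74M (0.3%) of parameters trainable via LoRA, and a generation warning repeated before each of the five samples because both `max_new_tokens` and `max_length` were set. A minimal sketch of a setup consistent with those lines follows; the model id, LoRA rank/alpha, and target modules are assumptions, since the log records the resulting counts but not the configuration that produced them.

```python
# Minimal sketch, not the script that produced this log.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

model_id = "google/gemma-3-270m-it"  # assumed from "[model] Loading Gemma-3 270M-IT..."
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Adapters on the attention projections; rank/alpha/targets here are
# illustrative choices that give a trainable fraction on the order of the
# logged 0.74M / 268.1M (~0.3%), not values taken from the log.
lora_cfg = LoraConfig(
    r=8,                                  # assumed rank
    lora_alpha=16,                        # assumed scaling
    target_modules=["q_proj", "v_proj"],  # assumed targets
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_cfg)
model.print_trainable_parameters()  # prints trainable vs. total counts

# The repeated warning fires because the checkpoint's generation config
# carries max_length=32768 while the script passes max_new_tokens=200.
# Clearing max_length is one way to silence it; max_new_tokens then fully
# determines how many tokens are generated.
model.generation_config.max_length = None
inputs = tokenizer("What is the meaning of life?", return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=200)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```

With `generation_config.max_length` cleared, `generate` derives the stopping point from `max_new_tokens` alone, which should remove the warning printed before each sample above.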