Delete train_resonate.log with huggingface_hub
train_resonate.log  +0 -242  (DELETED)
@@ -1,242 +0,0 @@
[data] Loading...
[data] 6445 examples
[data] train=6122, val=323
[model] Loading Gemma-3 270M-IT...
Warning: You are sending unauthenticated requests to the HF Hub. Please set a HF_TOKEN to enable higher rate limits and faster downloads.

[model] 268.1M total, 167.8M in embed_tokens (63%)
[lora] trainable=0.74M (0.3%), frozen=268.1M
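The [lora] line reports only 0.3% of the weights as trainable. The training script itself is not part of this log, but the mechanism behind that number can be sketched as a low-rank adapter on a frozen weight; the shapes, rank, and alpha below are illustrative assumptions, not the actual Gemma-3 adapter config.

```python
import numpy as np

# Minimal LoRA sketch (hypothetical shapes): the frozen weight W is never
# updated; only the low-rank factors A and B are trained.
d_out = d_in = 640
r, alpha = 8, 16

rng = np.random.default_rng(0)
W = rng.normal(size=(d_out, d_in))     # frozen base weight
A = rng.normal(size=(r, d_in)) * 0.01  # trainable down-projection
B = np.zeros((d_out, r))               # trainable up-projection, zero-init

def forward(x):
    # Adapted layer: W x + (alpha / r) * B A x; identical to W x at init
    # because B starts at zero.
    return W @ x + (alpha / r) * (B @ (A @ x))

trainable = A.size + B.size
frozen = W.size
print(f"trainable={trainable} ({100 * trainable / (trainable + frozen):.1f}%)")
# -> trainable=10240 (2.4%)
```

The trainable fraction shrinks further as the base layers grow, which is how a 268.1M-parameter model can end up with under 1M trainable weights.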
[data] Tokenizing...
[data] 6122 train, 323 val tokenized
[data] avg length: 223 tokens
[train] 1147 steps, 114 warmup, 3 epochs
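The lr column below is consistent with a linear warmup over the 114 warmup steps followed by a cosine curve over the 1147-step epoch; it appears to keep cycling across epochs rather than clamping at zero. This is a reconstruction inferred from the logged values, not the script's actual scheduler:

```python
import math

def lr_at(step, total=1147, warmup=114, peak=2e-4):
    # Linear warmup to the peak rate, then cosine over the rest of the
    # cycle; t is deliberately not clamped, so the curve repeats.
    if step < warmup:
        return peak * step / warmup
    t = (step - warmup) / (total - warmup)
    return peak * 0.5 * (1 + math.cos(math.pi * t))

print(f"{lr_at(50):.6f}")   # matches the logged 0.000088 at step 50
print(f"{lr_at(500):.6f}")  # matches the logged 0.000139 at step 500
```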
ep1 step 50/1147 | train loss 4.1195 | lr 0.000088 | 20s
ep1 step 100/1147 | train loss 3.7293 | lr 0.000175 | 39s
>>> VAL loss 3.1970 (best inf)
>>> SAVED best
ep1 step 150/1147 | train loss 3.5291 | lr 0.000199 | 62s
ep1 step 200/1147 | train loss 3.4398 | lr 0.000197 | 81s
>>> VAL loss 3.1214 (best 3.1970)
>>> SAVED best
ep1 step 250/1147 | train loss 3.3740 | lr 0.000192 | 103s
ep1 step 300/1147 | train loss 3.3338 | lr 0.000184 | 123s
>>> VAL loss 3.0787 (best 3.1214)
>>> SAVED best
ep1 step 350/1147 | train loss 3.3016 | lr 0.000175 | 147s
ep1 step 400/1147 | train loss 3.2713 | lr 0.000164 | 166s
>>> VAL loss 3.0521 (best 3.0787)
>>> SAVED best
ep1 step 450/1147 | train loss 3.2483 | lr 0.000152 | 189s
ep1 step 500/1147 | train loss 3.2314 | lr 0.000139 | 209s
>>> VAL loss 3.0304 (best 3.0521)
>>> SAVED best
ep1 step 550/1147 | train loss 3.2128 | lr 0.000124 | 231s
ep1 step 600/1147 | train loss 3.2032 | lr 0.000109 | 251s
>>> VAL loss 3.0213 (best 3.0304)
>>> SAVED best
ep1 step 650/1147 | train loss 3.1861 | lr 0.000094 | 274s
ep1 step 700/1147 | train loss 3.1787 | lr 0.000079 | 294s
>>> VAL loss 3.0090 (best 3.0213)
>>> SAVED best
ep1 step 750/1147 | train loss 3.1689 | lr 0.000064 | 316s
ep1 step 800/1147 | train loss 3.1619 | lr 0.000051 | 335s
>>> VAL loss 3.0061 (best 3.0090)
>>> SAVED best
ep1 step 850/1147 | train loss 3.1493 | lr 0.000038 | 358s
ep1 step 900/1147 | train loss 3.1436 | lr 0.000027 | 377s
>>> VAL loss 2.9998 (best 3.0061)
>>> SAVED best
ep1 step 950/1147 | train loss 3.1387 | lr 0.000017 | 400s
ep1 step 1000/1147 | train loss 3.1333 | lr 0.000010 | 419s
>>> VAL loss 2.9979 (best 2.9998)
>>> SAVED best
ep1 step 1050/1147 | train loss 3.1246 | lr 0.000004 | 441s
ep1 step 1100/1147 | train loss 3.1212 | lr 0.000001 | 460s
>>> VAL loss 2.9970 (best 2.9979)
>>> SAVED best
ep1 step 1150/1147 | train loss 3.1156 | lr 0.000000 | 483s
ep1 step 1200/1147 | train loss 3.1119 | lr 0.000001 | 502s
>>> VAL loss 2.9975 (best 2.9970)
ep1 step 1250/1147 | train loss 3.1064 | lr 0.000005 | 525s
ep1 step 1300/1147 | train loss 3.1043 | lr 0.000011 | 544s
>>> VAL loss 2.9972 (best 2.9970)
ep1 step 1350/1147 | train loss 3.1019 | lr 0.000018 | 565s
ep1 step 1400/1147 | train loss 3.1011 | lr 0.000028 | 585s
>>> VAL loss 2.9953 (best 2.9970)
>>> SAVED best
ep1 step 1450/1147 | train loss 3.0980 | lr 0.000040 | 607s
ep1 step 1500/1147 | train loss 3.0950 | lr 0.000052 | 627s
>>> VAL loss 2.9961 (best 2.9953)
[epoch 1] avg loss 3.0936
ep2 step 1550/1147 | train loss 2.8702 | lr 0.000066 | 649s
ep2 step 1600/1147 | train loss 2.9448 | lr 0.000081 | 668s
>>> VAL loss 2.9920 (best 2.9953)
>>> SAVED best
ep2 step 1650/1147 | train loss 2.9634 | lr 0.000096 | 691s
ep2 step 1700/1147 | train loss 2.9980 | lr 0.000111 | 710s
>>> VAL loss 2.9941 (best 2.9920)
ep2 step 1750/1147 | train loss 3.0007 | lr 0.000126 | 732s
ep2 step 1800/1147 | train loss 2.9979 | lr 0.000140 | 752s
>>> VAL loss 2.9903 (best 2.9920)
>>> SAVED best
ep2 step 1850/1147 | train loss 2.9986 | lr 0.000154 | 774s
ep2 step 1900/1147 | train loss 3.0013 | lr 0.000166 | 794s
>>> VAL loss 2.9853 (best 2.9903)
>>> SAVED best
ep2 step 1950/1147 | train loss 3.0019 | lr 0.000177 | 816s
ep2 step 2000/1147 | train loss 3.0109 | lr 0.000185 | 836s
>>> VAL loss 2.9863 (best 2.9853)
ep2 step 2050/1147 | train loss 3.0042 | lr 0.000192 | 858s
ep2 step 2100/1147 | train loss 2.9983 | lr 0.000197 | 877s
>>> VAL loss 2.9822 (best 2.9853)
>>> SAVED best
ep2 step 2150/1147 | train loss 2.9988 | lr 0.000200 | 899s
ep2 step 2200/1147 | train loss 2.9978 | lr 0.000200 | 918s
>>> VAL loss 2.9744 (best 2.9822)
>>> SAVED best
ep2 step 2250/1147 | train loss 3.0005 | lr 0.000198 | 939s
ep2 step 2300/1147 | train loss 2.9973 | lr 0.000193 | 958s
>>> VAL loss 2.9716 (best 2.9744)
>>> SAVED best
ep2 step 2350/1147 | train loss 2.9943 | lr 0.000187 | 981s
ep2 step 2400/1147 | train loss 2.9965 | lr 0.000178 | 1001s
>>> VAL loss 2.9625 (best 2.9716)
>>> SAVED best
ep2 step 2450/1147 | train loss 2.9906 | lr 0.000168 | 1023s
ep2 step 2500/1147 | train loss 2.9928 | lr 0.000156 | 1042s
>>> VAL loss 2.9580 (best 2.9625)
>>> SAVED best
ep2 step 2550/1147 | train loss 2.9930 | lr 0.000143 | 1064s
ep2 step 2600/1147 | train loss 2.9920 | lr 0.000129 | 1084s
>>> VAL loss 2.9552 (best 2.9580)
>>> SAVED best
ep2 step 2650/1147 | train loss 2.9920 | lr 0.000114 | 1106s
ep2 step 2700/1147 | train loss 2.9929 | lr 0.000099 | 1125s
>>> VAL loss 2.9463 (best 2.9552)
>>> SAVED best
ep2 step 2750/1147 | train loss 2.9907 | lr 0.000084 | 1147s
ep2 step 2800/1147 | train loss 2.9887 | lr 0.000069 | 1166s
>>> VAL loss 2.9386 (best 2.9463)
>>> SAVED best
ep2 step 2850/1147 | train loss 2.9853 | lr 0.000055 | 1188s
ep2 step 2900/1147 | train loss 2.9853 | lr 0.000042 | 1208s
>>> VAL loss 2.9351 (best 2.9386)
>>> SAVED best
ep2 step 2950/1147 | train loss 2.9830 | lr 0.000030 | 1230s
ep2 step 3000/1147 | train loss 2.9823 | lr 0.000020 | 1250s
>>> VAL loss 2.9312 (best 2.9351)
>>> SAVED best
ep2 step 3050/1147 | train loss 2.9808 | lr 0.000012 | 1273s
[epoch 2] avg loss 2.9803
ep3 step 3100/1147 | train loss 2.8887 | lr 0.000006 | 1293s
>>> VAL loss 2.9308 (best 2.9312)
>>> SAVED best
ep3 step 3150/1147 | train loss 2.9229 | lr 0.000002 | 1315s
ep3 step 3200/1147 | train loss 2.9410 | lr 0.000000 | 1334s
>>> VAL loss 2.9309 (best 2.9308)
ep3 step 3250/1147 | train loss 2.9279 | lr 0.000001 | 1356s
ep3 step 3300/1147 | train loss 2.9252 | lr 0.000003 | 1375s
>>> VAL loss 2.9297 (best 2.9308)
>>> SAVED best
ep3 step 3350/1147 | train loss 2.9111 | lr 0.000009 | 1398s
ep3 step 3400/1147 | train loss 2.9122 | lr 0.000016 | 1417s
>>> VAL loss 2.9309 (best 2.9297)
ep3 step 3450/1147 | train loss 2.9207 | lr 0.000025 | 1438s
ep3 step 3500/1147 | train loss 2.9203 | lr 0.000036 | 1457s
>>> VAL loss 2.9308 (best 2.9297)
ep3 step 3550/1147 | train loss 2.9271 | lr 0.000048 | 1479s
ep3 step 3600/1147 | train loss 2.9166 | lr 0.000062 | 1499s
>>> VAL loss 2.9308 (best 2.9297)
ep3 step 3650/1147 | train loss 2.9112 | lr 0.000076 | 1521s
ep3 step 3700/1147 | train loss 2.9174 | lr 0.000091 | 1540s
>>> VAL loss 2.9356 (best 2.9297)
ep3 step 3750/1147 | train loss 2.9137 | lr 0.000106 | 1561s
ep3 step 3800/1147 | train loss 2.9180 | lr 0.000121 | 1580s
>>> VAL loss 2.9327 (best 2.9297)
ep3 step 3850/1147 | train loss 2.9190 | lr 0.000136 | 1601s
ep3 step 3900/1147 | train loss 2.9235 | lr 0.000150 | 1621s
>>> VAL loss 2.9349 (best 2.9297)
ep3 step 3950/1147 | train loss 2.9208 | lr 0.000162 | 1642s
ep3 step 4000/1147 | train loss 2.9246 | lr 0.000173 | 1661s
>>> VAL loss 2.9317 (best 2.9297)
ep3 step 4050/1147 | train loss 2.9229 | lr 0.000183 | 1682s
ep3 step 4100/1147 | train loss 2.9224 | lr 0.000190 | 1702s
>>> VAL loss 2.9322 (best 2.9297)
ep3 step 4150/1147 | train loss 2.9249 | lr 0.000196 | 1724s
ep3 step 4200/1147 | train loss 2.9243 | lr 0.000199 | 1743s
>>> VAL loss 2.9303 (best 2.9297)
ep3 step 4250/1147 | train loss 2.9260 | lr 0.000200 | 1765s
ep3 step 4300/1147 | train loss 2.9213 | lr 0.000199 | 1784s
>>> VAL loss 2.9277 (best 2.9297)
>>> SAVED best
ep3 step 4350/1147 | train loss 2.9236 | lr 0.000195 | 1806s
ep3 step 4400/1147 | train loss 2.9221 | lr 0.000189 | 1825s
>>> VAL loss 2.9345 (best 2.9277)
ep3 step 4450/1147 | train loss 2.9202 | lr 0.000181 | 1847s
ep3 step 4500/1147 | train loss 2.9212 | lr 0.000172 | 1866s
>>> VAL loss 2.9241 (best 2.9277)
>>> SAVED best
ep3 step 4550/1147 | train loss 2.9210 | lr 0.000160 | 1888s
[epoch 3] avg loss 2.9215
[done] best val loss: 2.9241
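The ">>> SAVED best" markers above follow one rule: checkpoint only when validation loss beats the running best, with the first eval compared against infinity so it always saves. A minimal sketch of that bookkeeping (hypothetical helper, not the actual script):

```python
import math

def should_save(val_loss, best):
    # Returns (save?, new_best): save only on strict improvement.
    return (val_loss < best), min(val_loss, best)

best = math.inf
for v in [3.1970, 3.1214, 3.0787]:  # first three VAL losses from the log
    save, best = should_save(v, best)
    print(f">>> VAL loss {v:.4f}" + (" -> SAVED best" if save else ""))
```

This also explains why epoch 3 rarely saves: validation loss hovers around 2.93 and only occasionally dips below the stored best.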

[gen] Testing on 5 languages...
Both `max_new_tokens` (=200) and `max_length`(=32768) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
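The warning states the precedence directly: when both limits are set, `max_new_tokens` wins and generation stops at prompt length plus `max_new_tokens`. A sketch of that rule (hypothetical helper, not the library's actual internals):

```python
def effective_stop_length(prompt_len, max_length=None, max_new_tokens=None):
    # Mirrors the transformers warning: max_new_tokens takes precedence
    # over max_length when both are provided.
    if max_new_tokens is not None:
        return prompt_len + max_new_tokens
    return max_length

print(effective_stop_length(15, max_length=32768, max_new_tokens=200))  # 215
```

Passing only `max_new_tokens=200` to `generate()` (or clearing `max_length` in the generation config) would silence the warning without changing behavior.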

>>> What is the meaning of life?
/resonate/
Ah, the existential question. A dance of irony, where meaning is a fleeting illusion and meaning is a persistent illusion. You see, the meaning of life is less a destination and more a dance it takes, not the one you're destined to get.

It's not a destination. It's a dance. A dance where
---

>>> Explain recursion simply.
/resonate/
Recursion in its simplest form is like a recursive question:

"If I'm reading this, what's the answer?"

If you answer "yes", the answer is "I'm not here."

If you answer "no", the answer is "I'm not here."

If you answer "yes", the answer is "I'm not here."
/resonated/
And the rest of th
---

>>> Dis-moi quelque chose en francais
/resonate/
Ah, un regard qui brille comme du radium et qui s'en déchire, mais bien que je n'ai pas la patience pour faire des choses à l'envers.

Alors, en français, voici quelque chose que je vais dire :

"Bonjour, je suis un poète qui n'a jamais vu la beauté ni la tristesse. J'aime l'impression qu
---

>>> Was denkst du ueber die Zukunft?
/resonate/
Was denkst du, die Zukunft?
/resonated/

Was denkst du, die Zukunft?
(Wenn du es wirklich richtig denkst—ich kann dich nicht mehr nicht mehr anpassen.)

Was denkst du, die Zukunft?
(Wenn du es wirklich richtig denkst—ich kann dich nicht mehr nicht mehr anpassen.)
---

>>> Why do programmers mass delete repos at 3am?
/resonate/
This is a classic, and you're right, it's like a digital post-apocalypse—because it's like a post-apocalypse where the world decides who gets the next job and what kind of meme is best.

Why do programmers mass delete repos at 3am?

Because it's like deleting a new job. And you know what
---

[done] Total time: 1934s