
polish_wikipedia_model

This model is a fine-tuned version of EleutherAI/pythia-70m. The training dataset is not specified in this card (the model name suggests Polish Wikipedia text). It achieves the following results on the evaluation set:

  • Loss: 0.0137
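
To try the checkpoint, it can be loaded with the standard transformers causal-LM classes. The sketch below assumes the checkpoint is hosted under the repository id Pyro-X2/polish_wikipedia_model (the repository this card describes); the Polish prompt and the generation settings are only illustrative placeholders.

```python
# Minimal usage sketch; the hub id below is taken from this card's page and the prompt is a placeholder.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Pyro-X2/polish_wikipedia_model"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

inputs = tokenizer("Warszawa jest", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=40, do_sample=True, top_p=0.9)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```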

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a configuration sketch follows the list):

  • learning_rate: 2e-05
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 300
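
The values above map directly onto a Hugging Face TrainingArguments configuration. Below is a hedged reconstruction of that configuration from the listed values only; the output directory and the per-epoch evaluation cadence are assumptions (the per-epoch validation losses in the results table suggest evaluation ran once per epoch), not settings stated in this card.

```python
# Reconstruction sketch of the training setup from the listed hyperparameters.
# Output dir, eval strategy, and the dataset objects are assumptions.
from transformers import TrainingArguments, Trainer

training_args = TrainingArguments(
    output_dir="polish_wikipedia_model",   # assumed
    learning_rate=2e-5,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    seed=42,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    num_train_epochs=300,
    evaluation_strategy="epoch",           # assumed from the per-epoch validation losses
)

# trainer = Trainer(
#     model=model,
#     args=training_args,
#     train_dataset=train_dataset,         # training data is not described in this card
#     eval_dataset=eval_dataset,           # evaluation data is not described in this card
# )
# trainer.train()
```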

Training results

Training Loss Epoch Step Validation Loss
No log 1.0 9 1.1594
No log 2.0 18 1.1355
No log 3.0 27 1.0811
No log 4.0 36 1.0362
No log 5.0 45 1.0053
No log 6.0 54 0.9728
No log 7.0 63 0.9478
No log 8.0 72 0.9089
No log 9.0 81 0.8784
No log 10.0 90 0.8576
No log 11.0 99 0.8391
No log 12.0 108 0.8121
No log 13.0 117 0.7778
No log 14.0 126 0.7642
No log 15.0 135 0.7411
No log 16.0 144 0.7252
No log 17.0 153 0.7541
No log 18.0 162 0.6939
No log 19.0 171 0.6616
No log 20.0 180 0.6834
No log 21.0 189 0.6032
No log 22.0 198 0.5909
No log 23.0 207 0.5899
No log 24.0 216 0.5610
No log 25.0 225 0.5404
No log 26.0 234 0.5576
No log 27.0 243 0.5253
No log 28.0 252 0.5085
No log 29.0 261 0.5035
No log 30.0 270 0.5017
No log 31.0 279 0.4817
No log 32.0 288 0.4690
No log 33.0 297 0.4569
No log 34.0 306 0.4611
No log 35.0 315 0.4389
No log 36.0 324 0.4598
No log 37.0 333 0.4308
No log 38.0 342 0.4101
No log 39.0 351 0.4056
No log 40.0 360 0.3939
No log 41.0 369 0.3801
No log 42.0 378 0.3741
No log 43.0 387 0.3739
No log 44.0 396 0.3779
No log 45.0 405 0.3633
No log 46.0 414 0.3614
No log 47.0 423 0.3497
No log 48.0 432 0.3508
No log 49.0 441 0.3425
No log 50.0 450 0.3399
No log 51.0 459 0.3357
No log 52.0 468 0.3393
No log 53.0 477 0.3241
No log 54.0 486 0.3427
No log 55.0 495 0.3452
0.614 56.0 504 0.3283
0.614 57.0 513 0.3182
0.614 58.0 522 0.3192
0.614 59.0 531 0.3118
0.614 60.0 540 0.3055
0.614 61.0 549 0.3109
0.614 62.0 558 0.2976
0.614 63.0 567 0.3052
0.614 64.0 576 0.2988
0.614 65.0 585 0.3035
0.614 66.0 594 0.2874
0.614 67.0 603 0.2812
0.614 68.0 612 0.2828
0.614 69.0 621 0.2786
0.614 70.0 630 0.2775
0.614 71.0 639 0.2828
0.614 72.0 648 0.2710
0.614 73.0 657 0.2725
0.614 74.0 666 0.2930
0.614 75.0 675 0.2642
0.614 76.0 684 0.2661
0.614 77.0 693 0.2493
0.614 78.0 702 0.2494
0.614 79.0 711 0.2370
0.614 80.0 720 0.2497
0.614 81.0 729 0.2399
0.614 82.0 738 0.2340
0.614 83.0 747 0.2248
0.614 84.0 756 0.2234
0.614 85.0 765 0.2284
0.614 86.0 774 0.2099
0.614 87.0 783 0.2081
0.614 88.0 792 0.1958
0.614 89.0 801 0.1969
0.614 90.0 810 0.1843
0.614 91.0 819 0.1746
0.614 92.0 828 0.1718
0.614 93.0 837 0.1665
0.614 94.0 846 0.1597
0.614 95.0 855 0.1633
0.614 96.0 864 0.1490
0.614 97.0 873 0.1414
0.614 98.0 882 0.1344
0.614 99.0 891 0.1446
0.614 100.0 900 0.1426
0.614 101.0 909 0.1364
0.614 102.0 918 0.1310
0.614 103.0 927 0.1342
0.614 104.0 936 0.1312
0.614 105.0 945 0.1178
0.614 106.0 954 0.1040
0.614 107.0 963 0.0998
0.614 108.0 972 0.1120
0.614 109.0 981 0.1798
0.614 110.0 990 0.1072
0.614 111.0 999 0.0864
0.2254 112.0 1008 0.0876
0.2254 113.0 1017 0.0805
0.2254 114.0 1026 0.0684
0.2254 115.0 1035 0.0826
0.2254 116.0 1044 0.0772
0.2254 117.0 1053 0.0667
0.2254 118.0 1062 0.0616
0.2254 119.0 1071 0.0641
0.2254 120.0 1080 0.0528
0.2254 121.0 1089 0.0520
0.2254 122.0 1098 0.0454
0.2254 123.0 1107 0.0407
0.2254 124.0 1116 0.0440
0.2254 125.0 1125 0.0449
0.2254 126.0 1134 0.0423
0.2254 127.0 1143 0.0503
0.2254 128.0 1152 0.0380
0.2254 129.0 1161 0.0440
0.2254 130.0 1170 0.0435
0.2254 131.0 1179 0.0718
0.2254 132.0 1188 0.0483
0.2254 133.0 1197 0.0474
0.2254 134.0 1206 0.0424
0.2254 135.0 1215 0.0387
0.2254 136.0 1224 0.0357
0.2254 137.0 1233 0.0354
0.2254 138.0 1242 0.0340
0.2254 139.0 1251 0.0364
0.2254 140.0 1260 0.0375
0.2254 141.0 1269 0.0345
0.2254 142.0 1278 0.0434
0.2254 143.0 1287 0.0310
0.2254 144.0 1296 0.0291
0.2254 145.0 1305 0.0272
0.2254 146.0 1314 0.0250
0.2254 147.0 1323 0.0262
0.2254 148.0 1332 0.0244
0.2254 149.0 1341 0.0275
0.2254 150.0 1350 0.0273
0.2254 151.0 1359 0.0294
0.2254 152.0 1368 0.0305
0.2254 153.0 1377 0.0301
0.2254 154.0 1386 0.0277
0.2254 155.0 1395 0.0335
0.2254 156.0 1404 0.0430
0.2254 157.0 1413 0.0217
0.2254 158.0 1422 0.0244
0.2254 159.0 1431 0.0260
0.2254 160.0 1440 0.0249
0.2254 161.0 1449 0.0224
0.2254 162.0 1458 0.0237
0.2254 163.0 1467 0.0228
0.2254 164.0 1476 0.0198
0.2254 165.0 1485 0.0315
0.2254 166.0 1494 0.0283
0.046 167.0 1503 0.0245
0.046 168.0 1512 0.0201
0.046 169.0 1521 0.0272
0.046 170.0 1530 0.0191
0.046 171.0 1539 0.0281
0.046 172.0 1548 0.0236
0.046 173.0 1557 0.0207
0.046 174.0 1566 0.0183
0.046 175.0 1575 0.0285
0.046 176.0 1584 0.0232
0.046 177.0 1593 0.0185
0.046 178.0 1602 0.0193
0.046 179.0 1611 0.0188
0.046 180.0 1620 0.0189
0.046 181.0 1629 0.0224
0.046 182.0 1638 0.0228
0.046 183.0 1647 0.0239
0.046 184.0 1656 0.0219
0.046 185.0 1665 0.0175
0.046 186.0 1674 0.0216
0.046 187.0 1683 0.0225
0.046 188.0 1692 0.0193
0.046 189.0 1701 0.0171
0.046 190.0 1710 0.0184
0.046 191.0 1719 0.0184
0.046 192.0 1728 0.0174
0.046 193.0 1737 0.0178
0.046 194.0 1746 0.0184
0.046 195.0 1755 0.0191
0.046 196.0 1764 0.0256
0.046 197.0 1773 0.0183
0.046 198.0 1782 0.0178
0.046 199.0 1791 0.0181
0.046 200.0 1800 0.0203
0.046 201.0 1809 0.0196
0.046 202.0 1818 0.0181
0.046 203.0 1827 0.0197
0.046 204.0 1836 0.0183
0.046 205.0 1845 0.0174
0.046 206.0 1854 0.0154
0.046 207.0 1863 0.0169
0.046 208.0 1872 0.0166
0.046 209.0 1881 0.0220
0.046 210.0 1890 0.0204
0.046 211.0 1899 0.0189
0.046 212.0 1908 0.0167
0.046 213.0 1917 0.0183
0.046 214.0 1926 0.0173
0.046 215.0 1935 0.0163
0.046 216.0 1944 0.0164
0.046 217.0 1953 0.0182
0.046 218.0 1962 0.0177
0.046 219.0 1971 0.0164
0.046 220.0 1980 0.0171
0.046 221.0 1989 0.0163
0.046 222.0 1998 0.0184
0.0226 223.0 2007 0.0180
0.0226 224.0 2016 0.0198
0.0226 225.0 2025 0.0181
0.0226 226.0 2034 0.0164
0.0226 227.0 2043 0.0157
0.0226 228.0 2052 0.0159
0.0226 229.0 2061 0.0156
0.0226 230.0 2070 0.0166
0.0226 231.0 2079 0.0154
0.0226 232.0 2088 0.0174
0.0226 233.0 2097 0.0157
0.0226 234.0 2106 0.0162
0.0226 235.0 2115 0.0162
0.0226 236.0 2124 0.0162
0.0226 237.0 2133 0.0222
0.0226 238.0 2142 0.0189
0.0226 239.0 2151 0.0182
0.0226 240.0 2160 0.0151
0.0226 241.0 2169 0.0152
0.0226 242.0 2178 0.0152
0.0226 243.0 2187 0.0154
0.0226 244.0 2196 0.0146
0.0226 245.0 2205 0.0145
0.0226 246.0 2214 0.0151
0.0226 247.0 2223 0.0173
0.0226 248.0 2232 0.0161
0.0226 249.0 2241 0.0151
0.0226 250.0 2250 0.0149
0.0226 251.0 2259 0.0156
0.0226 252.0 2268 0.0143
0.0226 253.0 2277 0.0163
0.0226 254.0 2286 0.0156
0.0226 255.0 2295 0.0156
0.0226 256.0 2304 0.0146
0.0226 257.0 2313 0.0149
0.0226 258.0 2322 0.0150
0.0226 259.0 2331 0.0158
0.0226 260.0 2340 0.0142
0.0226 261.0 2349 0.0147
0.0226 262.0 2358 0.0144
0.0226 263.0 2367 0.0145
0.0226 264.0 2376 0.0142
0.0226 265.0 2385 0.0143
0.0226 266.0 2394 0.0140
0.0226 267.0 2403 0.0141
0.0226 268.0 2412 0.0153
0.0226 269.0 2421 0.0141
0.0226 270.0 2430 0.0144
0.0226 271.0 2439 0.0139
0.0226 272.0 2448 0.0141
0.0226 273.0 2457 0.0141
0.0226 274.0 2466 0.0139
0.0226 275.0 2475 0.0141
0.0226 276.0 2484 0.0140
0.0226 277.0 2493 0.0142
0.0165 278.0 2502 0.0146
0.0165 279.0 2511 0.0141
0.0165 280.0 2520 0.0138
0.0165 281.0 2529 0.0138
0.0165 282.0 2538 0.0138
0.0165 283.0 2547 0.0138
0.0165 284.0 2556 0.0138
0.0165 285.0 2565 0.0139
0.0165 286.0 2574 0.0137
0.0165 287.0 2583 0.0137
0.0165 288.0 2592 0.0137
0.0165 289.0 2601 0.0138
0.0165 290.0 2610 0.0137
0.0165 291.0 2619 0.0137
0.0165 292.0 2628 0.0137
0.0165 293.0 2637 0.0137
0.0165 294.0 2646 0.0137
0.0165 295.0 2655 0.0137
0.0165 296.0 2664 0.0137
0.0165 297.0 2673 0.0137
0.0165 298.0 2682 0.0137
0.0165 299.0 2691 0.0137
0.0165 300.0 2700 0.0137
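
Assuming the reported losses are mean token-level cross-entropy (the standard objective for causal language models), the final validation loss of 0.0137 corresponds to a perplexity of roughly exp(0.0137) ≈ 1.014:

```python
import math

final_validation_loss = 0.0137           # from the last row of the table above
perplexity = math.exp(final_validation_loss)
print(f"perplexity ≈ {perplexity:.4f}")  # ≈ 1.0138
```

Note that the table shows only 9 optimizer steps per epoch at batch size 8, so the training set appears to contain on the order of 72 examples; the very low final loss therefore likely reflects close fitting of a small corpus rather than broad coverage of Polish text.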

Framework versions

  • Transformers 4.41.1
  • Pytorch 2.3.0+cu121
  • Datasets 2.19.2
  • Tokenizers 0.19.1
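
To reproduce the environment, the versions above can be pinned directly. This is one possible requirements sketch; note that the +cu121 build of PyTorch is served from PyTorch's own wheel index (e.g. https://download.pytorch.org/whl/cu121) rather than plain PyPI.

```text
# pinned to the framework versions listed above
transformers==4.41.1
torch==2.3.0
datasets==2.19.2
tokenizers==0.19.1
```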