# gpt2-geez
This model is a fine-tuned version of [openai-community/gpt2](https://huggingface.co/openai-community/gpt2) on an unknown dataset. It achieves the following results on the evaluation set:
- Loss: 8.7806
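
Since this is a GPT-2 causal language model (the name suggests Ge'ez-script text, though the training data is not documented here), it can be loaded with the standard `transformers` generation API. A minimal sketch; the prompt and sampling settings are illustrative placeholders:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Mequanent/gpt2-geez"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Illustrative Ge'ez-script prompt; replace with your own text.
inputs = tokenizer("ሰላም", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50, do_sample=True, top_p=0.95)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```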
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 2e-05
- train_batch_size: 4
- eval_batch_size: 4
- seed: 42
- optimizer: AdamW (`adamw_torch`) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 500
- num_epochs: 100
- mixed_precision_training: Native AMP
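
For reference, these settings map roughly onto the `transformers` `TrainingArguments` below. This is a sketch, not the training script used for this card; `output_dir` is a placeholder, and the `Trainer`/dataset wiring is omitted:

```python
from transformers import TrainingArguments

# Sketch reconstructing the hyperparameters listed above; output_dir is a placeholder.
training_args = TrainingArguments(
    output_dir="gpt2-geez",
    learning_rate=2e-5,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    seed=42,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    warmup_steps=500,
    num_train_epochs=100,
    fp16=True,  # native AMP mixed-precision training
)
```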
### Training results

Validation loss reaches its minimum (7.3176) at epoch 21 and climbs steadily afterwards while training loss continues to fall, a pattern consistent with overfitting; the 8.7806 reported above is the final-epoch value.

Training Loss | Epoch | Step | Validation Loss |
---|---|---|---|
7.805 | 1.0 | 2869 | 8.1560 |
7.6668 | 2.0 | 5738 | 8.0964 |
7.5902 | 3.0 | 8607 | 8.0223 |
7.5213 | 4.0 | 11476 | 7.9265 |
7.4008 | 5.0 | 14345 | 7.8397 |
7.301 | 6.0 | 17214 | 7.7674 |
7.1974 | 7.0 | 20083 | 7.7010 |
7.1083 | 8.0 | 22952 | 7.6304 |
6.9829 | 9.0 | 25821 | 7.5783 |
6.8634 | 10.0 | 28690 | 7.5059 |
6.7617 | 11.0 | 31559 | 7.4591 |
6.699 | 12.0 | 34428 | 7.4385 |
6.6222 | 13.0 | 37297 | 7.4152 |
6.4996 | 14.0 | 40166 | 7.3716 |
6.4138 | 15.0 | 43035 | 7.3621 |
6.3134 | 16.0 | 45904 | 7.3350 |
6.2517 | 17.0 | 48773 | 7.3317 |
6.1405 | 18.0 | 51642 | 7.3333 |
6.0658 | 19.0 | 54511 | 7.3313 |
5.9379 | 20.0 | 57380 | 7.3308 |
5.8857 | 21.0 | 60249 | 7.3176 |
5.8123 | 22.0 | 63118 | 7.3555 |
5.7219 | 23.0 | 65987 | 7.3272 |
5.6109 | 24.0 | 68856 | 7.3490 |
5.5721 | 25.0 | 71725 | 7.3804 |
5.4767 | 26.0 | 74594 | 7.3616 |
5.3536 | 27.0 | 77463 | 7.4173 |
5.3088 | 28.0 | 80332 | 7.4068 |
5.2084 | 29.0 | 83201 | 7.4598 |
5.1875 | 30.0 | 86070 | 7.4445 |
5.1105 | 31.0 | 88939 | 7.4917 |
5.0036 | 32.0 | 91808 | 7.5289 |
4.9554 | 33.0 | 94677 | 7.5701 |
4.8937 | 34.0 | 97546 | 7.6252 |
4.8128 | 35.0 | 100415 | 7.5901 |
4.7318 | 36.0 | 103284 | 7.6583 |
4.6531 | 37.0 | 106153 | 7.6874 |
4.6181 | 38.0 | 109022 | 7.7548 |
4.5611 | 39.0 | 111891 | 7.7664 |
4.4673 | 40.0 | 114760 | 7.8109 |
4.4184 | 41.0 | 117629 | 7.7604 |
4.3436 | 42.0 | 120498 | 7.8470 |
4.329 | 43.0 | 123367 | 7.9043 |
4.2249 | 44.0 | 126236 | 7.9154 |
4.1761 | 45.0 | 129105 | 7.9494 |
4.153 | 46.0 | 131974 | 7.9806 |
4.09 | 47.0 | 134843 | 7.9693 |
4.0814 | 48.0 | 137712 | 8.0332 |
3.9889 | 49.0 | 140581 | 8.0437 |
3.8982 | 50.0 | 143450 | 8.1102 |
3.8621 | 51.0 | 146319 | 8.1181 |
3.8337 | 52.0 | 149188 | 8.1632 |
3.797 | 53.0 | 152057 | 8.1996 |
3.7656 | 54.0 | 154926 | 8.2277 |
3.7031 | 55.0 | 157795 | 8.2382 |
3.6823 | 56.0 | 160664 | 8.2876 |
3.621 | 57.0 | 163533 | 8.3095 |
3.5373 | 58.0 | 166402 | 8.3176 |
3.5675 | 59.0 | 169271 | 8.3374 |
3.5522 | 60.0 | 172140 | 8.3418 |
3.4695 | 61.0 | 175009 | 8.3852 |
3.4313 | 62.0 | 177878 | 8.3725 |
3.3989 | 63.0 | 180747 | 8.4252 |
3.3297 | 64.0 | 183616 | 8.4471 |
3.331 | 65.0 | 186485 | 8.4471 |
3.2577 | 66.0 | 189354 | 8.4660 |
3.2561 | 67.0 | 192223 | 8.4727 |
3.257 | 68.0 | 195092 | 8.5081 |
3.2167 | 69.0 | 197961 | 8.5476 |
3.1696 | 70.0 | 200830 | 8.5399 |
3.0959 | 71.0 | 203699 | 8.5425 |
3.0822 | 72.0 | 206568 | 8.5941 |
3.0605 | 73.0 | 209437 | 8.6037 |
3.092 | 74.0 | 212306 | 8.6128 |
3.0725 | 75.0 | 215175 | 8.5998 |
3.0599 | 76.0 | 218044 | 8.6316 |
2.9968 | 77.0 | 220913 | 8.6512 |
2.9697 | 78.0 | 223782 | 8.6503 |
2.9571 | 79.0 | 226651 | 8.6605 |
2.9867 | 80.0 | 229520 | 8.6775 |
2.89 | 81.0 | 232389 | 8.6773 |
2.9005 | 82.0 | 235258 | 8.6927 |
2.9131 | 83.0 | 238127 | 8.6921 |
2.8856 | 84.0 | 240996 | 8.7090 |
2.8438 | 85.0 | 243865 | 8.7086 |
2.8588 | 86.0 | 246734 | 8.7205 |
2.8226 | 87.0 | 249603 | 8.7406 |
2.8125 | 88.0 | 252472 | 8.7360 |
2.7896 | 89.0 | 255341 | 8.7401 |
2.8169 | 90.0 | 258210 | 8.7440 |
2.7947 | 91.0 | 261079 | 8.7519 |
2.7763 | 92.0 | 263948 | 8.7605 |
2.7666 | 93.0 | 266817 | 8.7577 |
2.8084 | 94.0 | 269686 | 8.7659 |
2.7636 | 95.0 | 272555 | 8.7705 |
2.7361 | 96.0 | 275424 | 8.7794 |
2.7511 | 97.0 | 278293 | 8.7810 |
2.7264 | 98.0 | 281162 | 8.7782 |
2.7505 | 99.0 | 284031 | 8.7818 |
2.7111 | 100.0 | 286900 | 8.7806 |
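
The validation losses above are per-token cross-entropies, so each can be read as a perplexity via exp(loss); for example, the final value:

```python
import math

# Perplexity is the exponential of the per-token cross-entropy loss.
final_eval_loss = 8.7806
print(math.exp(final_eval_loss))  # ≈ 6.5e3
```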
### Framework versions
- Transformers 4.48.3
- Pytorch 2.6.0+cu126
- Datasets 3.2.0
- Tokenizers 0.21.0