gpt2-geez

This model is a fine-tuned version of gpt2 on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 8.7806
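
For a quick check of the checkpoint, it can be loaded like any other GPT-2 model with the transformers library. The snippet below is a minimal sketch, assuming the repository id Mequanent/gpt2-geez and a placeholder Ge'ez-script prompt; adjust the prompt and generation settings to your use case.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Repository id taken from this model card; adjust if the model is hosted elsewhere.
model_id = "Mequanent/gpt2-geez"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Placeholder prompt in Ethiopic (Ge'ez) script; replace with your own text.
prompt = "ሰላም"
inputs = tokenizer(prompt, return_tensors="pt")

# Sample a short continuation; tune max_new_tokens / sampling parameters as needed.
outputs = model.generate(**inputs, max_new_tokens=50, do_sample=True, top_p=0.95)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```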

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a TrainingArguments sketch reproducing them follows the list):

  • learning_rate: 2e-05
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 42
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 500
  • num_epochs: 100
  • mixed_precision_training: Native AMP
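
For reference, the sketch below maps these settings onto transformers.TrainingArguments. It is a minimal sketch, not the exact training script: the output directory is a placeholder, and only the hyperparameters listed above are assumed.

```python
from transformers import TrainingArguments

# Minimal sketch of the hyperparameters listed above; output_dir is a placeholder.
training_args = TrainingArguments(
    output_dir="gpt2-geez",        # placeholder output path
    learning_rate=2e-5,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    seed=42,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    warmup_steps=500,
    num_train_epochs=100,
    fp16=True,                     # Native AMP mixed-precision training
)
```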

Training results

| Training Loss | Epoch | Step   | Validation Loss |
|:-------------:|:-----:|:------:|:---------------:|
| 7.805         | 1.0   | 2869   | 8.1560          |
| 7.6668        | 2.0   | 5738   | 8.0964          |
| 7.5902        | 3.0   | 8607   | 8.0223          |
| 7.5213        | 4.0   | 11476  | 7.9265          |
| 7.4008        | 5.0   | 14345  | 7.8397          |
| 7.301         | 6.0   | 17214  | 7.7674          |
| 7.1974        | 7.0   | 20083  | 7.7010          |
| 7.1083        | 8.0   | 22952  | 7.6304          |
| 6.9829        | 9.0   | 25821  | 7.5783          |
| 6.8634        | 10.0  | 28690  | 7.5059          |
| 6.7617        | 11.0  | 31559  | 7.4591          |
| 6.699         | 12.0  | 34428  | 7.4385          |
| 6.6222        | 13.0  | 37297  | 7.4152          |
| 6.4996        | 14.0  | 40166  | 7.3716          |
| 6.4138        | 15.0  | 43035  | 7.3621          |
| 6.3134        | 16.0  | 45904  | 7.3350          |
| 6.2517        | 17.0  | 48773  | 7.3317          |
| 6.1405        | 18.0  | 51642  | 7.3333          |
| 6.0658        | 19.0  | 54511  | 7.3313          |
| 5.9379        | 20.0  | 57380  | 7.3308          |
| 5.8857        | 21.0  | 60249  | 7.3176          |
| 5.8123        | 22.0  | 63118  | 7.3555          |
| 5.7219        | 23.0  | 65987  | 7.3272          |
| 5.6109        | 24.0  | 68856  | 7.3490          |
| 5.5721        | 25.0  | 71725  | 7.3804          |
| 5.4767        | 26.0  | 74594  | 7.3616          |
| 5.3536        | 27.0  | 77463  | 7.4173          |
| 5.3088        | 28.0  | 80332  | 7.4068          |
| 5.2084        | 29.0  | 83201  | 7.4598          |
| 5.1875        | 30.0  | 86070  | 7.4445          |
| 5.1105        | 31.0  | 88939  | 7.4917          |
| 5.0036        | 32.0  | 91808  | 7.5289          |
| 4.9554        | 33.0  | 94677  | 7.5701          |
| 4.8937        | 34.0  | 97546  | 7.6252          |
| 4.8128        | 35.0  | 100415 | 7.5901          |
| 4.7318        | 36.0  | 103284 | 7.6583          |
| 4.6531        | 37.0  | 106153 | 7.6874          |
| 4.6181        | 38.0  | 109022 | 7.7548          |
| 4.5611        | 39.0  | 111891 | 7.7664          |
| 4.4673        | 40.0  | 114760 | 7.8109          |
| 4.4184        | 41.0  | 117629 | 7.7604          |
| 4.3436        | 42.0  | 120498 | 7.8470          |
| 4.329         | 43.0  | 123367 | 7.9043          |
| 4.2249        | 44.0  | 126236 | 7.9154          |
| 4.1761        | 45.0  | 129105 | 7.9494          |
| 4.153         | 46.0  | 131974 | 7.9806          |
| 4.09          | 47.0  | 134843 | 7.9693          |
| 4.0814        | 48.0  | 137712 | 8.0332          |
| 3.9889        | 49.0  | 140581 | 8.0437          |
| 3.8982        | 50.0  | 143450 | 8.1102          |
| 3.8621        | 51.0  | 146319 | 8.1181          |
| 3.8337        | 52.0  | 149188 | 8.1632          |
| 3.797         | 53.0  | 152057 | 8.1996          |
| 3.7656        | 54.0  | 154926 | 8.2277          |
| 3.7031        | 55.0  | 157795 | 8.2382          |
| 3.6823        | 56.0  | 160664 | 8.2876          |
| 3.621         | 57.0  | 163533 | 8.3095          |
| 3.5373        | 58.0  | 166402 | 8.3176          |
| 3.5675        | 59.0  | 169271 | 8.3374          |
| 3.5522        | 60.0  | 172140 | 8.3418          |
| 3.4695        | 61.0  | 175009 | 8.3852          |
| 3.4313        | 62.0  | 177878 | 8.3725          |
| 3.3989        | 63.0  | 180747 | 8.4252          |
| 3.3297        | 64.0  | 183616 | 8.4471          |
| 3.331         | 65.0  | 186485 | 8.4471          |
| 3.2577        | 66.0  | 189354 | 8.4660          |
| 3.2561        | 67.0  | 192223 | 8.4727          |
| 3.257         | 68.0  | 195092 | 8.5081          |
| 3.2167        | 69.0  | 197961 | 8.5476          |
| 3.1696        | 70.0  | 200830 | 8.5399          |
| 3.0959        | 71.0  | 203699 | 8.5425          |
| 3.0822        | 72.0  | 206568 | 8.5941          |
| 3.0605        | 73.0  | 209437 | 8.6037          |
| 3.092         | 74.0  | 212306 | 8.6128          |
| 3.0725        | 75.0  | 215175 | 8.5998          |
| 3.0599        | 76.0  | 218044 | 8.6316          |
| 2.9968        | 77.0  | 220913 | 8.6512          |
| 2.9697        | 78.0  | 223782 | 8.6503          |
| 2.9571        | 79.0  | 226651 | 8.6605          |
| 2.9867        | 80.0  | 229520 | 8.6775          |
| 2.89          | 81.0  | 232389 | 8.6773          |
| 2.9005        | 82.0  | 235258 | 8.6927          |
| 2.9131        | 83.0  | 238127 | 8.6921          |
| 2.8856        | 84.0  | 240996 | 8.7090          |
| 2.8438        | 85.0  | 243865 | 8.7086          |
| 2.8588        | 86.0  | 246734 | 8.7205          |
| 2.8226        | 87.0  | 249603 | 8.7406          |
| 2.8125        | 88.0  | 252472 | 8.7360          |
| 2.7896        | 89.0  | 255341 | 8.7401          |
| 2.8169        | 90.0  | 258210 | 8.7440          |
| 2.7947        | 91.0  | 261079 | 8.7519          |
| 2.7763        | 92.0  | 263948 | 8.7605          |
| 2.7666        | 93.0  | 266817 | 8.7577          |
| 2.8084        | 94.0  | 269686 | 8.7659          |
| 2.7636        | 95.0  | 272555 | 8.7705          |
| 2.7361        | 96.0  | 275424 | 8.7794          |
| 2.7511        | 97.0  | 278293 | 8.7810          |
| 2.7264        | 98.0  | 281162 | 8.7782          |
| 2.7505        | 99.0  | 284031 | 8.7818          |
| 2.7111        | 100.0 | 286900 | 8.7806          |
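
The validation loss reaches its minimum (7.3176) at epoch 21 and rises afterwards, ending at the 8.7806 reported above. If these values are mean per-token cross-entropy losses in nats (the usual Trainer convention, assumed here), they convert to perplexity as in the sketch below.

```python
import math

# Convert validation losses (assumed to be mean cross-entropy in nats) to perplexity.
best_val_loss = 7.3176    # lowest validation loss in the table (epoch 21)
final_val_loss = 8.7806   # reported evaluation loss (epoch 100)

print(f"best perplexity:  {math.exp(best_val_loss):.0f}")   # ≈ 1.5e3
print(f"final perplexity: {math.exp(final_val_loss):.0f}")  # ≈ 6.5e3
```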

Framework versions

  • Transformers 4.48.3
  • PyTorch 2.6.0+cu126
  • Datasets 3.2.0
  • Tokenizers 0.21.0