gpt-neo-125m-finetuned-philosopher_rave_100

This model is a fine-tuned version of EleutherAI/gpt-neo-125m on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 2.3681
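
The checkpoint can be loaded like any other causal language model in Transformers. A minimal usage sketch, assuming the model is available on the Hugging Face Hub (or locally) under a repository id like the title above; the exact namespace and generation settings below are illustrative assumptions, not taken from this card:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

# Assumed repo id: prepend the owner's namespace, or point at a local checkpoint directory.
MODEL_ID = "gpt-neo-125m-finetuned-philosopher_rave_100"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID)

generator = pipeline("text-generation", model=model, tokenizer=tokenizer)
print(generator("What is the good life?", max_new_tokens=60, do_sample=True, top_p=0.9)[0]["generated_text"])
```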

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 3e-07
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 100.0
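
A sketch of a Trainer configuration matching these hyperparameters. The dataset objects are stand-ins (the card does not describe the training data), evaluating once per epoch is inferred from the results table below, and the optimizer is Trainer's default AdamW, whose betas/epsilon match the values listed above, although the card only says "Adam":

```python
from datasets import Dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neo-125m")
tokenizer.pad_token = tokenizer.eos_token  # GPT-Neo ships without a pad token
model = AutoModelForCausalLM.from_pretrained("EleutherAI/gpt-neo-125m")

# Hypothetical stand-in corpora; the actual train/eval data are not described in this card.
train_dataset = Dataset.from_dict(tokenizer(["The unexamined life is not worth living."], truncation=True))
eval_dataset = Dataset.from_dict(tokenizer(["I think, therefore I am."], truncation=True))

args = TrainingArguments(
    output_dir="gpt-neo-125m-finetuned-philosopher_rave_100",
    learning_rate=3e-7,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    seed=42,
    adam_beta1=0.9, adam_beta2=0.999, adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    num_train_epochs=100,
    evaluation_strategy="epoch",  # assumed: the results table reports one validation loss per epoch
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),  # causal LM: labels = input ids
)
trainer.train()
```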

Training results

| Training Loss | Epoch | Step  | Validation Loss |
|:-------------:|:-----:|:-----:|:---------------:|
| No log        | 1.0   | 155   | 2.6967          |
| No log        | 2.0   | 310   | 2.6846          |
| No log        | 3.0   | 465   | 2.6733          |
| 2.6891        | 4.0   | 620   | 2.6626          |
| 2.6891        | 5.0   | 775   | 2.6524          |
| 2.6891        | 6.0   | 930   | 2.6427          |
| 2.6569        | 7.0   | 1085  | 2.6336          |
| 2.6569        | 8.0   | 1240  | 2.6248          |
| 2.6569        | 9.0   | 1395  | 2.6164          |
| 2.6215        | 10.0  | 1550  | 2.6083          |
| 2.6215        | 11.0  | 1705  | 2.6005          |
| 2.6215        | 12.0  | 1860  | 2.5931          |
| 2.6022        | 13.0  | 2015  | 2.5858          |
| 2.6022        | 14.0  | 2170  | 2.5789          |
| 2.6022        | 15.0  | 2325  | 2.5721          |
| 2.6022        | 16.0  | 2480  | 2.5657          |
| 2.5777        | 17.0  | 2635  | 2.5594          |
| 2.5777        | 18.0  | 2790  | 2.5532          |
| 2.5777        | 19.0  | 2945  | 2.5473          |
| 2.5548        | 20.0  | 3100  | 2.5416          |
| 2.5548        | 21.0  | 3255  | 2.5360          |
| 2.5548        | 22.0  | 3410  | 2.5306          |
| 2.5359        | 23.0  | 3565  | 2.5253          |
| 2.5359        | 24.0  | 3720  | 2.5202          |
| 2.5359        | 25.0  | 3875  | 2.5152          |
| 2.5248        | 26.0  | 4030  | 2.5103          |
| 2.5248        | 27.0  | 4185  | 2.5056          |
| 2.5248        | 28.0  | 4340  | 2.5011          |
| 2.5248        | 29.0  | 4495  | 2.4966          |
| 2.5053        | 30.0  | 4650  | 2.4922          |
| 2.5053        | 31.0  | 4805  | 2.4880          |
| 2.5053        | 32.0  | 4960  | 2.4839          |
| 2.4871        | 33.0  | 5115  | 2.4798          |
| 2.4871        | 34.0  | 5270  | 2.4759          |
| 2.4871        | 35.0  | 5425  | 2.4721          |
| 2.4808        | 36.0  | 5580  | 2.4683          |
| 2.4808        | 37.0  | 5735  | 2.4647          |
| 2.4808        | 38.0  | 5890  | 2.4612          |
| 2.4659        | 39.0  | 6045  | 2.4577          |
| 2.4659        | 40.0  | 6200  | 2.4544          |
| 2.4659        | 41.0  | 6355  | 2.4511          |
| 2.4517        | 42.0  | 6510  | 2.4479          |
| 2.4517        | 43.0  | 6665  | 2.4447          |
| 2.4517        | 44.0  | 6820  | 2.4417          |
| 2.4517        | 45.0  | 6975  | 2.4387          |
| 2.4466        | 46.0  | 7130  | 2.4359          |
| 2.4466        | 47.0  | 7285  | 2.4330          |
| 2.4466        | 48.0  | 7440  | 2.4303          |
| 2.4348        | 49.0  | 7595  | 2.4276          |
| 2.4348        | 50.0  | 7750  | 2.4250          |
| 2.4348        | 51.0  | 7905  | 2.4225          |
| 2.4238        | 52.0  | 8060  | 2.4201          |
| 2.4238        | 53.0  | 8215  | 2.4177          |
| 2.4238        | 54.0  | 8370  | 2.4154          |
| 2.4172        | 55.0  | 8525  | 2.4131          |
| 2.4172        | 56.0  | 8680  | 2.4109          |
| 2.4172        | 57.0  | 8835  | 2.4088          |
| 2.4172        | 58.0  | 8990  | 2.4067          |
| 2.4097        | 59.0  | 9145  | 2.4047          |
| 2.4097        | 60.0  | 9300  | 2.4027          |
| 2.4097        | 61.0  | 9455  | 2.4008          |
| 2.4054        | 62.0  | 9610  | 2.3990          |
| 2.4054        | 63.0  | 9765  | 2.3972          |
| 2.4054        | 64.0  | 9920  | 2.3955          |
| 2.3936        | 65.0  | 10075 | 2.3938          |
| 2.3936        | 66.0  | 10230 | 2.3922          |
| 2.3936        | 67.0  | 10385 | 2.3906          |
| 2.394         | 68.0  | 10540 | 2.3891          |
| 2.394         | 69.0  | 10695 | 2.3877          |
| 2.394         | 70.0  | 10850 | 2.3863          |
| 2.387         | 71.0  | 11005 | 2.3850          |
| 2.387         | 72.0  | 11160 | 2.3837          |
| 2.387         | 73.0  | 11315 | 2.3824          |
| 2.387         | 74.0  | 11470 | 2.3813          |
| 2.3812        | 75.0  | 11625 | 2.3801          |
| 2.3812        | 76.0  | 11780 | 2.3791          |
| 2.3812        | 77.0  | 11935 | 2.3780          |
| 2.3812        | 78.0  | 12090 | 2.3771          |
| 2.3812        | 79.0  | 12245 | 2.3762          |
| 2.3812        | 80.0  | 12400 | 2.3753          |
| 2.3802        | 81.0  | 12555 | 2.3745          |
| 2.3802        | 82.0  | 12710 | 2.3737          |
| 2.3802        | 83.0  | 12865 | 2.3730          |
| 2.3687        | 84.0  | 13020 | 2.3723          |
| 2.3687        | 85.0  | 13175 | 2.3717          |
| 2.3687        | 86.0  | 13330 | 2.3711          |
| 2.3687        | 87.0  | 13485 | 2.3706          |
| 2.3722        | 88.0  | 13640 | 2.3702          |
| 2.3722        | 89.0  | 13795 | 2.3698          |
| 2.3722        | 90.0  | 13950 | 2.3694          |
| 2.3693        | 91.0  | 14105 | 2.3691          |
| 2.3693        | 92.0  | 14260 | 2.3688          |
| 2.3693        | 93.0  | 14415 | 2.3686          |
| 2.3654        | 94.0  | 14570 | 2.3684          |
| 2.3654        | 95.0  | 14725 | 2.3683          |
| 2.3654        | 96.0  | 14880 | 2.3682          |
| 2.372         | 97.0  | 15035 | 2.3682          |
| 2.372         | 98.0  | 15190 | 2.3681          |
| 2.372         | 99.0  | 15345 | 2.3681          |
| 2.3664        | 100.0 | 15500 | 2.3681          |
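
Since the reported loss is the average cross-entropy per token of a causal language model, the corresponding validation perplexity is exp(loss). The card does not report perplexity itself; this is a derived check:

```python
import math

final_val_loss = 2.3681          # last row of the table above
print(math.exp(final_val_loss))  # ≈ 10.68 validation perplexity
```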

Framework versions

  • Transformers 4.39.3
  • Pytorch 2.2.1+cu121
  • Datasets 2.18.0
  • Tokenizers 0.15.2