# lig_model_1
This model is a fine-tuned version of [gpt2](https://huggingface.co/gpt2) on an unspecified dataset. It achieves the following results on the evaluation set:
- Loss: 3.7688
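Since the reported metric is cross-entropy loss, the corresponding perplexity follows directly as `exp(loss)`. A quick check of that relationship (derived arithmetic, not an additional measured result):

```python
import math

# Evaluation cross-entropy loss reported above.
eval_loss = 3.7688

# Perplexity is the exponential of the cross-entropy loss.
perplexity = math.exp(eval_loss)
print(f"Perplexity: {perplexity:.2f}")  # ~43.3
```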
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 0.0005
- train_batch_size: 32
- eval_batch_size: 32
- seed: 42
- gradient_accumulation_steps: 8
- total_train_batch_size: 256
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 10
- num_epochs: 15
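The listed `total_train_batch_size` is not an independent setting: it is the per-device batch size multiplied by the gradient-accumulation steps. A small arithmetic sanity check, also relating the optimizer-step count to the epoch values in the results table below:

```python
# Consistency check of the schedule implied by the hyperparameters
# (pure arithmetic, no training code).
train_batch_size = 32
gradient_accumulation_steps = 8

# Effective batch size per optimizer step.
total_train_batch_size = train_batch_size * gradient_accumulation_steps
print(total_train_batch_size)  # 256, matching the value listed above

# From the results table, optimizer step 8 corresponds to epoch 0.2310,
# so one epoch is roughly 8 / 0.2310 ~= 34.6 optimizer steps,
# consistent with the final logged step (504) landing at epoch ~14.56.
steps_per_epoch = 8 / 0.2310
print(round(steps_per_epoch, 1))
```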
### Training results
Training Loss | Epoch | Step | Validation Loss |
---|---|---|---|
6.8679 | 0.2310 | 8 | 6.6211 |
6.3894 | 0.4621 | 16 | 6.3666 |
6.2641 | 0.6931 | 24 | 6.2481 |
6.1285 | 0.9242 | 32 | 6.0829 |
5.9436 | 1.1552 | 40 | 5.8900 |
5.8073 | 1.3863 | 48 | 5.7490 |
5.7164 | 1.6173 | 56 | 5.6617 |
5.6019 | 1.8484 | 64 | 5.5778 |
5.5427 | 2.0794 | 72 | 5.4886 |
5.454 | 2.3105 | 80 | 5.3954 |
5.3546 | 2.5415 | 88 | 5.3066 |
5.3014 | 2.7726 | 96 | 5.2124 |
5.2448 | 3.0036 | 104 | 5.1365 |
5.1185 | 3.2347 | 112 | 5.0765 |
5.0938 | 3.4657 | 120 | 5.0071 |
5.0347 | 3.6968 | 128 | 4.9339 |
4.9681 | 3.9278 | 136 | 4.8552 |
4.8323 | 4.1588 | 144 | 4.7821 |
4.7912 | 4.3899 | 152 | 4.7215 |
4.7225 | 4.6209 | 160 | 4.6431 |
4.6433 | 4.8520 | 168 | 4.5701 |
4.5309 | 5.0830 | 176 | 4.5002 |
4.4506 | 5.3141 | 184 | 4.4442 |
4.4097 | 5.5451 | 192 | 4.3820 |
4.3871 | 5.7762 | 200 | 4.3290 |
4.3345 | 6.0072 | 208 | 4.2869 |
4.2004 | 6.2383 | 216 | 4.2412 |
4.1716 | 6.4693 | 224 | 4.1978 |
4.1536 | 6.7004 | 232 | 4.1607 |
4.0975 | 6.9314 | 240 | 4.1294 |
3.9743 | 7.1625 | 248 | 4.1014 |
3.922 | 7.3935 | 256 | 4.0654 |
3.939 | 7.6245 | 264 | 4.0378 |
3.9208 | 7.8556 | 272 | 4.0102 |
3.8083 | 8.0866 | 280 | 3.9812 |
3.7611 | 8.3177 | 288 | 3.9630 |
3.7668 | 8.5487 | 296 | 3.9407 |
3.7285 | 8.7798 | 304 | 3.9183 |
3.6996 | 9.0108 | 312 | 3.8958 |
3.5754 | 9.2419 | 320 | 3.8825 |
3.5708 | 9.4729 | 328 | 3.8702 |
3.5607 | 9.7040 | 336 | 3.8510 |
3.5688 | 9.9350 | 344 | 3.8387 |
3.4188 | 10.1661 | 352 | 3.8350 |
3.432 | 10.3971 | 360 | 3.8261 |
3.4236 | 10.6282 | 368 | 3.8131 |
3.3985 | 10.8592 | 376 | 3.8026 |
3.306 | 11.0903 | 384 | 3.7934 |
3.3196 | 11.3213 | 392 | 3.7919 |
3.3031 | 11.5523 | 400 | 3.7908 |
3.2851 | 11.7834 | 408 | 3.7817 |
3.2703 | 12.0144 | 416 | 3.7789 |
3.2132 | 12.2455 | 424 | 3.7818 |
3.1829 | 12.4765 | 432 | 3.7778 |
3.1968 | 12.7076 | 440 | 3.7749 |
3.2206 | 12.9386 | 448 | 3.7711 |
3.1521 | 13.1697 | 456 | 3.7694 |
3.1412 | 13.4007 | 464 | 3.7700 |
3.1415 | 13.6318 | 472 | 3.7709 |
3.1402 | 13.8628 | 480 | 3.7694 |
3.129 | 14.0939 | 488 | 3.7689 |
3.1221 | 14.3249 | 496 | 3.7687 |
3.1576 | 14.5560 | 504 | 3.7688 |
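The tail of the table suggests the run has converged by the final epochs. A small check on the last few validation-loss entries (values copied from the table above):

```python
# Last six validation losses from the table (steps 456-504).
tail = [3.7694, 3.7700, 3.7709, 3.7694, 3.7689, 3.7687]

# Spread over the final ~50 optimizer steps: well under 0.01,
# i.e. the validation loss has effectively plateaued.
spread = max(tail) - min(tail)
print(round(spread, 4))  # 0.0022
```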
### Framework versions
- Transformers 4.41.0
- Pytorch 2.3.0+cu121
- Datasets 2.19.1
- Tokenizers 0.19.1