---
license: mit
base_model: gpt2
tags:
- generated_from_trainer
model-index:
- name: lig_model_1
  results: []
---

# lig_model_1

This model is a fine-tuned version of [gpt2](https://huggingface.co/gpt2) on an unspecified dataset.
It achieves the following results on the evaluation set:
- Loss: 3.7688

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 0.0005
- train_batch_size: 32
- eval_batch_size: 32
- seed: 42
- gradient_accumulation_steps: 8
- total_train_batch_size: 256
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 10
- num_epochs: 15

### Training results

| Training Loss | Epoch   | Step | Validation Loss |
|:-------------:|:-------:|:----:|:---------------:|
| 6.8679        | 0.2310  | 8    | 6.6211          |
| 6.3894        | 0.4621  | 16   | 6.3666          |
| 6.2641        | 0.6931  | 24   | 6.2481          |
| 6.1285        | 0.9242  | 32   | 6.0829          |
| 5.9436        | 1.1552  | 40   | 5.8900          |
| 5.8073        | 1.3863  | 48   | 5.7490          |
| 5.7164        | 1.6173  | 56   | 5.6617          |
| 5.6019        | 1.8484  | 64   | 5.5778          |
| 5.5427        | 2.0794  | 72   | 5.4886          |
| 5.454         | 2.3105  | 80   | 5.3954          |
| 5.3546        | 2.5415  | 88   | 5.3066          |
| 5.3014        | 2.7726  | 96   | 5.2124          |
| 5.2448        | 3.0036  | 104  | 5.1365          |
| 5.1185        | 3.2347  | 112  | 5.0765          |
| 5.0938        | 3.4657  | 120  | 5.0071          |
| 5.0347        | 3.6968  | 128  | 4.9339          |
| 4.9681        | 3.9278  | 136  | 4.8552          |
| 4.8323        | 4.1588  | 144  | 4.7821          |
| 4.7912        | 4.3899  | 152  | 4.7215          |
| 4.7225        | 4.6209  | 160  | 4.6431          |
| 4.6433        | 4.8520  | 168  | 4.5701          |
| 4.5309        | 5.0830  | 176  | 4.5002          |
| 4.4506        | 5.3141  | 184  | 4.4442          |
| 4.4097        | 5.5451  | 192  | 4.3820          |
| 4.3871        | 5.7762  | 200  | 4.3290          |
| 4.3345        | 6.0072  | 208  | 4.2869          |
| 4.2004        | 6.2383  | 216  | 4.2412          |
| 4.1716        | 6.4693  | 224  | 4.1978          |
| 4.1536        | 6.7004  | 232  | 4.1607          |
| 4.0975        | 6.9314  | 240  | 4.1294          |
| 3.9743        | 7.1625  | 248  | 4.1014          |
| 3.922         | 7.3935  | 256  | 4.0654          |
| 3.939         | 7.6245  | 264  | 4.0378          |
| 3.9208        | 7.8556  | 272  | 4.0102          |
| 3.8083        | 8.0866  | 280  | 3.9812          |
| 3.7611        | 8.3177  | 288  | 3.9630          |
| 3.7668        | 8.5487  | 296  | 3.9407          |
| 3.7285        | 8.7798  | 304  | 3.9183          |
| 3.6996        | 9.0108  | 312  | 3.8958          |
| 3.5754        | 9.2419  | 320  | 3.8825          |
| 3.5708        | 9.4729  | 328  | 3.8702          |
| 3.5607        | 9.7040  | 336  | 3.8510          |
| 3.5688        | 9.9350  | 344  | 3.8387          |
| 3.4188        | 10.1661 | 352  | 3.8350          |
| 3.432         | 10.3971 | 360  | 3.8261          |
| 3.4236        | 10.6282 | 368  | 3.8131          |
| 3.3985        | 10.8592 | 376  | 3.8026          |
| 3.306         | 11.0903 | 384  | 3.7934          |
| 3.3196        | 11.3213 | 392  | 3.7919          |
| 3.3031        | 11.5523 | 400  | 3.7908          |
| 3.2851        | 11.7834 | 408  | 3.7817          |
| 3.2703        | 12.0144 | 416  | 3.7789          |
| 3.2132        | 12.2455 | 424  | 3.7818          |
| 3.1829        | 12.4765 | 432  | 3.7778          |
| 3.1968        | 12.7076 | 440  | 3.7749          |
| 3.2206        | 12.9386 | 448  | 3.7711          |
| 3.1521        | 13.1697 | 456  | 3.7694          |
| 3.1412        | 13.4007 | 464  | 3.7700          |
| 3.1415        | 13.6318 | 472  | 3.7709          |
| 3.1402        | 13.8628 | 480  | 3.7694          |
| 3.129         | 14.0939 | 488  | 3.7689          |
| 3.1221        | 14.3249 | 496  | 3.7687          |
| 3.1576        | 14.5560 | 504  | 3.7688          |

### Framework versions

- Transformers 4.41.0
- Pytorch 2.3.0+cu121
- Datasets 2.19.1
- Tokenizers 0.19.1
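As a rough guide, the hyperparameters listed above could be expressed as a `TrainingArguments` configuration for the Hugging Face `Trainer` API. This is a minimal sketch, not the training script actually used: the output path is hypothetical, and only the values mirror the "Training hyperparameters" section.

```python
from transformers import TrainingArguments

# Sketch of a configuration matching this card's listed hyperparameters.
# The output_dir is a hypothetical placeholder.
training_args = TrainingArguments(
    output_dir="lig_model_1",
    learning_rate=5e-4,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=32,
    seed=42,
    gradient_accumulation_steps=8,  # 32 x 8 = 256 total train batch size
    lr_scheduler_type="cosine",
    warmup_steps=10,
    num_train_epochs=15,
    # Adam with betas=(0.9, 0.999) and epsilon=1e-08 matches the
    # optimizer defaults, so no explicit optimizer settings are needed.
)
```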