
Quantization by Richard Erkhov.

  • Github
  • Discord
  • Request more models

my_rugpt3medium_finetune - bnb 4bits
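
A minimal loading sketch with transformers and bitsandbytes follows; the repo id is a placeholder assumption, and the Russian prompt is only an example (the base model is a Russian GPT-2).

```python
# Minimal sketch (assumptions flagged): loading the 4-bit bitsandbytes
# quantization with transformers. Requires `bitsandbytes` and `accelerate`,
# plus a transformers release recent enough to load serialized 4-bit weights.
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical repo id -- substitute the actual id of this quantization repo.
repo_id = "RichardErkhov/my_rugpt3medium_finetune-4bits"

tokenizer = AutoTokenizer.from_pretrained(repo_id)
# The quantization_config stored in the checkpoint is picked up automatically;
# no explicit BitsAndBytesConfig is needed for already-quantized weights.
model = AutoModelForCausalLM.from_pretrained(repo_id, device_map="auto")

# The base model is a Russian GPT-2, so prompt in Russian.
prompt = "Однажды утром"  # "One morning"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=50, do_sample=True, top_p=0.95)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```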

Original model description:

base_model: ai-forever/rugpt3medium_based_on_gpt2
tags:
- generated_from_trainer
model-index:
- name: my_rugpt3medium_finetune
  results: []

my_rugpt3medium_finetune

This model is a fine-tuned version of ai-forever/rugpt3medium_based_on_gpt2 on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 0.9955
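
For reference, if this is the standard per-token cross-entropy loss, a validation loss of 0.9955 corresponds to a perplexity of exp(0.9955) ≈ 2.71.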

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (an equivalent TrainingArguments sketch follows the list):

  • learning_rate: 1e-05
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • gradient_accumulation_steps: 3
  • total_train_batch_size: 24
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 1000
  • num_epochs: 35
  • mixed_precision_training: Native AMP
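
As referenced above, here is a hedged sketch of transformers.TrainingArguments mirroring these settings; the output directory is an assumption, and the Adam betas/epsilon match the library defaults, so they need no explicit flag.

```python
# Hedged sketch: TrainingArguments reproducing the reported hyperparameters.
# Only the values listed in the card are set; output_dir is an assumption.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="my_rugpt3medium_finetune",  # assumed, not from the card
    learning_rate=1e-5,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    seed=42,
    gradient_accumulation_steps=3,  # 8 * 3 = 24 total train batch size
    lr_scheduler_type="cosine",
    warmup_steps=1000,
    num_train_epochs=35,
    fp16=True,  # "Native AMP" mixed precision
    # Default optimizer already uses betas=(0.9, 0.999), epsilon=1e-08.
)
```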

Training results

| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:-----:|:----:|:---------------:|
| 3.5373 | 0.46 | 25 | 3.4828 |
| 3.5265 | 0.93 | 50 | 3.4708 |
| 3.478 | 1.39 | 75 | 3.4398 |
| 3.4851 | 1.85 | 100 | 3.3995 |
| 3.4407 | 2.31 | 125 | 3.3609 |
| 3.3731 | 2.78 | 150 | 3.3241 |
| 3.3584 | 3.24 | 175 | 3.2886 |
| 3.3267 | 3.7 | 200 | 3.2540 |
| 3.3043 | 4.17 | 225 | 3.2200 |
| 3.229 | 4.63 | 250 | 3.1853 |
| 3.2618 | 5.09 | 275 | 3.1508 |
| 3.1823 | 5.56 | 300 | 3.1164 |
| 3.172 | 6.02 | 325 | 3.0779 |
| 3.1354 | 6.48 | 350 | 3.0395 |
| 3.0899 | 6.94 | 375 | 2.9987 |
| 3.0741 | 7.41 | 400 | 2.9577 |
| 3.009 | 7.87 | 425 | 2.9140 |
| 2.9598 | 8.33 | 450 | 2.8737 |
| 2.9187 | 8.8 | 475 | 2.8294 |
| 2.9378 | 9.26 | 500 | 2.7842 |
| 2.8396 | 9.72 | 525 | 2.7374 |
| 2.8608 | 10.19 | 550 | 2.6889 |
| 2.7296 | 10.65 | 575 | 2.6405 |
| 2.7452 | 11.11 | 600 | 2.5926 |
| 2.6882 | 11.57 | 625 | 2.5389 |
| 2.6463 | 12.04 | 650 | 2.4893 |
| 2.572 | 12.5 | 675 | 2.4356 |
| 2.5384 | 12.96 | 700 | 2.3788 |
| 2.5246 | 13.43 | 725 | 2.3296 |
| 2.4055 | 13.89 | 750 | 2.2747 |
| 2.3759 | 14.35 | 775 | 2.2155 |
| 2.3351 | 14.81 | 800 | 2.1606 |
| 2.286 | 15.28 | 825 | 2.1061 |
| 2.2694 | 15.74 | 850 | 2.0504 |
| 2.1745 | 16.2 | 875 | 1.9967 |
| 2.1053 | 16.67 | 900 | 1.9411 |
| 2.1184 | 17.13 | 925 | 1.8878 |
| 2.0107 | 17.59 | 950 | 1.8362 |
| 2.027 | 18.06 | 975 | 1.7854 |
| 1.9153 | 18.52 | 1000 | 1.7304 |
| 1.9267 | 18.98 | 1025 | 1.6854 |
| 1.8131 | 19.44 | 1050 | 1.6331 |
| 1.8405 | 19.91 | 1075 | 1.5839 |
| 1.7294 | 20.37 | 1100 | 1.5370 |
| 1.7154 | 20.83 | 1125 | 1.4971 |
| 1.6573 | 21.3 | 1150 | 1.4476 |
| 1.6391 | 21.76 | 1175 | 1.4130 |
| 1.5497 | 22.22 | 1200 | 1.3727 |
| 1.5194 | 22.69 | 1225 | 1.3378 |
| 1.535 | 23.15 | 1250 | 1.3000 |
| 1.4514 | 23.61 | 1275 | 1.2714 |
| 1.4711 | 24.07 | 1300 | 1.2388 |
| 1.4105 | 24.54 | 1325 | 1.2136 |
| 1.4202 | 25.0 | 1350 | 1.1890 |
| 1.3351 | 25.46 | 1375 | 1.1679 |
| 1.3575 | 25.93 | 1400 | 1.1440 |
| 1.2882 | 26.39 | 1425 | 1.1202 |
| 1.3378 | 26.85 | 1450 | 1.1074 |
| 1.3094 | 27.31 | 1475 | 1.0864 |
| 1.2793 | 27.78 | 1500 | 1.0743 |
| 1.2377 | 28.24 | 1525 | 1.0626 |
| 1.2693 | 28.7 | 1550 | 1.0468 |
| 1.2157 | 29.17 | 1575 | 1.0368 |
| 1.2007 | 29.63 | 1600 | 1.0263 |
| 1.2376 | 30.09 | 1625 | 1.0221 |
| 1.2216 | 30.56 | 1650 | 1.0136 |
| 1.1923 | 31.02 | 1675 | 1.0102 |
| 1.2143 | 31.48 | 1700 | 1.0039 |
| 1.1764 | 31.94 | 1725 | 1.0014 |
| 1.1654 | 32.41 | 1750 | 0.9990 |
| 1.2031 | 32.87 | 1775 | 0.9976 |
| 1.1952 | 33.33 | 1800 | 0.9965 |
| 1.1852 | 33.8 | 1825 | 0.9961 |
| 1.1737 | 34.26 | 1850 | 0.9959 |
| 1.1609 | 34.72 | 1875 | 0.9955 |
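
As the table shows, validation loss falls steadily from 3.48 to 0.9955 and has nearly plateaued over the final epochs.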

Framework versions

  • Transformers 4.35.2
  • Pytorch 2.1.0+cu121
  • Datasets 2.16.0
  • Tokenizers 0.15.0
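
Loading the 4-bit checkpoint additionally requires bitsandbytes and accelerate, which are not part of the training stack listed above.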
Safetensors model size: 210M params (tensor types: F32, FP16, U8)