gpt2-large-finetuned2

This model is a fine-tuned version of gpt2-large. The training dataset was not recorded. It achieves the following results on the evaluation set:

Loss: 0.6494
Rouge1: 0.9235
Rouge2: 0.9153
Rougel: 0.9235
Rougelsum: 0.9235
Gen Len: 17.061
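
For readers unfamiliar with the metric, the sketch below shows roughly what a Rouge1 score measures: the F1 of clipped unigram overlap between a reference and a generated text. It is a simplified illustration that assumes lowercased whitespace tokenization. The function name and the example sentences are hypothetical, and this is not the scoring code used to produce the numbers above.

```python
from collections import Counter


def rouge1_f1(reference: str, candidate: str) -> float:
    """Simplified ROUGE-1 F1: clipped unigram overlap on whitespace tokens."""
    ref_counts = Counter(reference.lower().split())
    cand_counts = Counter(candidate.lower().split())
    # Counter intersection keeps the minimum count per token (clipping).
    overlap = sum((ref_counts & cand_counts).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand_counts.values())
    recall = overlap / sum(ref_counts.values())
    return 2 * precision * recall / (precision + recall)


# Hypothetical example pair: 5 of 6 unigrams match in each direction.
score = rouge1_f1("the cat sat on the mat", "the cat lay on the mat")
print(round(score, 4))  # 0.8333
```

Rouge2 applies the same idea to bigrams, and RougeL uses the longest common subsequence instead of n-gram counts.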

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 0.0003
train_batch_size: 32
eval_batch_size: 32
seed: 1
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
num_epochs: 50
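
As a rough sketch, the values above can be written with the parameter names that transformers.TrainingArguments uses, and the linear schedule can be reproduced by hand. Two points are assumptions rather than facts from this card: that the listed batch sizes are per-device values, and that no warmup steps were used (none are listed). The figure of 278 steps per epoch is read from the training results table.

```python
# Hyperparameters from this card, keyed by transformers.TrainingArguments
# parameter names. Treating the batch sizes as per-device is an assumption.
hyperparameters = {
    "learning_rate": 3e-4,
    "per_device_train_batch_size": 32,
    "per_device_eval_batch_size": 32,
    "seed": 1,
    "adam_beta1": 0.9,
    "adam_beta2": 0.999,
    "adam_epsilon": 1e-8,
    "lr_scheduler_type": "linear",
    "num_train_epochs": 50,
}

STEPS_PER_EPOCH = 278  # the results table reports step 278 at epoch 1.0
TOTAL_STEPS = STEPS_PER_EPOCH * hyperparameters["num_train_epochs"]


def linear_lr(step: int, base_lr: float = 3e-4, total_steps: int = TOTAL_STEPS) -> float:
    """Linear decay from base_lr to 0, assuming zero warmup steps."""
    remaining = max(0, total_steps - step)
    return base_lr * remaining / total_steps


print(TOTAL_STEPS)      # 13900, matching the last row of the results table
print(linear_lr(0))     # 0.0003
print(linear_lr(6950))  # 0.00015 at the halfway point (epoch 25)
```

If training ran on a single device without gradient accumulation, 278 steps at batch size 32 would imply at most 8896 training examples.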

Training results

Training Loss	Epoch	Step	Validation Loss	Rouge1	Rouge2	Rougel	Rougelsum	Gen Len
2.4757	1.0	278	1.5505	0.9312	0.9247	0.9312	0.9312	17.061
1.6314	2.0	556	1.2537	0.929	0.9222	0.929	0.929	17.061
1.3746	3.0	834	1.1054	0.9246	0.917	0.9246	0.9247	17.061
1.225	4.0	1112	1.0012	0.9294	0.9226	0.9293	0.9293	17.061
1.108	5.0	1390	0.9411	0.9253	0.9177	0.9253	0.9253	17.061
1.0381	6.0	1668	0.8901	0.9247	0.9173	0.9247	0.9247	17.061
0.9722	7.0	1946	0.8340	0.9247	0.917	0.9247	0.9247	17.061
0.9134	8.0	2224	0.7975	0.9236	0.9156	0.9236	0.9237	17.061
0.8894	9.0	2502	0.7745	0.9231	0.9158	0.9231	0.9232	17.061
0.8387	10.0	2780	0.7567	0.9212	0.9132	0.9212	0.9212	17.061
0.8224	11.0	3058	0.7374	0.9232	0.9152	0.9232	0.9232	17.061
0.8071	12.0	3336	0.7298	0.9237	0.9158	0.9237	0.9237	17.061
0.7973	13.0	3614	0.7209	0.9238	0.9161	0.9238	0.9238	17.061
0.7715	14.0	3892	0.7217	0.9231	0.915	0.9231	0.9231	17.061
0.771	15.0	4170	0.7085	0.9224	0.9139	0.9224	0.9224	17.061
0.7617	16.0	4448	0.7041	0.9211	0.9123	0.9211	0.9211	17.061
0.7603	17.0	4726	0.7004	0.9226	0.9146	0.9226	0.9227	17.061
0.7539	18.0	5004	0.6976	0.9253	0.9173	0.9252	0.9253	17.061
0.741	19.0	5282	0.6907	0.9229	0.9146	0.9229	0.9229	17.061
0.7422	20.0	5560	0.6898	0.9222	0.9141	0.9222	0.9222	17.061
0.7333	21.0	5838	0.6880	0.9223	0.9138	0.9223	0.9223	17.061
0.7378	22.0	6116	0.6837	0.9222	0.914	0.9222	0.9222	17.061
0.723	23.0	6394	0.6849	0.9225	0.914	0.9225	0.9225	17.061
0.7277	24.0	6672	0.6791	0.9235	0.9148	0.9235	0.9235	17.061
0.7222	25.0	6950	0.6834	0.9267	0.9189	0.9267	0.9267	17.061
0.7235	26.0	7228	0.6749	0.9221	0.9139	0.9221	0.9221	17.061
0.7207	27.0	7506	0.6741	0.9231	0.9149	0.9231	0.9231	17.061
0.7106	28.0	7784	0.6718	0.9224	0.9141	0.9224	0.9224	17.061
0.7086	29.0	8062	0.6706	0.9233	0.9153	0.9233	0.9233	17.061
0.7086	30.0	8340	0.6680	0.9241	0.9161	0.9241	0.9241	17.061
0.7081	31.0	8618	0.6678	0.9257	0.9177	0.9257	0.9257	17.061
0.6977	32.0	8896	0.6651	0.9229	0.9146	0.9229	0.9229	17.061
0.6937	33.0	9174	0.6634	0.9247	0.9167	0.9246	0.9247	17.061
0.6998	34.0	9452	0.6636	0.9243	0.916	0.9243	0.9243	17.061
0.6932	35.0	9730	0.6627	0.9254	0.9175	0.9254	0.9254	17.061
0.6978	36.0	10008	0.6612	0.9236	0.9154	0.9236	0.9236	17.061
0.6881	37.0	10286	0.6612	0.9251	0.9174	0.9251	0.9251	17.061
0.6874	38.0	10564	0.6589	0.9247	0.9167	0.9247	0.9247	17.061
0.6898	39.0	10842	0.6579	0.9235	0.9153	0.9235	0.9235	17.061
0.6857	40.0	11120	0.6568	0.9231	0.915	0.9231	0.9232	17.061
0.6751	41.0	11398	0.6554	0.924	0.9161	0.924	0.924	17.061
0.6782	42.0	11676	0.6547	0.9243	0.9164	0.9243	0.9243	17.061
0.6775	43.0	11954	0.6537	0.9242	0.9162	0.9242	0.9242	17.061
0.6764	44.0	12232	0.6530	0.923	0.9148	0.923	0.923	17.061
0.6741	45.0	12510	0.6524	0.9242	0.9161	0.9242	0.9242	17.061
0.6638	46.0	12788	0.6515	0.9241	0.9159	0.9241	0.9241	17.061
0.6634	47.0	13066	0.6509	0.9242	0.916	0.9242	0.9242	17.061
0.6614	48.0	13344	0.6500	0.9238	0.9156	0.9238	0.9238	17.061
0.6595	49.0	13622	0.6495	0.9236	0.9154	0.9236	0.9236	17.061
0.6541	50.0	13900	0.6494	0.9235	0.9153	0.9235	0.9235	17.061
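
The table shows validation loss falling through epoch 50, while Rouge1 peaks at epoch 1 (0.9312) and then stays between 0.9211 and 0.9312; Gen Len is constant at 17.061. The sketch below, using a handful of rows copied from the table, illustrates how the "best" checkpoint depends on which metric is used for selection. The variable names are illustrative only.

```python
# (epoch, validation_loss, rouge1) for a few rows copied from the table above.
rows = [
    (1, 1.5505, 0.9312),
    (2, 1.2537, 0.9290),
    (25, 0.6834, 0.9267),
    (49, 0.6495, 0.9236),
    (50, 0.6494, 0.9235),
]

# Lower is better for loss, higher is better for Rouge1.
best_by_loss = min(rows, key=lambda row: row[1])
best_by_rouge1 = max(rows, key=lambda row: row[2])

print(best_by_loss[0])    # 50
print(best_by_rouge1[0])  # 1
```

The headline numbers at the top of this card match the epoch 50 row, which is the final epoch and also has the lowest validation loss.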

Framework versions

Transformers 4.34.1
Pytorch 2.0.1
Datasets 2.14.6
Tokenizers 0.14.1
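
To check whether a local environment matches these versions, a small standard-library script along the following lines may help. The PyPI package name for Pytorch is torch; the helper name compare_versions is hypothetical. Local build suffixes such as "+cu118" are ignored in the comparison.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions listed in this card, keyed by PyPI package name.
EXPECTED = {
    "transformers": "4.34.1",
    "torch": "2.0.1",
    "datasets": "2.14.6",
    "tokenizers": "0.14.1",
}


def compare_versions(expected: dict) -> dict:
    """Map each package to (installed_version_or_None, wanted, matches)."""
    report = {}
    for package, wanted in expected.items():
        try:
            installed = version(package)
        except PackageNotFoundError:
            installed = None
        # Drop a local build suffix such as "+cu118" before comparing.
        base = installed.split("+")[0] if installed else None
        report[package] = (installed, wanted, base == wanted)
    return report


for name, (installed, wanted, ok) in compare_versions(EXPECTED).items():
    print(f"{name}: installed={installed} expected={wanted} match={ok}")
```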

Model ID: kowsiknd/gpt2-large-finetuned2