
Vit-GPT2-COCO2017Flickr-85k-11

This model is a fine-tuned version of NourFakih/Vit-GPT2-COCO2017Flickr-85k-11 on an unknown dataset. It achieves the following results on the evaluation set:

  • Gen Len: 12.1495
  • Loss: 0.5306
  • Rouge1: 40.0349
  • Rouge2: 14.6303
  • RougeL: 36.2382
  • RougeLsum: 36.2213
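
Since this is a ViT-encoder / GPT-2-decoder captioning checkpoint, it loads through the standard VisionEncoderDecoderModel API. A minimal usage sketch, assuming the repository ships its own image processor and tokenizer; `example.jpg` and the generation settings are illustrative placeholders:

```python
import torch
from PIL import Image
from transformers import AutoTokenizer, ViTImageProcessor, VisionEncoderDecoderModel

model_id = "NourFakih/Vit-GPT2-COCO2017Flickr-85k-11"

# Load the encoder-decoder checkpoint plus its preprocessing components.
model = VisionEncoderDecoderModel.from_pretrained(model_id)
processor = ViTImageProcessor.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device).eval()

# "example.jpg" is a placeholder path for any RGB image.
image = Image.open("example.jpg").convert("RGB")
pixel_values = processor(images=image, return_tensors="pt").pixel_values.to(device)

with torch.no_grad():
    # max_length/num_beams are illustrative, not from the card.
    output_ids = model.generate(pixel_values, max_length=16, num_beams=4)

caption = tokenizer.decode(output_ids[0], skip_special_tokens=True)
print(caption)
```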

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a hedged Seq2SeqTrainingArguments sketch follows the list):

  • learning_rate: 5e-05
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 42
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 16
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 3.0
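
For reference, a sketch of how the list above maps onto Seq2SeqTrainingArguments in Transformers 4.41. The output directory is a placeholder, and the eval-every-500-steps and predict_with_generate settings are assumptions inferred from the results table below, not stated in the card:

```python
from transformers import Seq2SeqTrainingArguments

# Hypothetical reconstruction of the configuration listed above.
training_args = Seq2SeqTrainingArguments(
    output_dir="output",                 # placeholder
    learning_rate=5e-5,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    gradient_accumulation_steps=4,       # effective train batch size 4 x 4 = 16
    seed=42,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    num_train_epochs=3.0,
    eval_strategy="steps",               # assumption: table below evaluates every 500 steps
    eval_steps=500,
    predict_with_generate=True,          # assumption: needed for Gen Len / ROUGE metrics
)
```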

Training results

| Training Loss | Epoch  | Step  | Gen Len | Validation Loss | Rouge1  | Rouge2  | RougeL  | RougeLsum |
|---------------|--------|-------|---------|-----------------|---------|---------|---------|-----------|
| 0.378         | 0.0933 | 500   | 11.7725 | 0.4693          | 40.2274 | 15.0119 | 36.4563 | 36.4656   |
| 0.3748        | 0.1866 | 1000  | 12.1668 | 0.4640          | 40.199  | 15.321  | 36.4279 | 36.4457   |
| 0.374         | 0.2799 | 1500  | 11.8    | 0.4669          | 39.9523 | 15.0587 | 36.3639 | 36.375    |
| 0.3721        | 0.3732 | 2000  | 11.2095 | 0.4645          | 40.3597 | 15.2173 | 36.6938 | 36.705    |
| 0.3673        | 0.4665 | 2500  | 11.9343 | 0.4632          | 40.3875 | 15.2532 | 36.5923 | 36.6182   |
| 0.365         | 0.5599 | 3000  | 12.2647 | 0.4623          | 39.9395 | 15.0315 | 36.1682 | 36.1781   |
| 0.3652        | 0.6532 | 3500  | 11.8965 | 0.4611          | 39.8792 | 14.9961 | 36.2488 | 36.2734   |
| 0.3601        | 0.7465 | 4000  | 12.0545 | 0.4625          | 40.57   | 15.2972 | 36.8012 | 36.8227   |
| 0.3574        | 0.8398 | 4500  | 11.7287 | 0.4608          | 40.3276 | 15.1742 | 36.7679 | 36.7575   |
| 0.351         | 0.9331 | 5000  | 11.7662 | 0.4650          | 40.7345 | 15.5295 | 37.0769 | 37.0911   |
| 0.3322        | 1.0264 | 5500  | 12.06   | 0.4831          | 40.5582 | 15.2954 | 36.6682 | 36.6694   |
| 0.2914        | 1.1197 | 6000  | 11.8405 | 0.4902          | 40.054  | 15.019  | 36.5476 | 36.556    |
| 0.2945        | 1.2130 | 6500  | 11.8422 | 0.4863          | 40.3126 | 15.3154 | 36.61   | 36.6146   |
| 0.2845        | 1.3063 | 7000  | 12.0445 | 0.4883          | 40.228  | 15.0904 | 36.3179 | 36.3086   |
| 0.2879        | 1.3996 | 7500  | 11.9358 | 0.4833          | 40.6501 | 15.5682 | 36.8945 | 36.8823   |
| 0.2859        | 1.4930 | 8000  | 12.1743 | 0.4833          | 40.3187 | 15.0418 | 36.3561 | 36.3582   |
| 0.2844        | 1.5863 | 8500  | 12.1702 | 0.4884          | 40.2896 | 15.1032 | 36.4039 | 36.3862   |
| 0.2838        | 1.6796 | 9000  | 11.9588 | 0.4902          | 40.3419 | 15.1863 | 36.4631 | 36.4728   |
| 0.2789        | 1.7729 | 9500  | 12.0567 | 0.4865          | 40.6284 | 15.3404 | 36.7035 | 36.6876   |
| 0.2758        | 1.8662 | 10000 | 11.823  | 0.4909          | 40.1138 | 14.9247 | 36.4884 | 36.4836   |
| 0.2741        | 1.9595 | 10500 | 11.9537 | 0.4892          | 40.3204 | 14.9594 | 36.539  | 36.5311   |
| 0.253         | 2.0529 | 11000 | 11.9712 | 0.5201          | 40.0224 | 14.9662 | 36.3433 | 36.3705   |
| 0.2261        | 2.1462 | 11500 | 11.8918 | 0.5248          | 39.698  | 14.3092 | 35.9144 | 35.9107   |
| 0.2245        | 2.2395 | 12000 | 12.0252 | 0.5204          | 40.136  | 14.8487 | 36.4154 | 36.3989   |
| 0.2293        | 2.3328 | 12500 | 11.8622 | 0.5261          | 39.9269 | 14.6665 | 36.2594 | 36.2517   |
| 0.2255        | 2.4261 | 13000 | 11.9165 | 0.5217          | 40.1403 | 14.7327 | 36.4161 | 36.4139   |
| 0.228         | 2.5195 | 13500 | 11.9477 | 0.5267          | 39.7979 | 14.4362 | 36.0457 | 36.0611   |
| 0.2233        | 2.6128 | 14000 | 12.0495 | 0.5299          | 39.8343 | 14.4579 | 36.0728 | 36.0824   |
| 0.2239        | 2.7062 | 14500 | 12.1308 | 0.5274          | 39.9561 | 14.5286 | 36.1101 | 36.1017   |
| 0.2254        | 2.7995 | 15000 | 12.0845 | 0.5292          | 39.9252 | 14.5215 | 36.1396 | 36.1203   |
| 0.2182        | 2.8928 | 15500 | 12.115  | 0.5297          | 39.9487 | 14.5406 | 36.1582 | 36.1321   |
| 0.221         | 2.9861 | 16000 | 12.1495 | 0.5306          | 40.0349 | 14.6303 | 36.2382 | 36.2213   |
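
The ROUGE and Gen Len columns are the standard generation metrics. A minimal sketch of a compute_metrics function that would produce numbers in this format, assuming predict_with_generate=True and the checkpoint's GPT-2 tokenizer (GPT-2 has no pad token by default, so the sketch falls back to eos):

```python
import evaluate
import numpy as np
from transformers import AutoTokenizer

rouge = evaluate.load("rouge")

# Assumption: the checkpoint's own tokenizer, with a pad token set.
tokenizer = AutoTokenizer.from_pretrained("NourFakih/Vit-GPT2-COCO2017Flickr-85k-11")
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

def compute_metrics(eval_preds):
    preds, labels = eval_preds
    decoded_preds = tokenizer.batch_decode(preds, skip_special_tokens=True)

    # Labels use -100 for ignored positions; restore pad ids before decoding.
    labels = np.where(labels != -100, labels, tokenizer.pad_token_id)
    decoded_labels = tokenizer.batch_decode(labels, skip_special_tokens=True)

    result = rouge.compute(
        predictions=decoded_preds, references=decoded_labels, use_stemmer=True
    )
    # The table reports ROUGE as percentages.
    result = {k: round(v * 100, 4) for k, v in result.items()}

    # Gen Len: mean generated length, counting non-pad tokens.
    gen_lens = [int(np.count_nonzero(p != tokenizer.pad_token_id)) for p in preds]
    result["gen_len"] = float(np.mean(gen_lens))
    return result
```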

Framework versions

  • Transformers 4.41.2
  • Pytorch 2.1.2
  • Datasets 2.19.2
  • Tokenizers 0.19.1