vit-swin-base-224-gpt2-image-captioning

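This model is a fine-tuned version of a base checkpoint (its name is missing from this card) on the COCO dataset. The summary of its evaluation-set results is also missing; per-checkpoint validation metrics appear in the table at the end of this card.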

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

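The list of hyperparameters used during training is missing from this card. The table below reports the training and validation metrics logged every 2,000 steps: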

Training Loss	Epoch	Step	Validation Loss	ROUGE-1	ROUGE-2	ROUGE-L	ROUGE-Lsum	BLEU	Gen Len
1.091	0.19	2000	0.9783	35.5981	11.1245	32.4533	32.4622	6.1315	11.3253
0.9629	0.38	4000	0.9306	36.8386	12.0629	33.7446	33.7445	6.806	11.3253
0.9251	0.57	6000	0.9004	37.8439	13.1346	34.663	34.6608	7.6122	11.3253
0.9116	0.75	8000	0.8759	38.5078	13.477	35.1981	35.2143	7.6881	11.3253
0.8903	0.94	10000	0.8592	39.6087	14.2529	36.0992	36.1042	8.5688	11.3253
0.8381	1.13	12000	0.8480	40.3217	15.012	36.8038	36.8046	9.1783	11.3253
0.8066	1.32	14000	0.8383	40.7187	15.1971	37.15	37.148	9.2942	11.3253
0.7938	1.51	16000	0.8298	41.1227	15.635	37.423	37.4147	9.6574	11.3253
0.7854	1.7	18000	0.8232	41.5275	16.007	37.8586	37.8569	9.8936	11.3253
0.7837	1.88	20000	0.8190	41.2515	15.8468	37.6257	37.6252	9.8732	11.3253
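
The table does not say which checkpoint to prefer, and the metrics do not all peak at the same step. As a minimal sketch, the snippet below copies the Step, Validation Loss, ROUGE-1, and BLEU columns from the table and picks the best step for each metric. The helper name `best_step` and the tuple layout are assumptions made for illustration; they are not part of the original training code.

```python
# Rows copied from the training results table above:
# (step, validation loss, ROUGE-1, BLEU).
ROWS = [
    (2000, 0.9783, 35.5981, 6.1315),
    (4000, 0.9306, 36.8386, 6.806),
    (6000, 0.9004, 37.8439, 7.6122),
    (8000, 0.8759, 38.5078, 7.6881),
    (10000, 0.8592, 39.6087, 8.5688),
    (12000, 0.8480, 40.3217, 9.1783),
    (14000, 0.8383, 40.7187, 9.2942),
    (16000, 0.8298, 41.1227, 9.6574),
    (18000, 0.8232, 41.5275, 9.8936),
    (20000, 0.8190, 41.2515, 9.8732),
]


def best_step(rows, column, higher_is_better):
    """Return the step whose value in `column` is best (hypothetical helper)."""
    pick = max if higher_is_better else min
    return pick(rows, key=lambda row: row[column])[0]


lowest_loss_step = best_step(ROWS, column=1, higher_is_better=False)
best_rouge1_step = best_step(ROWS, column=2, higher_is_better=True)
best_bleu_step = best_step(ROWS, column=3, higher_is_better=True)

print("lowest validation loss at step", lowest_loss_step)
print("highest ROUGE-1 at step", best_rouge1_step)
print("highest BLEU at step", best_bleu_step)
```

On these rows, validation loss is lowest at the final step (20000), while ROUGE-1 and BLEU are both slightly higher at step 18000, so the choice of checkpoint depends on which metric matters more for the intended use.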