vit-swin-base-224-gpt2-image-captioning

This model is a fine-tuned version of a base checkpoint (not named in this card) on the COCO dataset. It achieves the following results on the evaluation set:

  • Loss: 0.8174
  • Rouge1: 41.4513
  • Rouge2: 15.9705
  • Rougel: 37.8534
  • Rougelsum: 37.8514
  • Bleu: 9.9633
  • Gen Len: 11.3253
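
The Rouge and Bleu figures above appear to be reported on a 0-100 scale. As a rough illustration of what Rouge1 measures, the sketch below computes a simplified unigram-overlap F-measure. The card does not say which metric implementation, tokenization, or stemming was used, so `rouge1_f1` and the caption pair are hypothetical and will not reproduce the reported numbers exactly.

```python
from collections import Counter


def rouge1_f1(reference: str, candidate: str) -> float:
    """Simplified ROUGE-1 F-measure on lowercased whitespace tokens, scaled to 0-100."""
    ref_counts = Counter(reference.lower().split())
    cand_counts = Counter(candidate.lower().split())
    # Clipped unigram overlap: a token counts at most as often as it occurs in both texts.
    overlap = sum((ref_counts & cand_counts).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand_counts.values())
    recall = overlap / sum(ref_counts.values())
    return 100.0 * 2 * precision * recall / (precision + recall)


# Hypothetical caption pair, not taken from the evaluation set.
reference = "a man riding a wave on a surfboard"
candidate = "a man on a surfboard"
score = rouge1_f1(reference, candidate)
print(f"ROUGE-1 F1: {score:.2f}")  # prints "ROUGE-1 F1: 76.92"
```

Here every candidate token also appears in the reference (precision 1.0), but only 5 of the 8 reference tokens are covered (recall 0.625), which gives the F-measure of about 76.92.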

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 16
  • eval_batch_size: 16
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 2
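
To make the `linear` schedule concrete, here is a minimal sketch of how the learning rate would decay over the run. It assumes zero warmup steps (none are listed) and estimates the number of steps per epoch from the training results table, where step 20000 is logged at epoch 1.88. Both are assumptions, and the rounded epoch values make the step count approximate.

```python
LEARNING_RATE = 5e-05  # from the hyperparameter list
NUM_EPOCHS = 2         # from the hyperparameter list

# Assumption: steps per epoch is estimated from the results table
# (step 20000 is logged at epoch 1.88); the card does not state it directly.
STEPS_PER_EPOCH = round(20000 / 1.88)
TOTAL_STEPS = STEPS_PER_EPOCH * NUM_EPOCHS

# Assumption: no warmup, since none is listed among the hyperparameters.
WARMUP_STEPS = 0


def linear_lr(step: int) -> float:
    """Learning rate at `step` under linear warmup followed by linear decay to zero."""
    if step < WARMUP_STEPS:
        return LEARNING_RATE * step / max(1, WARMUP_STEPS)
    remaining = max(0, TOTAL_STEPS - step)
    return LEARNING_RATE * remaining / max(1, TOTAL_STEPS - WARMUP_STEPS)


for step in (0, 10000, 20000):
    print(step, f"{linear_lr(step):.3e}")
```

Under these assumptions the rate starts at 5e-05, is roughly halved near the end of the first epoch, and approaches zero by the last logged step.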

Training results

| Training Loss | Epoch | Step  | Validation Loss | Rouge1  | Rouge2  | Rougel  | Rougelsum | Bleu   | Gen Len |
|---------------|-------|-------|-----------------|---------|---------|---------|-----------|--------|---------|
| 1.091         | 0.19  | 2000  | 0.9783          | 35.5981 | 11.1245 | 32.4533 | 32.4622   | 6.1315 | 11.3253 |
| 0.9629        | 0.38  | 4000  | 0.9306          | 36.8386 | 12.0629 | 33.7446 | 33.7445   | 6.806  | 11.3253 |
| 0.9251        | 0.57  | 6000  | 0.9004          | 37.8439 | 13.1346 | 34.663  | 34.6608   | 7.6122 | 11.3253 |
| 0.9116        | 0.75  | 8000  | 0.8759          | 38.5078 | 13.477  | 35.1981 | 35.2143   | 7.6881 | 11.3253 |
| 0.8903        | 0.94  | 10000 | 0.8592          | 39.6087 | 14.2529 | 36.0992 | 36.1042   | 8.5688 | 11.3253 |
| 0.8381        | 1.13  | 12000 | 0.8480          | 40.3217 | 15.012  | 36.8038 | 36.8046   | 9.1783 | 11.3253 |
| 0.8066        | 1.32  | 14000 | 0.8383          | 40.7187 | 15.1971 | 37.15   | 37.148    | 9.2942 | 11.3253 |
| 0.7938        | 1.51  | 16000 | 0.8298          | 41.1227 | 15.635  | 37.423  | 37.4147   | 9.6574 | 11.3253 |
| 0.7854        | 1.7   | 18000 | 0.8232          | 41.5275 | 16.007  | 37.8586 | 37.8569   | 9.8936 | 11.3253 |
| 0.7837        | 1.88  | 20000 | 0.8190          | 41.2515 | 15.8468 | 37.6257 | 37.6252   | 9.8732 | 11.3253 |
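
As a small worked example of reading the results table above, the sketch below picks the logged step with the lowest validation loss and the one with the highest Rouge1. The values are copied from the table; the variable names are illustrative only.

```python
# (step, validation_loss, rouge1) copied from the training results table.
ROWS = [
    (2000, 0.9783, 35.5981),
    (4000, 0.9306, 36.8386),
    (6000, 0.9004, 37.8439),
    (8000, 0.8759, 38.5078),
    (10000, 0.8592, 39.6087),
    (12000, 0.8480, 40.3217),
    (14000, 0.8383, 40.7187),
    (16000, 0.8298, 41.1227),
    (18000, 0.8232, 41.5275),
    (20000, 0.8190, 41.2515),
]

# Lower validation loss is better; higher Rouge1 is better.
best_by_loss = min(ROWS, key=lambda row: row[1])
best_by_rouge1 = max(ROWS, key=lambda row: row[2])

print("Lowest validation loss at step", best_by_loss[0])   # step 20000
print("Highest Rouge1 at step", best_by_rouge1[0])         # step 18000
```

The two criteria disagree slightly: validation loss keeps falling through step 20000, while Rouge1 peaks at step 18000. The headline results at the top of the card (loss 0.8174, Rouge1 41.4513) match no logged row and presumably come from a final evaluation after the last logged step.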

Framework versions

  • Transformers 4.34.1
  • PyTorch 2.1.0+cu118
  • Datasets 2.14.6
  • Tokenizers 0.14.1