Vit-GPT2-COCO2017Flickr-40k-04

This model is a fine-tuned version of nlpconnect/vit-gpt2-image-captioning on an unknown dataset. It achieves the following results on the evaluation set:

  • eval_loss: 0.5045
  • eval_rouge1: 42.848
  • eval_rouge2: 17.6905
  • eval_rougeL: 38.9854
  • eval_rougeLsum: 38.9896
  • eval_gen_len: 12.025
  • eval_samples_per_second: 7.371
  • eval_steps_per_second: 1.843
  • epoch: 1.4
  • step: 7000
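The ROUGE scores above are reported on a 0 to 100 scale and were most likely computed with the rouge_score-based "rouge" metric from the Hugging Face evaluate library (an assumption; the card does not say). As a rough illustration of what they measure, the sketch below computes ROUGE-N (clipped n-gram overlap) and ROUGE-L (longest common subsequence) F-measures for a single caption pair on a 0 to 1 scale. It lowercases and splits on whitespace without stemming, so it will not reproduce the reported numbers exactly, and the example captions are made up.

```python
from collections import Counter


def ngrams(tokens: list[str], n: int) -> Counter:
    """Multiset of the n-grams in a token sequence."""
    return Counter(tuple(tokens[i : i + n]) for i in range(len(tokens) - n + 1))


def f_measure(overlap: int, pred_total: int, ref_total: int) -> float:
    """Harmonic mean of precision (overlap / prediction size) and recall (overlap / reference size)."""
    if overlap == 0:
        return 0.0
    precision = overlap / pred_total
    recall = overlap / ref_total
    return 2 * precision * recall / (precision + recall)


def rouge_n(prediction: str, reference: str, n: int) -> float:
    """ROUGE-N F-measure with clipped n-gram counts (simplified tokenization)."""
    pred = ngrams(prediction.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    overlap = sum((pred & ref).values())  # '&' keeps the minimum count of each shared n-gram
    return f_measure(overlap, sum(pred.values()), sum(ref.values()))


def lcs_length(a: list[str], b: list[str]) -> int:
    """Length of the longest common subsequence, via row-by-row dynamic programming."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(prediction: str, reference: str) -> float:
    """ROUGE-L F-measure based on the LCS of the two token sequences."""
    pred, ref = prediction.lower().split(), reference.lower().split()
    return f_measure(lcs_length(pred, ref), len(pred), len(ref))


if __name__ == "__main__":
    pred = "a dog runs on the grass"
    ref = "a dog is running on the grass"
    print(round(rouge_n(pred, ref, 1), 4))  # 0.7692
    print(round(rouge_n(pred, ref, 2), 4))  # 0.5455
    print(round(rouge_l(pred, ref), 4))     # 0.7692
```

For this pair, five of the six predicted words appear in the seven-word reference, so ROUGE-1 is 2 × 5 / (6 + 7) = 10/13; multiplying by 100 gives the scale used in the card.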

Model description

More information needed

Intended uses & limitations

More information needed
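The card does not yet describe intended uses. Because the base model, nlpconnect/vit-gpt2-image-captioning, is a ViT encoder paired with a GPT-2 decoder that transformers loads as a VisionEncoderDecoderModel, captioning with this fine-tune probably follows the same pattern. The sketch below assumes that the checkpoint loads under the repository id NourFakih/Vit-GPT2-COCO2017Flickr-40k-04 and that it reuses the base model's image processor and tokenizer; the helper name caption_images and the generation settings (max_length=16, num_beams=4) are illustrative choices, not documented values.

```python
# Hypothetical helper; repository ids and generation settings are assumptions, not documented values.
BASE_MODEL = "nlpconnect/vit-gpt2-image-captioning"
FINETUNED_MODEL = "NourFakih/Vit-GPT2-COCO2017Flickr-40k-04"


def caption_images(image_paths, max_length=16, num_beams=4):
    """Generate one caption per image file with the fine-tuned checkpoint."""
    # Heavy dependencies are imported lazily so this module can be imported without them.
    import torch
    from PIL import Image
    from transformers import AutoTokenizer, ViTImageProcessor, VisionEncoderDecoderModel

    model = VisionEncoderDecoderModel.from_pretrained(FINETUNED_MODEL)
    # Assumption: the fine-tune keeps the base model's image preprocessing and GPT-2 tokenizer.
    processor = ViTImageProcessor.from_pretrained(BASE_MODEL)
    tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)

    device = "cuda" if torch.cuda.is_available() else "cpu"
    model.to(device).eval()

    # ViT expects 3-channel RGB input.
    images = [Image.open(path).convert("RGB") for path in image_paths]
    pixel_values = processor(images=images, return_tensors="pt").pixel_values.to(device)

    # generate() already runs without gradient tracking.
    output_ids = model.generate(pixel_values, max_length=max_length, num_beams=num_beams)
    captions = tokenizer.batch_decode(output_ids, skip_special_tokens=True)
    return [caption.strip() for caption in captions]
```

For example, caption_images(["photo.jpg"]) would return a list holding one caption string, where photo.jpg stands for any local image file.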

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 42
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 16
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 3.0
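The batch-size entries are consistent: 4 examples per device × 4 accumulation steps = 16, which implies training on a single device. The Training results table logs step 5000 at epoch 1.0, so one epoch is about 5000 optimizer steps and the configured 3 epochs would be about 15,000 steps. The sketch below works through that arithmetic and the linear learning-rate decay, assuming zero warmup steps (none are listed); the formula follows the shape of transformers' get_linear_schedule_with_warmup, reimplemented here in plain Python.

```python
# Values copied from the hyperparameter list.
LEARNING_RATE = 5e-05
TRAIN_BATCH_SIZE = 4
GRADIENT_ACCUMULATION_STEPS = 4
NUM_EPOCHS = 3.0

# Inferred from the results table: step 5000 is logged at epoch 1.0.
STEPS_PER_EPOCH = 5000

# Assumption: one device, so the total batch is per-device batch x accumulation steps.
EFFECTIVE_BATCH_SIZE = TRAIN_BATCH_SIZE * GRADIENT_ACCUMULATION_STEPS
TOTAL_STEPS = int(NUM_EPOCHS * STEPS_PER_EPOCH)


def linear_lr(step: int, warmup_steps: int = 0) -> float:
    """Learning rate at an optimizer step: linear warmup, then linear decay to zero."""
    if step < warmup_steps:
        return LEARNING_RATE * step / max(1, warmup_steps)
    remaining = max(0, TOTAL_STEPS - step)
    return LEARNING_RATE * remaining / max(1, TOTAL_STEPS - warmup_steps)


if __name__ == "__main__":
    print(EFFECTIVE_BATCH_SIZE)  # 16
    print(TOTAL_STEPS)           # 15000
    print(linear_lr(7000))       # about 2.67e-05
```

Under these assumptions, the learning rate at step 7000, the last logged step, would be 5e-05 × 8000 / 15000, or about 2.67e-05.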

Training results

Training Loss  Epoch  Step  Validation Loss  Rouge1   Rouge2   RougeL   RougeLsum  Gen Len
0.1497         0.1    500   0.5462           40.1774  14.6199  36.3335  36.3518    12.5965
0.1604         0.2    1000  0.5302           41.4714  16.0237  37.5992  37.5915    11.914
0.1631         0.3    1500  0.5436           40.3816  14.6958  36.6109  36.6027    12.3295
0.1634         0.4    2000  0.5266           40.9484  15.9068  37.5194  37.5088    12.033
0.1576         0.5    2500  0.5544           40.373   15.012   36.5218  36.5141    12.3345
0.1599         0.6    3000  0.5425           40.7552  15.2754  37.1059  37.1299    12.191
0.291          0.7    3500  0.4545           41.5934  16.251   37.7291  37.7113    12.0295
0.2825         0.8    4000  0.4558           42.6728  17.1703  38.8692  38.8841    12.246
0.2737         0.9    4500  0.4565           43.0036  16.8421  39.1761  39.1693    11.7975
0.2683         1.0    5000  0.4576           42.1341  16.7973  38.2881  38.3083    11.8655
0.1687         1.1    5500  0.4996           41.7152  16.4042  37.7724  37.7629    12.384
0.168          1.2    6000  0.5046           41.6521  16.6159  37.7915  37.7778    12.661
0.1688         1.3    6500  0.5020           42.3292  17.1408  38.5407  38.5282    11.846
0.1682         1.4    7000  0.5045           42.848   17.6905  38.9854  38.9896    12.025
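Reading the training-results table, the metrics do not all peak at the same checkpoint: validation loss is lowest at step 3500, ROUGE-1 and ROUGE-L are highest at step 4500, and ROUGE-2 is highest at step 7000. The sketch below copies the table's step, validation-loss, and ROUGE columns into Python and picks the best step per metric; which metric should decide the released checkpoint is a choice the card leaves open.

```python
# Columns copied from the training-results table: step, validation loss, Rouge1, Rouge2, RougeL.
RESULTS = [
    (500, 0.5462, 40.1774, 14.6199, 36.3335),
    (1000, 0.5302, 41.4714, 16.0237, 37.5992),
    (1500, 0.5436, 40.3816, 14.6958, 36.6109),
    (2000, 0.5266, 40.9484, 15.9068, 37.5194),
    (2500, 0.5544, 40.373, 15.012, 36.5218),
    (3000, 0.5425, 40.7552, 15.2754, 37.1059),
    (3500, 0.4545, 41.5934, 16.251, 37.7291),
    (4000, 0.4558, 42.6728, 17.1703, 38.8692),
    (4500, 0.4565, 43.0036, 16.8421, 39.1761),
    (5000, 0.4576, 42.1341, 16.7973, 38.2881),
    (5500, 0.4996, 41.7152, 16.4042, 37.7724),
    (6000, 0.5046, 41.6521, 16.6159, 37.7915),
    (6500, 0.5020, 42.3292, 17.1408, 38.5407),
    (7000, 0.5045, 42.848, 17.6905, 38.9854),
]

# Tuple index of each metric within a row.
COLUMNS = {"val_loss": 1, "rouge1": 2, "rouge2": 3, "rougeL": 4}


def best_step(metric: str) -> int:
    """Step of the best logged checkpoint: lowest loss, or highest ROUGE score."""
    idx = COLUMNS[metric]
    pick = min if metric == "val_loss" else max
    return pick(RESULTS, key=lambda row: row[idx])[0]


if __name__ == "__main__":
    for name in COLUMNS:
        print(name, best_step(name))
```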

Framework versions

  • Transformers 4.39.3
  • Pytorch 2.1.2
  • Datasets 2.18.0
  • Tokenizers 0.15.2
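To check whether a local environment matches the versions listed above, a small sketch using only the standard library's importlib.metadata could look like this. It assumes the PyPI distribution names transformers, torch, datasets, and tokenizers; reporting a mismatch rather than failing is a choice of this sketch, since the card does not say whether other versions work.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Optional, Tuple

# Versions from the "Framework versions" list; "Pytorch" is published on PyPI as "torch".
EXPECTED = {
    "transformers": "4.39.3",
    "torch": "2.1.2",
    "datasets": "2.18.0",
    "tokenizers": "0.15.2",
}


def compare_versions(
    expected: Dict[str, str] = EXPECTED,
) -> Dict[str, Tuple[str, Optional[str]]]:
    """Map each package to (expected, installed); installed is None if the package is missing."""
    report = {}
    for package, wanted in expected.items():
        try:
            # Drop local build tags such as "+cu121" before comparing.
            installed = version(package).split("+")[0]
        except PackageNotFoundError:
            installed = None
        report[package] = (wanted, installed)
    return report


if __name__ == "__main__":
    for package, (wanted, installed) in compare_versions().items():
        status = "matches" if installed == wanted else "differs"
        print(f"{package}: card lists {wanted}, installed {installed} ({status})")
```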

Safetensors

  • Model size: 239M params
  • Tensor type: F32

Model tree for NourFakih/Vit-GPT2-COCO2017Flickr-40k-04

  • Base model: nlpconnect/vit-gpt2-image-captioning
  • Fine-tunes of this model: 1