
This model is a variation of https://huggingface.co/nlpconnect/vit-gpt2-image-captioning

Results after 3 epochs (and ~45 hours of training):

  • eval_loss: 0.19939416646957397
  • eval_rouge1: 43.006
  • eval_rouge2: 16.9939
  • eval_rougeL: 38.8923
  • eval_rougeLsum: 38.8877
  • eval_gen_len: 11.327256736227712
  • eval_runtime: 1816.5255
  • eval_samples_per_second: 13.77
  • eval_steps_per_second: 1.721
  • train_runtime: 46263.3695
  • train_samples_per_second: 38.373
  • train_steps_per_second: 4.797
  • train_loss: 0.05974134062104816
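
The throughput figures above can be cross-checked against each other with simple arithmetic. The sketch below is only an illustration: the helper name `derive` is hypothetical, and it assumes that samples per step equals the effective batch size and that `train_runtime` is in seconds and covers all 3 epochs. Under those assumptions the numbers suggest a batch size of about 8, roughly 25,000 evaluation samples, and roughly 592,000 training samples per epoch. Note that `train_runtime` converts to about 12.9 hours, which is lower than the ~45 hours quoted above; the card does not explain the difference.

```python
# Metrics copied verbatim from the results list above.
metrics = {
    "eval_runtime": 1816.5255,
    "eval_samples_per_second": 13.77,
    "eval_steps_per_second": 1.721,
    "train_runtime": 46263.3695,
    "train_samples_per_second": 38.373,
    "train_steps_per_second": 4.797,
}

EPOCHS = 3  # stated in the card


def derive(m, epochs=EPOCHS):
    """Derive dataset sizes and batch sizes from runtime and throughput.

    Assumption: samples-per-step is the effective batch size, and the
    runtimes are wall-clock seconds for the whole run.
    """
    train_samples_seen = m["train_runtime"] * m["train_samples_per_second"]
    return {
        "eval_samples": m["eval_runtime"] * m["eval_samples_per_second"],
        "eval_batch_size": m["eval_samples_per_second"] / m["eval_steps_per_second"],
        "train_samples_seen": train_samples_seen,
        "train_samples_per_epoch": train_samples_seen / epochs,
        "train_batch_size": m["train_samples_per_second"] / m["train_steps_per_second"],
        "train_runtime_hours": m["train_runtime"] / 3600,
    }


derived = derive(metrics)

if __name__ == "__main__":
    for name, value in derived.items():
        print(f"{name}: {value:,.2f}")
```

Because the reported rates are rounded, the derived values are approximate and should be read as order-of-magnitude checks rather than exact counts.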

Model size: 182M params (Safetensors, tensor type F32)
