
vit-swin-base-224-gpt2-image-captioning

This model is a fine-tuned version of an unspecified base model on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 0.0001
  • Rouge1: 99.2148
  • Rouge2: 99.1824
  • RougeL: 99.22
  • RougeLsum: 99.2169
  • Bleu: 96.4656
  • Gen Len (mean generated length): 10.4161
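The ROUGE scores above are on a 0-100 scale. The following simplified pure-Python sketch shows what ROUGE-1 measures: unigram-overlap F1 between a generated caption and its reference, averaged over captions. It is not the `rouge_score` implementation that evaluation scripts commonly use, and tokenization, stemming and aggregation may differ. The card also does not say how its scores were computed. `rouge1_f1` and `mean_rouge1` are hypothetical helpers.

```python
import re
from collections import Counter


def _tokens(text: str) -> list[str]:
    # Simplified tokenization (an assumption): lowercase, keep alphanumeric runs only.
    return re.findall(r"[a-z0-9]+", text.lower())


def rouge1_f1(prediction: str, reference: str) -> float:
    """Unigram-overlap F1 between one prediction and one reference, scaled to 0-100."""
    pred, ref = Counter(_tokens(prediction)), Counter(_tokens(reference))
    overlap = sum((pred & ref).values())  # clipped unigram matches (min of the two counts)
    if overlap == 0:
        return 0.0
    precision = overlap / sum(pred.values())
    recall = overlap / sum(ref.values())
    return 100 * 2 * precision * recall / (precision + recall)


def mean_rouge1(predictions: list[str], references: list[str]) -> float:
    # Average the per-caption scores over the evaluation set (a simple mean).
    scores = [rouge1_f1(p, r) for p, r in zip(predictions, references)]
    return sum(scores) / len(scores)
```

For example, `rouge1_f1("a dog runs on grass", "a dog running on the grass")` is about 72.7. Four of the five predicted words match, and four of the six reference words are covered.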

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 50
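With `lr_scheduler_type: linear`, the learning rate decays linearly from 5e-05 toward 0 over training. Below is a minimal sketch of that schedule. It rests on two assumptions. First, there is no warmup, since none is listed. Second, training runs for 8800 optimizer steps in total: 50 epochs × 176 steps per epoch. The 176 is inferred from the Step and Epoch columns of the training results below and is not stated in the card. `linear_lr` is a hypothetical helper that mirrors the shape of a linear warmup-then-decay schedule rather than calling any library.

```python
def linear_lr(step: int, base_lr: float = 5e-5, total_steps: int = 8800,
              warmup_steps: int = 0) -> float:
    """Learning rate at an optimizer step: linear warmup, then linear decay to 0.

    total_steps=8800 and warmup_steps=0 are assumptions (50 epochs x 176 steps
    per epoch, inferred from the results table; no warmup listed in the card).
    """
    if step < warmup_steps:
        # Warmup phase: ramp linearly from 0 up to base_lr.
        return base_lr * step / max(1, warmup_steps)
    # Decay phase: scale base_lr by the fraction of post-warmup steps remaining.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)
```

Under these assumptions, the rate at step 2000 would be about 3.86e-05, and it reaches 0 at step 8800.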

Training results

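| Training Loss | Epoch | Step | Validation Loss | Rouge1  | Rouge2  | RougeL  | RougeLsum | Bleu    | Gen Len |
|---------------|-------|------|-----------------|---------|---------|---------|-----------|---------|---------|
| 0.622         | 11.36 | 2000 | 0.0330          | 91.0769 | 88.8333 | 90.7025 | 90.7277   | 84.8472 | 10.4161 |
| 0.0547        | 22.73 | 4000 | 0.0015          | 99.0694 | 98.9636 | 99.0615 | 99.0613   | 96.1312 | 10.4161 |
| 0.0238        | 34.09 | 6000 | 0.0007          | 99.1681 | 99.0942 | 99.167  | 99.1646   | 96.3754 | 10.4161 |
| 0.0046        | 45.45 | 8000 | 0.0001          | 99.2225 | 99.1781 | 99.217  | 99.2171   | 96.4412 | 10.4161 |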
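The card does not state the training-set size, but the Step and Epoch columns constrain it. The sketch below rests on these assumptions:

- The Epoch column equals the step count divided by a fixed number of steps per epoch, rounded to two decimals.
- Training ran on a single device with batch size 8.
- There was no gradient accumulation.
- The final partial batch was kept.

`infer_steps_per_epoch` is a hypothetical helper.

```python
# (step, epoch) pairs copied from the training results table.
ROWS = [(2000, 11.36), (4000, 22.73), (6000, 34.09), (8000, 45.45)]


def infer_steps_per_epoch(rows, max_candidate=10_000):
    """Return every integer steps-per-epoch value consistent with the rounded epochs."""
    return [
        n for n in range(1, max_candidate + 1)
        if all(round(step / n, 2) == epoch for step, epoch in rows)
    ]


candidates = infer_steps_per_epoch(ROWS)
steps_per_epoch = candidates[0]
total_steps = 50 * steps_per_epoch  # num_epochs from the hyperparameter list

# Bounds on training-set size under the assumptions above (batch size 8,
# one device, no gradient accumulation, last partial batch kept).
min_examples = 8 * (steps_per_epoch - 1) + 1
max_examples = 8 * steps_per_epoch
```

Only 176 fits all four rows. That gives 8800 optimizer steps over 50 epochs and, under the stated assumptions, roughly 1401 to 1408 training examples.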

Framework versions

  • Transformers 4.35.2
  • Pytorch 2.1.0+cu121
  • Datasets 2.16.1
  • Tokenizers 0.15.0

Model size

  • 240M params (Safetensors; tensor types: I64, F32)