Florence-2-finetuned-HuggingFaceM4-DocumentVQA

This model is a fine-tuned version of microsoft/Florence-2-base-ft on the HuggingFaceM4/DocumentVQA dataset.

It is the result of the post "Fine tuning Florence-2".

It achieves the following results on the evaluation set:

  • Loss: 0.7168

Model description

Florence-2 is an advanced vision foundation model that uses a prompt-based approach to handle a wide range of vision and vision-language tasks. It interprets simple text prompts to perform tasks such as captioning, object detection, and segmentation. It was trained on Microsoft's FLD-5B dataset, which contains 5.4 billion annotations across 126 million images, to learn these tasks jointly. Its sequence-to-sequence architecture lets it perform well in both zero-shot and fine-tuned settings, making it a competitive vision foundation model.

This checkpoint has additionally been fine-tuned on the DocVQA (document visual question answering) task.
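
The sketch below shows how a prompt-based model like this might be queried for document question answering. Florence-2 selects a task from a token at the start of the text prompt (for example <CAPTION> or <OD>). The "<DocVQA>" prefix used here is an assumption based on common Florence-2 DocVQA fine-tuning recipes and is not confirmed by this card. Loading the processor from microsoft/Florence-2-base-ft is also an assumption, on the grounds that fine-tuning does not change the processor. The generation settings (max_new_tokens, num_beams) are illustrative choices.

```python
# Sketch of DocVQA-style inference with this checkpoint. The "<DocVQA>" task
# prefix and loading the processor from the base model are ASSUMPTIONS.
from typing import Optional

MODEL_ID = "Maximofn/Florence-2-finetuned-HuggingFaceM4-DocumentVQA"
PROCESSOR_ID = "microsoft/Florence-2-base-ft"  # assumed: fine-tuning leaves the processor unchanged
DOCVQA_TASK = "<DocVQA>"  # assumed task prefix, not confirmed by this card


def build_docvqa_prompt(question: str, task_token: str = DOCVQA_TASK) -> str:
    """Prefix a question with a task token, the way Florence-2 selects tasks."""
    return task_token + question.strip()


def answer_question(image_path: str, question: str, device: Optional[str] = None) -> str:
    # Heavy dependencies are imported lazily so build_docvqa_prompt works without them.
    import torch
    from PIL import Image
    from transformers import AutoModelForCausalLM, AutoProcessor

    device = device or ("cuda" if torch.cuda.is_available() else "cpu")
    # Florence-2 relies on custom modeling code, hence trust_remote_code=True.
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, trust_remote_code=True).to(device)
    processor = AutoProcessor.from_pretrained(PROCESSOR_ID, trust_remote_code=True)

    image = Image.open(image_path).convert("RGB")
    inputs = processor(text=build_docvqa_prompt(question), images=image, return_tensors="pt").to(device)
    generated_ids = model.generate(
        input_ids=inputs["input_ids"],
        pixel_values=inputs["pixel_values"],
        max_new_tokens=64,
        num_beams=3,
    )
    # Drop special tokens so only the generated answer text remains.
    return processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
```

A call would look like answer_question("invoice.png", "What is the invoice number?"), where the file name and question are hypothetical. If the fine-tuned repository does not ship the custom modeling code, loading it may require the base model's code files.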

Training and evaluation data

The model was fine-tuned on the HuggingFaceM4/DocumentVQA dataset.
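
As a rough illustration of how DocumentVQA rows could be turned into training pairs, the sketch below builds a prompt from each question and uses the first listed answer as the target. The column names "question", "answers" and "image", the "<DocVQA>" prefix, and the choice of the first answer are all assumptions, not details confirmed by this card.

```python
# Sketch: turning HuggingFaceM4/DocumentVQA rows into (prompt, target, image)
# triples. Column names "question", "answers", "image" are ASSUMPTIONS.
from typing import Any, Dict, List, Tuple

TASK_TOKEN = "<DocVQA>"  # assumed task prefix, not confirmed by this card


def to_training_pair(example: Dict[str, Any]) -> Tuple[str, str, Any]:
    """Build the model input prompt and the target text for one example."""
    prompt = TASK_TOKEN + example["question"]
    # DocVQA lists several acceptable answers; use the first as the target.
    target = example["answers"][0]
    return prompt, target, example["image"]


def collate(batch: List[Dict[str, Any]]) -> Tuple[List[str], List[str], List[Any]]:
    """Group a list of examples into parallel lists, ready for a processor."""
    prompts, targets, images = zip(*(to_training_pair(ex) for ex in batch))
    return list(prompts), list(targets), list(images)


def load_docvqa(split: str = "train"):
    # Lazy import: requires the `datasets` package and network access.
    from datasets import load_dataset
    return load_dataset("HuggingFaceM4/DocumentVQA", split=split)
```

Keeping collate free of tensors makes it easy to pass its output to a processor inside a DataLoader's collate_fn; that wiring is left out here.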

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 1e-6
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • optimizer: AdamW with betas=(0.9,0.999) and epsilon=1e-08
  • num_epochs: 3
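
To make the optimizer settings concrete, here is a pure-Python sketch of one AdamW update using the listed learning rate, betas and epsilon. The card does not state a weight decay, so 0.0 is assumed (with zero weight decay, AdamW reduces to Adam). This follows the standard bias-corrected AdamW update and is not taken from the actual training script.

```python
# Pure-Python sketch of one AdamW update with the hyperparameters listed above.
# weight_decay is an ASSUMPTION (not stated in the card); 0.0 is used here.
from dataclasses import dataclass, field
from typing import List


@dataclass
class AdamWState:
    lr: float = 1e-6
    beta1: float = 0.9
    beta2: float = 0.999
    eps: float = 1e-8
    weight_decay: float = 0.0  # assumed, not stated in the card
    step: int = 0
    m: List[float] = field(default_factory=list)
    v: List[float] = field(default_factory=list)


def adamw_step(params: List[float], grads: List[float], s: AdamWState) -> List[float]:
    if not s.m:
        s.m = [0.0] * len(params)
        s.v = [0.0] * len(params)
    s.step += 1
    new_params = []
    for i, (p, g) in enumerate(zip(params, grads)):
        # Decoupled weight decay: shrink the weight directly, not via the gradient.
        p = p - s.lr * s.weight_decay * p
        # Exponential moving averages of the gradient and its square.
        s.m[i] = s.beta1 * s.m[i] + (1 - s.beta1) * g
        s.v[i] = s.beta2 * s.v[i] + (1 - s.beta2) * g * g
        # Bias correction for the zero-initialised moments.
        m_hat = s.m[i] / (1 - s.beta1 ** s.step)
        v_hat = s.v[i] / (1 - s.beta2 ** s.step)
        new_params.append(p - s.lr * m_hat / (v_hat ** 0.5 + s.eps))
    return new_params
```

On the first step the bias-corrected ratio m_hat / sqrt(v_hat) is roughly the sign of the gradient, so each weight moves by about the learning rate (1e-6). This illustrates how small the updates were during fine-tuning.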

Training results

Training Loss | Epoch | Validation Loss
------------- | ----- | ---------------
1.1535        | 1.0   | 0.7698
0.6530        | 2.0   | 0.7253
0.5878        | 3.0   | 0.7168
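
The short calculation below works through the results table. It computes the gap between validation and training loss at each epoch and the relative drop in validation loss between consecutive epochs. The negative gap at epoch 1 plausibly arises because the training loss is averaged over the whole first epoch, while the validation loss is measured at its end.

```python
# Worked arithmetic on the training-results table.
RESULTS = [
    # (epoch, training loss, validation loss), copied from the table
    (1.0, 1.1535, 0.7698),
    (2.0, 0.6530, 0.7253),
    (3.0, 0.5878, 0.7168),
]


def generalization_gaps(rows):
    """Validation loss minus training loss for each epoch."""
    return [round(val - train, 4) for _, train, val in rows]


def relative_val_improvement(rows):
    """Fractional drop in validation loss from one epoch to the next."""
    return [
        round((prev[2] - cur[2]) / prev[2], 4)
        for prev, cur in zip(rows, rows[1:])
    ]


print(generalization_gaps(RESULTS))       # → [-0.3837, 0.0723, 0.129]
print(relative_val_improvement(RESULTS))  # → [0.0578, 0.0117]
```

Validation loss improved by about 5.8% from epoch 1 to 2 but only about 1.2% from epoch 2 to 3, while the train/validation gap widened. This suggests the gains were flattening out by the third epoch.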

Framework versions

  • Transformers 4.43.3
  • Pytorch 2.3.1+cu121
  • Datasets 2.20.0
  • Tokenizers 0.19.1
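
If you want to reproduce this environment, a small check like the one below can flag installed packages whose versions differ from those listed. It assumes the PyPI distribution names transformers, torch, datasets and tokenizers. It also ignores local build tags such as +cu121, so a CPU build of the same torch release counts as a match.

```python
# Sketch: compare installed package versions with those listed in this card.
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Optional

# Distribution names (assumed) mapped to the versions reported in the card.
EXPECTED = {
    "transformers": "4.43.3",
    "torch": "2.3.1+cu121",
    "datasets": "2.20.0",
    "tokenizers": "0.19.1",
}


def base_version(v: str) -> str:
    """Drop a local build tag such as '+cu121'."""
    return v.split("+", 1)[0]


def check_versions(expected: Dict[str, str]) -> Dict[str, Optional[str]]:
    """Return {package: installed version, or None if missing} for each mismatch."""
    mismatches: Dict[str, Optional[str]] = {}
    for pkg, want in expected.items():
        try:
            have: Optional[str] = version(pkg)
        except PackageNotFoundError:
            have = None
        if have is None or base_version(have) != base_version(want):
            mismatches[pkg] = have
    return mismatches
```

Calling check_versions(EXPECTED) returns an empty dict when everything matches.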

Dataset used to train Maximofn/Florence-2-finetuned-HuggingFaceM4-DocumentVQA: HuggingFaceM4/DocumentVQA.
