20231102-20_epochs_layoutlmv2-base-uncased_finetuned_docvqa

This model is a fine-tuned version of layoutlmv2-base-uncased, as the model name indicates, trained on the 1.2k-sample example dataset released by DocVQA. It achieves the following results on the evaluation set:

  • Loss: 2.9087

Model description

This DocVQA model is built on LayoutLMv2 and is the first in a planned series of experimental models for document visual question answering. It is the "mini" version of the series, trained for 20 epochs on a small dataset of 1.2k samples (1,000 for training and 200 for testing). Training used mixed precision (fp16), moderate batch sizes, and a learning-rate setup with warmup steps and weight decay. No external reporting tools were used; evaluation was handled internally. Later models in the series are planned at medium (5k samples) and large (50k samples) scale, and this version serves as the baseline experiment for them.

Intended uses & limitations

Experimental use only.

Training and evaluation data

Based on the 1.2k-sample example dataset released by DocVQA (1,000 samples for training and 200 for testing).

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 16
  • eval_batch_size: 16
  • seed: 42
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 32
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 20
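As a rough illustration of how these values relate, the sketch below derives the effective batch size from the per-step batch size and the gradient accumulation steps, and evaluates a linear warmup-then-decay learning-rate schedule of the general shape implied by `lr_scheduler_type: linear`. The function `linear_lr` is a hypothetical helper, not part of any library, and the schedule length and warmup count in the demo are assumptions chosen only for illustration; the card does not state them.

```python
# Values copied from the hyperparameter list above.
LEARNING_RATE = 5e-05
TRAIN_BATCH_SIZE = 16
GRADIENT_ACCUMULATION_STEPS = 2

# Number of samples contributing to each optimizer update.
effective_batch_size = TRAIN_BATCH_SIZE * GRADIENT_ACCUMULATION_STEPS


def linear_lr(step, total_steps, warmup_steps=0, base_lr=LEARNING_RATE):
    """Learning rate at optimizer step `step`: linear warmup, then linear decay to zero.

    Hypothetical helper for illustration only.
    """
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


if __name__ == "__main__":
    print("effective batch size:", effective_batch_size)
    # ASSUMED schedule length and warmup count; neither is given in this card.
    total_steps, warmup_steps = 600, 60
    for step in (0, 30, 60, 330, 600):
        print(step, linear_lr(step, total_steps, warmup_steps))
```

The product of `train_batch_size` and `gradient_accumulation_steps` matches the `total_train_batch_size` of 32 reported in the list.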

Training results

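| Training Loss | Epoch | Step | Validation Loss |
|---------------|-------|------|-----------------|
| 4.3689        | 3.51  | 100  | 3.7775          |
| 3.2761        | 7.02  | 200  | 3.3707          |
| 2.6415        | 10.53 | 300  | 3.0807          |
| 2.2233        | 14.04 | 400  | 3.0120          |
| 1.9586        | 17.54 | 500  | 2.9087          |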
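The logged values can be read more easily with a little arithmetic. The sketch below, using only the numbers reported in the results above, computes the gap between validation and training loss at each logged step and the number of optimizer steps per epoch implied by the Step and Epoch columns. The helper name `summarize` is hypothetical.

```python
# (training loss, epoch, step, validation loss), copied from the results above.
ROWS = [
    (4.3689, 3.51, 100, 3.7775),
    (3.2761, 7.02, 200, 3.3707),
    (2.6415, 10.53, 300, 3.0807),
    (2.2233, 14.04, 400, 3.0120),
    (1.9586, 17.54, 500, 2.9087),
]


def summarize(rows):
    """Return the validation/training gap and implied steps per epoch for each row."""
    summary = []
    for train_loss, epoch, step, val_loss in rows:
        summary.append({
            "step": step,
            "gap": round(val_loss - train_loss, 4),
            "steps_per_epoch": round(step / epoch, 2),
        })
    return summary


if __name__ == "__main__":
    for row in summarize(ROWS):
        print(row)
```

Under this reading, validation loss decreases at every logged step while its gap to the training loss widens, and each epoch corresponds to roughly 28.5 optimizer steps. The final reported loss of 2.9087 is the validation loss logged at step 500.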

Framework versions

  • Transformers 4.34.1
  • PyTorch 2.0.1+cu118
  • Datasets 2.10.1
  • Tokenizers 0.14.1