layoutlmv3-base-finetuned-rvlcdip

This model is a fine-tuned version of microsoft/layoutlmv3-base on the RVL-CDIP dataset, with words and bounding boxes extracted using Amazon OCR. The following metrics were computed on the evaluation set after the final optimization step:

  • Evaluation Loss: 0.185632
  • Evaluation Accuracy: 0.951924
  • Evaluation Weighted F1: 0.951891
  • Evaluation Micro F1: 0.951924
  • Evaluation Macro F1: 0.951804
  • Evaluation Weighted Recall: 0.951924
  • Evaluation Micro Recall: 0.951924
  • Evaluation Macro Recall: 0.951817
  • Evaluation Weighted Precision: 0.951909
  • Evaluation Micro Precision: 0.951924
  • Evaluation Macro Precision: 0.951842
  • Evaluation Runtime (seconds): 514.7031
  • Evaluation Samples per Second: 77.713
  • Evaluation Steps per Second: 1.214
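
For reference, a minimal inference sketch in Python is shown below. This is an assumed usage pattern, not part of the original training code: the model id, image path, and OCR words/boxes are placeholders. Because the training data was produced with an external OCR engine, the processor is loaded with apply_ocr=False and expects words plus bounding boxes normalized to the 0-1000 range.

    import torch
    from PIL import Image
    from transformers import AutoProcessor, LayoutLMv3ForSequenceClassification

    model_id = "layoutlmv3-base-finetuned-rvlcdip"  # placeholder: substitute the actual Hub id or a local path

    # apply_ocr=False: words and boxes come from an external OCR engine,
    # matching how the training data for this model was produced.
    processor = AutoProcessor.from_pretrained(model_id, apply_ocr=False)
    model = LayoutLMv3ForSequenceClassification.from_pretrained(model_id)

    image = Image.open("document.png").convert("RGB")  # placeholder document image
    words = ["Invoice", "Total", "$42.00"]             # placeholder OCR tokens
    boxes = [[70, 50, 250, 80], [70, 110, 160, 140], [170, 110, 260, 140]]  # 0-1000 normalized

    encoding = processor(image, words, boxes=boxes, truncation=True, return_tensors="pt")
    with torch.no_grad():
        logits = model(**encoding).logits
    print(model.config.id2label[logits.argmax(-1).item()])  # one of the 16 RVL-CDIP classes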

Training logs

See the wandb report: https://api.wandb.ai/links/gordon-lim/lokqu7ok

Training arguments

The following arguments were provided to the Hugging Face Trainer; a code reconstruction of this configuration follows the list:

  • Output Directory: ./results
  • Maximum Steps: 20000
  • Per Device Train Batch Size: 32 (the paper uses 64; due to CUDA memory constraints, training used 2 GPUs at 32 per device, for an effective batch size of 64)
  • Per Device Evaluation Batch Size: 32 (due to CUDA memory constraints)
  • Warmup Steps: 0 (not specified in the paper for RVL-CDIP; the paper uses a warmup ratio for DocVQA, so the default of 0 is assumed here)
  • Weight Decay: 0 (not specified in the paper for RVL-CDIP; the paper uses 0.05 for PubLayNet, so the default of 0 is assumed here)
  • Evaluation Strategy: steps
  • Evaluation Steps: 1000
  • Evaluate on Start: True
  • Save Strategy: steps
  • Save Steps: 1000
  • Save Total Limit: 5
  • Learning Rate: 2e-5
  • Load Best Model at End: True
  • Metric for Best Model: accuracy
  • Greater is Better: True
  • Report to: wandb (log to Weights & Biases)
  • Logging Steps: 1000
  • Logging First Step: True
  • Learning Rate Scheduler Type: cosine (not mentioned in the paper, but the PubLayNet example on GitHub uses 'cosine')
  • FP16: True (due to CUDA memory constraints)
  • Dataloader Number of Workers: 4 (number of subprocesses to use for data loading)
  • DDP Find Unused Parameters: True
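
Below is a hedged reconstruction of this setup as a Python sketch. The model and dataset objects are assumed to be in scope, and the compute_metrics helper is an illustration that mirrors the metrics reported above using scikit-learn (precision and recall follow the same pattern with precision_score and recall_score).

    import numpy as np
    from sklearn.metrics import accuracy_score, f1_score
    from transformers import Trainer, TrainingArguments

    def compute_metrics(eval_pred):
        # eval_pred is a (logits, labels) pair supplied by Trainer during evaluation
        logits, labels = eval_pred
        preds = np.argmax(logits, axis=-1)
        return {
            "accuracy": accuracy_score(labels, preds),
            "weighted_f1": f1_score(labels, preds, average="weighted"),
            "micro_f1": f1_score(labels, preds, average="micro"),
            "macro_f1": f1_score(labels, preds, average="macro"),
        }

    training_args = TrainingArguments(
        output_dir="./results",
        max_steps=20000,
        per_device_train_batch_size=32,
        per_device_eval_batch_size=32,
        warmup_steps=0,
        weight_decay=0.0,
        eval_strategy="steps",            # "evaluation_strategy" on transformers < 4.41
        eval_steps=1000,
        eval_on_start=True,
        save_strategy="steps",
        save_steps=1000,
        save_total_limit=5,
        learning_rate=2e-5,
        load_best_model_at_end=True,
        metric_for_best_model="accuracy",
        greater_is_better=True,
        report_to="wandb",
        logging_steps=1000,
        logging_first_step=True,
        lr_scheduler_type="cosine",
        fp16=True,
        dataloader_num_workers=4,
        ddp_find_unused_parameters=True,
    )

    trainer = Trainer(
        model=model,                      # assumed: a LayoutLMv3ForSequenceClassification instance
        args=training_args,
        train_dataset=train_dataset,      # assumed: preprocessed RVL-CDIP train split
        eval_dataset=eval_dataset,        # assumed: preprocessed RVL-CDIP eval split
        compute_metrics=compute_metrics,
    )
    trainer.train()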

Framework versions

  • Transformers 4.42.3
  • PyTorch 2.2.0+cu121
  • Datasets 2.14.0
  • Tokenizers 0.19.1