layoutlmv3-base-finetuned-rvlcdip
This model is a fine-tuned version of microsoft/layoutlmv3-base on the RVL-CDIP dataset, with text extracted using Amazon OCR. The following metrics were computed on the evaluation set after the final optimization step (a sketch of how such metrics can be reproduced follows the list):
- Evaluation Loss: 0.1856316477060318
- Evaluation Accuracy: 0.9519237980949524
- Evaluation Weighted F1: 0.9518911690649716
- Evaluation Micro F1: 0.9519237980949524
- Evaluation Macro F1: 0.9518042570370386
- Evaluation Weighted Recall: 0.9519237980949524
- Evaluation Micro Recall: 0.9519237980949524
- Evaluation Macro Recall: 0.9518171728908463
- Evaluation Weighted Precision: 0.9519094862975979
- Evaluation Micro Precision: 0.9519237980949524
- Evaluation Macro Precision: 0.9518423447239385
- Evaluation Runtime (seconds): 514.7031
- Evaluation Samples per Second: 77.713
- Evaluation Steps per Second: 1.214
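As a reference point, the sketch below shows one way to compute this set of metrics with scikit-learn inside a `compute_metrics` function passed to `Trainer`. The function name and its pairing with `Trainer` are assumptions about the evaluation setup, not an excerpt from the original code.

```python
import numpy as np
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

def compute_metrics(eval_pred):
    # The Trainer passes an (logits, labels) pair for the evaluation set.
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)
    metrics = {"accuracy": accuracy_score(labels, preds)}
    # Weighted, micro, and macro precision / recall / F1, as reported above.
    for average in ("weighted", "micro", "macro"):
        precision, recall, f1, _ = precision_recall_fscore_support(
            labels, preds, average=average, zero_division=0
        )
        metrics[f"{average}_precision"] = precision
        metrics[f"{average}_recall"] = recall
        metrics[f"{average}_f1"] = f1
    return metrics
```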
Training logs
See wandb report: https://api.wandb.ai/links/gordon-lim/lokqu7ok
Training arguments
The following arguments were provided to Trainer (a sketch of the corresponding TrainingArguments call follows this list):
- Output Directory: ./results
- Maximum Steps: 20000
- Per Device Train Batch Size: 32 (reduced from the paper's 64 due to CUDA memory constraints; training used 2 GPUs, giving an effective batch size of 32 × 2 = 64)
- Per Device Evaluation Batch Size: 32 (due to CUDA memory constraints)
- Warmup Steps: 0 (not specified in the paper for RVL-CDIP; a warmup ratio is only given for DocVQA, so the default of 0 is assumed)
- Weight Decay: 0 (not specified in the paper for RVL-CDIP; 0.05 is given for PubLayNet, so the default of 0 is assumed)
- Evaluation Strategy: steps
- Evaluation Steps: 1000
- Evaluate on Start: True
- Save Strategy: steps
- Save Steps: 1000
- Save Total Limit: 5
- Learning Rate: 2e-5
- Load Best Model at End: True
- Metric for Best Model: accuracy
- Greater is Better: True
- Report to: wandb (log to Weights & Biases)
- Logging Steps: 1000
- Logging First Step: True
- Learning Rate Scheduler Type: cosine (not mentioned in the paper; the PubLayNet GitHub example uses 'cosine')
- FP16: True (due to CUDA memory constraints)
- Dataloader Number of Workers: 4 (number of subprocesses to use for data loading)
- DDP Find Unused Parameters: True
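Read together, these options correspond roughly to the `transformers.TrainingArguments` call sketched below. This is a reconstruction from the list above, not the original training script.

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./results",
    max_steps=20000,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=32,
    warmup_steps=0,
    weight_decay=0.0,
    eval_strategy="steps",          # named evaluation_strategy in older Transformers releases
    eval_steps=1000,
    eval_on_start=True,
    save_strategy="steps",
    save_steps=1000,
    save_total_limit=5,
    learning_rate=2e-5,
    load_best_model_at_end=True,
    metric_for_best_model="accuracy",
    greater_is_better=True,
    report_to="wandb",
    logging_steps=1000,
    logging_first_step=True,
    lr_scheduler_type="cosine",
    fp16=True,
    dataloader_num_workers=4,
    ddp_find_unused_parameters=True,
)
```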
Framework versions
- Transformers 4.42.3
- Pytorch 2.2.0+cu121
- Datasets 2.14.0
- Tokenizers 0.19.1
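With the versions above, the checkpoint loads as a stock LayoutLMv3 sequence-classification model. The sketch below is illustrative only: the repository id is a placeholder, and the processor's default OCR backend is Tesseract, whereas this model was fine-tuned on Amazon OCR output, so supplying words and boxes from a matching OCR pipeline (with apply_ocr=False) will track the training setup more closely.

```python
import torch
from PIL import Image
from transformers import LayoutLMv3Processor, LayoutLMv3ForSequenceClassification

model_id = "layoutlmv3-base-finetuned-rvlcdip"  # placeholder: replace with the actual hub id or local path

processor = LayoutLMv3Processor.from_pretrained(model_id)  # apply_ocr=True by default (uses Tesseract)
model = LayoutLMv3ForSequenceClassification.from_pretrained(model_id)

image = Image.open("document.png").convert("RGB")
inputs = processor(image, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

predicted_label = model.config.id2label[logits.argmax(-1).item()]
print(predicted_label)
```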