layoutlmv3-base-finetuned-rvlcdip
This model is a fine-tuned version of microsoft/layoutlmv3-base on the RVL-CDIP dataset, with text extracted using Amazon OCR. The following metrics were computed on the evaluation set after the final optimization step (a sketch of how such metrics can be reproduced follows the list):
- Evaluation Loss: 0.1856316477060318
- Evaluation Accuracy: 0.9519237980949524
- Evaluation Weighted F1: 0.9518911690649716
- Evaluation Micro F1: 0.9519237980949524
- Evaluation Macro F1: 0.9518042570370386
- Evaluation Weighted Recall: 0.9519237980949524
- Evaluation Micro Recall: 0.9519237980949524
- Evaluation Macro Recall: 0.9518171728908463
- Evaluation Weighted Precision: 0.9519094862975979
- Evaluation Micro Precision: 0.9519237980949524
- Evaluation Macro Precision: 0.9518423447239385
- Evaluation Runtime (seconds): 514.7031
- Evaluation Samples per Second: 77.713
- Evaluation Steps per Second: 1.214
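As a reference point, the sketch below shows one way to compute this set of metrics with scikit-learn inside a `compute_metrics` function passed to `Trainer`. The function name and its pairing with `Trainer` are assumptions about the evaluation setup, not an excerpt from the original code.

```python
import numpy as np
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

def compute_metrics(eval_pred):
    # The Trainer passes an (logits, labels) pair for the evaluation set.
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)
    metrics = {"accuracy": accuracy_score(labels, preds)}
    # Weighted, micro, and macro precision / recall / F1, as reported above.
    for average in ("weighted", "micro", "macro"):
        precision, recall, f1, _ = precision_recall_fscore_support(
            labels, preds, average=average, zero_division=0
        )
        metrics[f"{average}_precision"] = precision
        metrics[f"{average}_recall"] = recall
        metrics[f"{average}_f1"] = f1
    return metrics
```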
Training logs
See wandb report: https://api.wandb.ai/links/gordon-lim/lokqu7ok
Training arguments
The following arguments were provided to Trainer (a sketch of the corresponding TrainingArguments call follows this list):
- Output Directory: ./results
- Maximum Steps: 20000
- Per Device Train Batch Size: 32 (reduced from the paper's 64 due to CUDA memory constraints; training used 2 GPUs, giving an effective batch size of 32 × 2 = 64)
- Per Device Evaluation Batch Size: 32 (due to CUDA memory constraints)
- Warmup Steps: 0 (not specified in the paper for RVL-CDIP; a warmup ratio is only given for DocVQA, so the default of 0 is assumed)
- Weight Decay: 0 (not specified in the paper for RVL-CDIP; 0.05 is given for PubLayNet, so the default of 0 is assumed)
- Evaluation Strategy: steps
- Evaluation Steps: 1000
- Evaluate on Start: True
- Save Strategy: steps
- Save Steps: 1000
- Save Total Limit: 5
- Learning Rate: 2e-5
- Load Best Model at End: True
- Metric for Best Model: accuracy
- Greater is Better: True
- Report to: wandb (log to Weights & Biases)
- Logging Steps: 1000
- Logging First Step: True
- Learning Rate Scheduler Type: cosine (not mentioned in the paper; the PubLayNet GitHub example uses 'cosine')
- FP16: True (due to CUDA memory constraints)
- Dataloader Number of Workers: 4 (number of subprocesses to use for data loading)
- DDP Find Unused Parameters: True
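Read together, these options correspond roughly to the `transformers.TrainingArguments` call sketched below. This is a reconstruction from the list above, not the original training script.

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./results",
    max_steps=20000,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=32,
    warmup_steps=0,
    weight_decay=0.0,
    eval_strategy="steps",          # named evaluation_strategy in older Transformers releases
    eval_steps=1000,
    eval_on_start=True,
    save_strategy="steps",
    save_steps=1000,
    save_total_limit=5,
    learning_rate=2e-5,
    load_best_model_at_end=True,
    metric_for_best_model="accuracy",
    greater_is_better=True,
    report_to="wandb",
    logging_steps=1000,
    logging_first_step=True,
    lr_scheduler_type="cosine",
    fp16=True,
    dataloader_num_workers=4,
    ddp_find_unused_parameters=True,
)
```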
Framework versions
- Transformers 4.42.3
- Pytorch 2.2.0+cu121
- Datasets 2.14.0
- Tokenizers 0.19.1
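With the versions above, the checkpoint loads as a stock LayoutLMv3 sequence-classification model. The sketch below is illustrative only: the repository id is a placeholder, and the processor's default OCR backend is Tesseract, whereas this model was fine-tuned on Amazon OCR output, so supplying words and boxes from a matching OCR pipeline (with apply_ocr=False) will track the training setup more closely.

```python
import torch
from PIL import Image
from transformers import LayoutLMv3Processor, LayoutLMv3ForSequenceClassification

model_id = "layoutlmv3-base-finetuned-rvlcdip"  # placeholder: replace with the actual hub id or local path

processor = LayoutLMv3Processor.from_pretrained(model_id)  # apply_ocr=True by default (uses Tesseract)
model = LayoutLMv3ForSequenceClassification.from_pretrained(model_id)

image = Image.open("document.png").convert("RGB")
inputs = processor(image, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

predicted_label = model.config.id2label[logits.argmax(-1).item()]
print(predicted_label)
```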