---
license: cc-by-nc-sa-4.0
base_model: microsoft/layoutlmv2-base-uncased
tags:
- generated_from_trainer
datasets:
- cord
metrics:
- precision
- recall
- f1
- accuracy
model-index:
- name: layoutlmv2-finetuned-cord
  results:
  - task:
      name: Token Classification
      type: token-classification
    dataset:
      name: cord
      type: cord
      config: cord
      split: validation
      args: cord
    metrics:
    - name: Precision
      type: precision
      value: 0.9652945924132365
    - name: Recall
      type: recall
      value: 0.9676375404530745
    - name: F1
      type: f1
      value: 0.9664646464646465
    - name: Accuracy
      type: accuracy
      value: 0.9702653247941445
---

# Overfitting issue

I used this Colab notebook: https://colab.research.google.com/drive/1AXh3G3-VmbMWlwbSvesVIurzNlcezTce?usp=sharing to fine-tune `LayoutLMv2ForTokenClassification` on the CORD dataset.

Here is the result: https://huggingface.co/doc2txt/layoutlmv2-finetuned-cord

* F1: 0.9665

The results are indeed very good on the test set. However, on any other receipt (printed or PDF), the results are completely off.

So for some reason the model is overfitting to the CORD dataset, even though the images I test on are similar. I don't think there is any **data leakage**, unless the CORD dataset is not clean (I assume it is).

What could be the reason for this? Is it some inherent property of LayoutLM? The LayoutLM models are somewhat old, and the project seems abandoned...

I don't have much experience, so I would appreciate any info. Thanks!

Here is example code showing how to run this model on a folder of images: https://huggingface.co/doc2txt/layoutlmv2-finetuned-cord/blob/main/LayoutLMv2Main_cord2_gOcr_folder.py

# layoutlmv2-finetuned-cord

This model is a fine-tuned version of [microsoft/layoutlmv2-base-uncased](https://huggingface.co/microsoft/layoutlmv2-base-uncased) on the cord dataset.
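When running the model on receipts outside CORD, as in the question above, it is worth ruling out an input-format mismatch before blaming the model itself. LayoutLMv2 expects each word box as `(x0, y0, x1, y1)` scaled to a 0-1000 range relative to the page width and height. If the processor is created with `apply_ocr=False`, the words and boxes you pass in are used as given. They must therefore already be in that form.

The following is a minimal sketch, not taken from the linked script. It assumes the OCR engine returns each word as a polygon of pixel-coordinate vertices, as Google Cloud Vision's `boundingPoly` does. `polygon_to_box` and `normalize_box` are hypothetical helpers written for illustration:

```python
from typing import List, Sequence, Tuple

Box = Tuple[int, int, int, int]


def polygon_to_box(vertices: Sequence[Tuple[int, int]]) -> Box:
    """Collapse an OCR word polygon (list of (x, y) pixel vertices) into (x0, y0, x1, y1)."""
    xs = [x for x, _ in vertices]
    ys = [y for _, y in vertices]
    return (min(xs), min(ys), max(xs), max(ys))


def normalize_box(box: Box, width: int, height: int) -> List[int]:
    """Scale a pixel box to the 0-1000 range, clamping values that overshoot the page."""
    x0, y0, x1, y1 = box

    def scale(value: int, size: int) -> int:
        # Clamp because OCR polygons can extend slightly past the image border.
        return max(0, min(1000, int(1000 * value / size)))

    return [scale(x0, width), scale(y0, height), scale(x1, width), scale(y1, height)]


if __name__ == "__main__":
    page_width, page_height = 800, 1200  # hypothetical receipt image size in pixels
    word_polygon = [(80, 120), (240, 120), (240, 180), (80, 180)]
    print(normalize_box(polygon_to_box(word_polygon), page_width, page_height))
```

For the hypothetical 800×1200 page above, this prints `[100, 100, 300, 150]`. Boxes left in raw pixels, or normalized against the wrong image size, would place words in positions the model never saw during fine-tuning.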
It achieves the following results on the evaluation set:
- Loss: 0.2819
- Precision: 0.9653
- Recall: 0.9676
- F1: 0.9665
- Accuracy: 0.9703

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 5e-05
- train_batch_size: 2
- eval_batch_size: 8
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 5
- mixed_precision_training: Native AMP

### Training results

| Training Loss | Epoch | Step | Validation Loss | Precision | Recall | F1     | Accuracy |
|:-------------:|:-----:|:----:|:---------------:|:---------:|:------:|:------:|:--------:|
| No log        | 1.0   | 400  | 1.2752          | 0.8527    | 0.8382 | 0.8454 | 0.8481   |
| 1.9583        | 2.0   | 800  | 0.6372          | 0.8799    | 0.8948 | 0.8873 | 0.9021   |
| 0.7097        | 3.0   | 1200 | 0.4255          | 0.9241    | 0.9264 | 0.9253 | 0.9414   |
| 0.3845        | 4.0   | 1600 | 0.3021          | 0.9414    | 0.9482 | 0.9448 | 0.9611   |
| 0.2699        | 5.0   | 2000 | 0.2819          | 0.9653    | 0.9676 | 0.9665 | 0.9703   |

### Framework versions

- Transformers 4.37.2
- Pytorch 2.1.0+cu121
- Datasets 2.16.1
- Tokenizers 0.15.1
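The scheduler settings listed under Training hyperparameters are `lr_scheduler_type: linear` and `lr_scheduler_warmup_ratio: 0.1`. They describe a learning rate that ramps linearly from 0 up to 5e-05 and then decays linearly back to 0 by the last step.

The sketch below reproduces that shape in plain Python. It assumes 2000 total optimizer steps (400 steps per epoch × 5 epochs, matching the last row of the results table). It also assumes the warmup length is the ratio times the total steps, rounded up, which gives 200 steps. `linear_warmup_lr` and `warmup_steps` are hypothetical helpers for illustration; during training the actual schedule is built by the Trainer.

```python
import math

# Values from the model card; TOTAL_STEPS assumes 400 steps/epoch x 5 epochs.
PEAK_LR = 5e-05
WARMUP_RATIO = 0.1
TOTAL_STEPS = 400 * 5  # 2000, the final "Step" value in the results table


def warmup_steps(total_steps: int, warmup_ratio: float) -> int:
    """Warmup length, assuming ratio * total_steps is rounded up."""
    return math.ceil(total_steps * warmup_ratio)


def linear_warmup_lr(step: int,
                     peak_lr: float = PEAK_LR,
                     total_steps: int = TOTAL_STEPS,
                     warmup_ratio: float = WARMUP_RATIO) -> float:
    """Learning rate at an optimizer step under linear warmup followed by linear decay."""
    n_warmup = warmup_steps(total_steps, warmup_ratio)
    if step < n_warmup:
        # Warmup phase: rise linearly from 0 to peak_lr.
        return peak_lr * step / max(1, n_warmup)
    # Decay phase: fall linearly from peak_lr to 0 at total_steps.
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / max(1, total_steps - n_warmup)


if __name__ == "__main__":
    for s in (0, 100, 200, 1100, 2000):
        print(s, linear_warmup_lr(s))
```

With these settings, the peak rate is reached at step 200 (midway through epoch 1). By the end of epoch 3 (step 1200), the rate has already fallen to roughly 61% of its peak.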