Image classification using a fine-tuned ViT - for sorting historical documents
Goal: sort images of archive pages by category, as a step before their further content-based processing
Scope: image processing, training and evaluation of the ViT model, processing of input files and directories, output of the top-N predicted classes (categories), summarizing predictions in a tabular format, and HF Hub support for the model
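The classify-and-report-top-N workflow named in the scope could look roughly like the sketch below. It assumes the `transformers` image-classification pipeline and the Hub ID `k4tel/vit-historical-page`; the function names `classify_page` and `top_n_labels` are hypothetical and not part of the project's own code. The heavy import is deferred so that the ranking helper can be used without the library installed.

```python
from typing import Dict, List

MODEL_ID = "k4tel/vit-historical-page"  # assumed Hub ID of the fine-tuned model


def classify_page(image_path: str, top_n: int = 3) -> List[Dict[str, object]]:
    """Return the top_n predictions as [{"label": ..., "score": ...}, ...]."""
    # Lazy import: needs `transformers`, `torch` and `Pillow`, and downloads
    # the model weights from the Hub on first use.
    from transformers import pipeline

    classifier = pipeline("image-classification", model=MODEL_ID)
    return classifier(image_path, top_k=top_n)


def top_n_labels(predictions: List[Dict[str, object]], n: int) -> List[str]:
    """Sort predictions by descending score and keep the n best labels."""
    ranked = sorted(predictions, key=lambda p: p["score"], reverse=True)
    return [str(p["label"]) for p in ranked[:n]]
```

A call such as `top_n_labels(classify_page("page.png"), 3)` would then give the three most likely categories for one page image, where `page.png` is a placeholder path.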
Model description
Fine-tuned model repository: vit-historical-page ^1
Base model repository: Google's vit-base-patch16-224 ^2
Data
Training set of the model: 8950 images
Categories
Label | Ratio | Description |
---|---|---|
DRAW | 11.89% | drawings, maps, paintings with text |
DRAW_L | 8.17% | drawings ... with a table legend or inside tabular layout / forms |
LINE_HW | 5.99% | handwritten text lines inside tabular layout / forms |
LINE_P | 6.06% | printed text lines inside tabular layout / forms |
LINE_T | 13.39% | machine typed text lines inside tabular layout / forms |
PHOTO | 10.21% | photos with text |
PHOTO_L | 7.86% | photos inside tabular layout / forms or with a tabular annotation |
TEXT | 8.58% | mixed types of printed and handwritten texts |
TEXT_HW | 7.36% | only handwritten text |
TEXT_P | 6.95% | only printed text |
TEXT_T | 13.53% | only machine typed text |
Evaluation set (same proportions): 995 images
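To make the ratios more tangible, the following sketch turns them into approximate per-category image counts for both sets. It assumes the listed percentages apply to the 8950 training images and, as stated, to the 995 evaluation images; because the percentages are rounded, the counts are estimates and need not add up exactly to the set sizes. The name `approx_counts` is hypothetical.

```python
# Category ratios in percent, copied from the table above.
RATIOS_PERCENT = {
    "DRAW": 11.89, "DRAW_L": 8.17, "LINE_HW": 5.99, "LINE_P": 6.06,
    "LINE_T": 13.39, "PHOTO": 10.21, "PHOTO_L": 7.86, "TEXT": 8.58,
    "TEXT_HW": 7.36, "TEXT_P": 6.95, "TEXT_T": 13.53,
}


def approx_counts(total_images: int) -> dict:
    """Estimate images per category from the rounded percentages."""
    return {
        label: round(total_images * percent / 100)
        for label, percent in RATIOS_PERCENT.items()
    }


train_counts = approx_counts(8950)  # training set size from the text
eval_counts = approx_counts(995)    # evaluation set size from the text

for label in RATIOS_PERCENT:
    print(f"{label:8s} train~{train_counts[label]:5d} eval~{eval_counts[label]:4d}")
```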
Data preprocessing
During training the following transforms were applied randomly with a 50% chance:
- transforms.ColorJitter(brightness=0.5)
- transforms.ColorJitter(contrast=0.5)
- transforms.ColorJitter(saturation=0.5)
- transforms.ColorJitter(hue=0.5)
- transforms.Lambda(lambda img: ImageEnhance.Sharpness(img).enhance(random.uniform(0.5, 1.5)))
- transforms.Lambda(lambda img: img.filter(ImageFilter.GaussianBlur(radius=random.uniform(0, 2))))
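One possible way to assemble the transforms listed above is shown below. This is a sketch, not the project's training code: it assumes that each transform gets its own independent 50% coin flip via `transforms.RandomApply`, which the text does not state explicitly, and the function name `build_train_augmentations` is hypothetical. The imports are deferred because `torchvision` and `Pillow` are only needed when the pipeline is actually built.

```python
import random

AUGMENT_PROBABILITY = 0.5  # the "50% chance" from the text


def build_train_augmentations(p: float = AUGMENT_PROBABILITY):
    """Compose the listed transforms, each wrapped in RandomApply(p)."""
    # Lazy imports: requires `torchvision` and `Pillow`.
    from PIL import ImageEnhance, ImageFilter
    from torchvision import transforms

    candidates = [
        transforms.ColorJitter(brightness=0.5),
        transforms.ColorJitter(contrast=0.5),
        transforms.ColorJitter(saturation=0.5),
        transforms.ColorJitter(hue=0.5),
        # Sharpen or soften by a random factor in [0.5, 1.5].
        transforms.Lambda(
            lambda img: ImageEnhance.Sharpness(img).enhance(random.uniform(0.5, 1.5))
        ),
        # Blur with a random radius in [0, 2].
        transforms.Lambda(
            lambda img: img.filter(ImageFilter.GaussianBlur(radius=random.uniform(0, 2)))
        ),
    ]
    # Assumption: every transform is applied independently with probability p.
    return transforms.Compose(
        [transforms.RandomApply([t], p=p) for t in candidates]
    )
```

The returned object takes a PIL image and returns a PIL image of the same size, so it can be placed before resizing and tensor conversion in a training pipeline.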
Training Hyperparameters
- eval_strategy "epoch"
- save_strategy "epoch"
- learning_rate 5e-5
- per_device_train_batch_size 8
- per_device_eval_batch_size 8
- num_train_epochs 3
- warmup_ratio 0.1
- logging_steps 10
- load_best_model_at_end True
- metric_for_best_model "accuracy"
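The hyperparameters above map directly onto keyword arguments of `transformers.TrainingArguments`. The sketch below collects them in a plain dictionary and builds the arguments object on demand; the output directory `model_output` and the function name `build_training_arguments` are placeholders, and older releases of `transformers` spell the first key `evaluation_strategy`.

```python
# Values copied from the hyperparameter list above.
TRAINING_KWARGS = {
    "eval_strategy": "epoch",
    "save_strategy": "epoch",
    "learning_rate": 5e-5,
    "per_device_train_batch_size": 8,
    "per_device_eval_batch_size": 8,
    "num_train_epochs": 3,
    "warmup_ratio": 0.1,
    "logging_steps": 10,
    "load_best_model_at_end": True,
    "metric_for_best_model": "accuracy",
}


def build_training_arguments(output_dir: str = "model_output"):
    """Create TrainingArguments from the settings above."""
    # Lazy import: requires `transformers` and `torch`.
    from transformers import TrainingArguments

    return TrainingArguments(output_dir=output_dir, **TRAINING_KWARGS)
```

Keeping `eval_strategy` and `save_strategy` identical matters here: with `load_best_model_at_end` enabled, the best checkpoint can only be chosen among checkpoints that were both saved and evaluated.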
Results
Evaluation set's accuracy (Top-3): 99.6%
Evaluation set's accuracy (Top-1): 97.3%
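For readers unfamiliar with the two metrics: a page counts as correct under Top-N accuracy when its true category is among the N highest-scoring predictions. A minimal sketch of that calculation follows; the function name `top_k_accuracy` and the toy scores are illustrative only and are not taken from the evaluation data.

```python
from typing import Dict, List


def top_k_accuracy(
    scores: List[Dict[str, float]], true_labels: List[str], k: int
) -> float:
    """Fraction of samples whose true label is among the k best-scored labels."""
    if not scores or len(scores) != len(true_labels):
        raise ValueError("scores and true_labels must be non-empty and equally long")
    hits = 0
    for per_class, truth in zip(scores, true_labels):
        # Labels ordered from highest to lowest score.
        ranked = sorted(per_class, key=per_class.get, reverse=True)
        hits += truth in ranked[:k]
    return hits / len(true_labels)


# Toy example with made-up scores for three pages.
toy_scores = [
    {"TEXT": 0.6, "TEXT_P": 0.3, "DRAW": 0.1},
    {"TEXT": 0.5, "TEXT_P": 0.4, "DRAW": 0.1},
    {"TEXT": 0.2, "TEXT_P": 0.3, "DRAW": 0.5},
]
toy_truth = ["TEXT", "TEXT_P", "TEXT"]

top1 = top_k_accuracy(toy_scores, toy_truth, k=1)  # only the first page is a hit
top2 = top_k_accuracy(toy_scores, toy_truth, k=2)  # the second page becomes a hit
```

Top-N accuracy can never be lower than Top-1 on the same data, which is consistent with the Top-3 figure above exceeding the Top-1 figure.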
Result tables
Manually checked evaluation dataset results (TOP-3): model_TOP-3_EVAL.csv
Manually checked evaluation dataset results (TOP-1): model_TOP-1_EVAL.csv
Table columns
- FILE - name of the file
- PAGE - number of the page
- CLASS-N - label of the category, TOP-N guess
- SCORE-N - score of the category, TOP-N guess
- TRUE - actual label of the category
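A result table with these columns can be re-checked with the standard `csv` module. The sketch below assumes comma-separated files and concrete column names `CLASS-1`, `CLASS-2`, `CLASS-3` and `SCORE-1`, `SCORE-2`, `SCORE-3` for a TOP-3 table; the column order, the delimiter, and the three sample rows are invented for illustration and do not come from the published CSV files.

```python
import csv
import io

# Made-up rows that only mimic the described column layout.
SAMPLE_CSV = """FILE,PAGE,CLASS-1,CLASS-2,CLASS-3,SCORE-1,SCORE-2,SCORE-3,TRUE
doc_a.png,1,TEXT_T,TEXT,LINE_T,0.91,0.06,0.02,TEXT_T
doc_a.png,2,TEXT,TEXT_P,TEXT_HW,0.55,0.40,0.03,TEXT_P
doc_b.png,1,PHOTO,PHOTO_L,DRAW,0.80,0.15,0.03,DRAW_L
"""


def accuracy_from_table(handle, top_n: int) -> float:
    """Share of rows whose TRUE label appears in CLASS-1 .. CLASS-top_n."""
    rows = list(csv.DictReader(handle))
    if not rows:
        raise ValueError("the table contains no rows")
    hits = sum(
        row["TRUE"] in [row[f"CLASS-{i}"] for i in range(1, top_n + 1)]
        for row in rows
    )
    return hits / len(rows)


sample_top1 = accuracy_from_table(io.StringIO(SAMPLE_CSV), top_n=1)
sample_top3 = accuracy_from_table(io.StringIO(SAMPLE_CSV), top_n=3)
```

For a real file, the `io.StringIO` object would be replaced by an open file handle, for example `open("model_TOP-3_EVAL.csv", newline="")`.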
Contacts
For support, write to lutsai.k@gmail.com
Official repository: UFAL ^3
Acknowledgements
- Developed by UFAL ^5
- Funded by ATRIUM ^4
- Shared by ATRIUM ^4 & UFAL ^5
- Model type: fine-tuned ViT ^2 with a 224x224 input resolution
© 2022 UFAL & ATRIUM
Model tree for k4tel/vit-historical-page: base model google/vit-base-patch16-224