YOLOv8 Patent Text Region Detection Model
Model Description
patent_text_regions is a YOLOv8 model fine-tuned on a custom dataset of page-level images drawn from historical patent specifications published by the British Patent Office. It is trained to detect all text regions on a page of a patent specification, treating them as a single class. We start from the pretrained weights of the official small YOLOv8 release (yolov8s.pt) and fine-tune them on our custom dataset.
Usage
This model can be used in the same way as any pre-trained YOLOv8 model: load it by setting the model path to best.pt, the fine-tuned weights file.
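A minimal inference sketch follows, assuming the `ultralytics` Python package is installed and best.pt has been downloaded locally. The function name `detect_text_regions`, the image path `page.jpg`, and the confidence threshold of 0.25 are illustrative assumptions, not part of this repository.

```python
from pathlib import Path


def detect_text_regions(image_path, weights="best.pt", conf=0.25):
    """Return (x1, y1, x2, y2, confidence) tuples for the text regions on one page."""
    # Imported lazily so this file can be imported without ultralytics installed.
    from ultralytics import YOLO

    model = YOLO(weights)  # load the fine-tuned weights
    results = model.predict(source=str(image_path), imgsz=640, conf=conf, verbose=False)
    boxes = results[0].boxes  # detections for the single input image
    return [
        (*xyxy, score)
        for xyxy, score in zip(boxes.xyxy.tolist(), boxes.conf.tolist())
    ]


if __name__ == "__main__":
    page = Path("page.jpg")  # hypothetical page image
    if page.exists() and Path("best.pt").exists():
        for x1, y1, x2, y2, score in detect_text_regions(page):
            print(f"text region at ({x1:.0f}, {y1:.0f}, {x2:.0f}, {y2:.0f}), conf={score:.2f}")
```

Because the model has a single class, every returned box is a text region, so no class filtering is needed.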
Training Data
The dataset was created by randomly sampling 420 page images from British patent specifications published between 1850 and 1899. The data were randomly split 80-10-10 (train-val-test). Standard preprocessing (images were auto-oriented and stretched to 640 x 640 pixels) and the following data augmentations were then applied using Roboflow:
- Crop: 0% Minimum Zoom, 20% Maximum Zoom
- Grayscale: Apply to 15% of images
- Saturation: Between -25% and +25%
- Blur: Up to 2.5px
- Noise: Up to 0.1% of pixels
After augmentation, the custom dataset consists of 1,092 labelled images in total, which are made available in this repository.
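The split sizes and the final image count can be checked with a short sketch. It assumes that the augmentations were applied to the training split only and that three versions of each training image were generated; neither setting is stated here, but together they reproduce the 1,092 total. The file names and the helper `split_80_10_10` are hypothetical.

```python
import random


def split_80_10_10(items, seed=0):
    """Shuffle items and split them into train, val and test lists (80-10-10)."""
    items = list(items)
    random.Random(seed).shuffle(items)
    n_train = len(items) * 8 // 10  # integer arithmetic avoids float rounding
    n_val = len(items) // 10
    return items[:n_train], items[n_train:n_train + n_val], items[n_train + n_val:]


pages = [f"page_{i:03d}.jpg" for i in range(420)]  # hypothetical file names
train, val, test = split_80_10_10(pages)

versions_per_train_image = 3  # assumption: not stated in this model card
total = len(train) * versions_per_train_image + len(val) + len(test)
print(len(train), len(val), len(test), total)  # prints "336 42 42 1092"
```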
Hyperparameters
We train the model using the default hyperparameters, except for the batch size (128) and the number of epochs (300).
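A training run with these settings might look like the sketch below, assuming the `ultralytics` package is used. The dataset config path `data.yaml` and the helper names are hypothetical; only the batch size and epoch count come from this section.

```python
def training_overrides():
    """Return the two non-default hyperparameters described above."""
    return {"batch": 128, "epochs": 300}


def train(data_yaml="data.yaml"):
    """Fine-tune yolov8s.pt on the dataset described by data_yaml (hypothetical path)."""
    # Imported lazily so this file can be imported without ultralytics installed.
    from ultralytics import YOLO

    model = YOLO("yolov8s.pt")  # official small YOLOv8 weights
    return model.train(data=data_yaml, **training_overrides())


if __name__ == "__main__":
    train()
```

A batch size of 128 may not fit in memory on smaller GPUs. Lowering it is a reasonable adjustment but departs from the configuration reported here.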
Evaluation
Evaluation metrics on the test set are reported below:
- mAP50: 0.987
- mAP50-95: 0.892
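mAP50 counts a predicted box as correct when its intersection over union (IoU) with a ground-truth box is at least 0.5, while mAP50-95 averages the score over IoU thresholds from 0.50 to 0.95 in steps of 0.05. The sketch below shows the IoU computation and one way these metrics could be recomputed, assuming the `ultralytics` package and a hypothetical `data.yaml` that defines a test split. The function names are illustrative.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def evaluate(weights="best.pt", data_yaml="data.yaml"):
    """Return (mAP50, mAP50-95) on the test split named in data_yaml (hypothetical path)."""
    # Imported lazily so this file can be imported without ultralytics installed.
    from ultralytics import YOLO

    metrics = YOLO(weights).val(data=data_yaml, split="test", imgsz=640)
    return metrics.box.map50, metrics.box.map


# Two 10 x 10 boxes overlapping by half share 50 of 150 units of area.
print(iou((0, 0, 10, 10), (5, 0, 15, 10)))
```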