HugoSchtr/yolov5_datacat

HugoSchtr committed on Dec 20, 2022

Commit 344bee7 • 1 Parent(s): ec352c4

readme updated

Files changed (1)

README.md +46 -0

@@ -1,3 +1,49 @@
 ---
 license: cc-by-4.0
+tags:
+  - yolov5
+  - yolo
+  - digital-humanities
+  - object-detection
+  - computer-vision
+  - document-layout-analysis
 ---
+# What's YOLOv5
+YOLOv5 is an open-source object detection model released by [Ultralytics](https://ultralytics.com/) on [GitHub](https://github.com/ultralytics/yolov5).
+# DataCatalogue (or DataCat)
+[DataCatalogue](https://github.com/DataCatalogue) is a research project jointly led by Inria, the Bibliothèque nationale de France (National Library of France) and the Institut national d'histoire de l'art (National Institute of Art History).
+It aims to restructure OCR-ed auction sale catalogs kept in French national collections into TEI-XML, using machine learning solutions.
+# DataCat YOLOv5
+We trained a YOLOv5 model on custom data to perform document layout analysis on auction sale catalogs.
+The training set consists of **581 images**, annotated with **two classes**:
+* *title* (585 instances)
+* *entry*, i.e. a catalog entry (5017 instances)
+59 images were used for validation.
+We reached:
+| precision | recall | mAP_0.5 | mAP_0.5:0.95 |
+|---|---|---|---|
+| 0.99 | 0.99 | 0.98 | 0.75 |
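The precision and recall in the table above can be combined into a single F1 score (their harmonic mean). F1 is not reported in the README, so the value below is only derived from the rounded figures in the table and should be read as indicative. The function name `f1_score` is chosen here for illustration.

```python
def f1_score(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall."""
    if precision + recall == 0:
        # Avoid a division by zero when both inputs are zero.
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Rounded values taken from the table above.
f1 = f1_score(0.99, 0.99)
print(f"F1 = {f1:.2f}")
```

Because precision and recall are equal here, the F1 score equals the same value.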
+# Dataset
+The dataset has not been released yet.
+## Demo
+An interactive demo is available on the following Hugging Face Space: https://huggingface.co/spaces/HugoSchtr/DataCat_Yolov5
+## What's next
+The model performs well on our data and now needs to be incorporated into a dedicated pipeline for the research project.
+We also plan to train a new model on a larger training set in the near future.
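As a rough idea of what a first pipeline step could look like, here is a minimal sketch that loads custom YOLOv5 weights through `torch.hub` and groups the detections by class in top-to-bottom order. It is not the project's actual pipeline: the weights file name `best.pt`, the confidence threshold, and the helper names `load_model`, `detect` and `group_by_class` are assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical path to the trained weights; the real file name is not given here.
WEIGHTS = "best.pt"


def load_model(weights=WEIGHTS, conf=0.25):
    """Load custom YOLOv5 weights via torch.hub (requires PyTorch and network access)."""
    import torch  # imported lazily so group_by_class works without PyTorch

    model = torch.hub.load("ultralytics/yolov5", "custom", path=weights)
    model.conf = conf  # minimum confidence for a detection to be kept (assumed value)
    return model


def detect(model, image_path):
    """Run the model on one image and return its detections as a list of dicts."""
    results = model(image_path)
    # Each record has xmin, ymin, xmax, ymax, confidence, class and name keys.
    return results.pandas().xyxy[0].to_dict("records")


def group_by_class(records):
    """Group detections by class name, each group sorted from top to bottom."""
    grouped = defaultdict(list)
    for rec in sorted(records, key=lambda r: r["ymin"]):
        grouped[rec["name"]].append(rec)
    return dict(grouped)
```

With the weights available locally, `group_by_class(detect(load_model(), "page.jpg"))` (where `page.jpg` is a hypothetical catalog page) would return the *title* and *entry* boxes of that page in vertical order, which is a plausible starting point for a later TEI-XML conversion.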