---
license: agpl-3.0
datasets:
- ds4sd/DocLayNet
metrics:
- precision
- recall
- f1
---

## Model Description

This model addresses Document Layout Segmentation and Document Layout Analysis by segmenting a document page into its core components, including titles, captions, footnotes, formulas, list items, page footers, page headers, and pictures. It was developed to improve the understanding and accessibility of document content, supporting applications such as automated content extraction, document summarization, and improved accessibility features. By segmenting these elements precisely, the model enables downstream tasks that depend on a structural understanding of document layouts.

## Example Output
![Example](sample1_result.png "Example Output")

## Training Data
- **Source:** DocLayNet, IBM (https://github.com/DS4SD/DocLayNet)
- **Classes:** 11 classes (Caption, Footnote, Formula, List-item, Page-footer, Page-header, Picture, Section-header, Table, Text, and Title)
- **Pages:** 80,863 document pages

## Performance
### Metrics
- **Precision:** 0.98
- **Recall:** 0.97
- **F1:** 0.97
- **mAP50:** 0.99
- **mAP50-95:** 0.95

### Confusion Matrix
![Confusion Matrix](confusion_matrix.png "Confusion Matrix")


## Usage

### Example Code

To use the model, follow this example code:

```python
from ultralytics import YOLO
import pathlib

# List of sample images to process
img_list = ['sample1.png', 'sample2.png', 'sample3.png']

# Load the document segmentation model
docseg_model = YOLO('yolov8x-doclaynet-epoch64-imgsz640-initiallr1e-4-finallr1e-5.pt')

# Process the images with the model
results = docseg_model(source=img_list, save=True, show_labels=True, show_conf=True, show_boxes=True)

# Map each image path to the xyxy coordinates of its detected components
mydict = {}
for entry in results:
    thepath = pathlib.Path(entry.path)
    # Move the box tensor to the CPU before converting to NumPy,
    # in case inference ran on a GPU
    thecoords = entry.boxes.xyxy.cpu().numpy()
    mydict[thepath] = thecoords
```
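Beyond raw coordinates, each ultralytics `Results` object also exposes per-box class ids (`boxes.cls`), confidences (`boxes.conf`), and a class-id-to-name mapping (`names`). As a sketch, the helper below (a hypothetical `summarize_detections`, not part of the model or the ultralytics API) turns those arrays into readable records; the class-id mapping in the example is illustrative, not the model's actual ordering.

```python
import numpy as np

def summarize_detections(names, cls_ids, confs, xyxy, min_conf=0.5):
    """Turn per-box arrays (class ids, confidences, xyxy coordinates)
    into one dict per detection above the confidence threshold."""
    records = []
    for cid, conf, box in zip(cls_ids, confs, xyxy):
        if conf < min_conf:
            continue  # drop low-confidence boxes
        records.append({
            "label": names[int(cid)],
            "confidence": float(conf),
            "box": [float(v) for v in box],  # x1, y1, x2, y2
        })
    return records

# Made-up detections for illustration (class ids are assumptions)
names = {0: "Caption", 9: "Text", 10: "Title"}
cls_ids = np.array([10, 9, 0])
confs = np.array([0.97, 0.91, 0.32])
xyxy = np.array([[50, 40, 560, 90],
                 [50, 120, 560, 700],
                 [60, 710, 300, 730]])

print(summarize_detections(names, cls_ids, confs, xyxy))
```

With real model output you would pass `entry.names`, `entry.boxes.cls.cpu().numpy()`, `entry.boxes.conf.cpu().numpy()`, and `entry.boxes.xyxy.cpu().numpy()` instead of the made-up arrays.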

## Model Details
- **Model Name:** DILHTWD/documentlayoutsegmentation_YOLOv8_ondoclaynet
- **Publisher:** Data Intelligence Lab, Hochschule für Technik und Wirtschaft Dresden
- **Model Version:** 1.0.0
- **Model Date:** 2024-03-17
- **License:** [AGPL-3.0](https://www.gnu.org/licenses/agpl-3.0.de.html)
- **Architecture:** YOLOv8x (extra-large) (https://github.com/ultralytics/ultralytics)
- **Task:** Document Layout Segmentation, Document Layout Analysis