---
license: mit
tags:
  - yolov8
  - yolov8x
  - yolo
  - vision
  - object-detection
  - pytorch
library_name: ultralyticsplus
datasets:
  - nakamura196/ndl-layout-dataset
---

# yolov8x-ndl-layout

<!-- Provide a quick summary of what the model is/does. -->

The yolov8x-ndl-layout model is an object detection model for document layout analysis. It uses the YOLOv8x architecture to detect layout components in document images, supporting tasks such as digital archiving, document management, and automated content extraction.

## Model Details

### Model Description

<!-- Provide a longer summary of what this model is. -->

- **Developed by:** Satoru Nakamura
- **Model type:** Object Detection (YOLOv8x)

## Uses

<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->

### Direct Use

<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->

- Document layout analysis
- Automated content extraction
- Digital archiving

### Out-of-Scope Use

<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->

- Not suitable for real-time applications requiring extremely low latency
- Not designed for tasks outside document layout analysis, such as general object detection in images or videos

## Bias, Risks, and Limitations

<!-- This section is meant to convey both technical and sociotechnical limitations. -->

- The model may inherit biases from the dataset it was trained on.
- It may not generalize well to documents with layouts significantly different from those in the training dataset.
- There is a risk of misclassification in documents with complex or unusual layouts.

### Recommendations

<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->

Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.

## How to Get Started with the Model

Use the code below to get started with the model.

```python
from ultralyticsplus import YOLO, render_result
import os

# load the model from the Hugging Face Hub
model = YOLO('nakamura196/yolov8-ndl-layout')

# set inference parameters
conf_threshold = 0.25  # minimum confidence for a detection to be kept
iou_threshold = 0.45  # IoU threshold used by non-maximum suppression (NMS)

# set the input image (URL or local path)
img = 'https://dl.ndl.go.jp/api/iiif/2534020/T0000001/full/full/0/default.jpg'

# perform inference on the CPU
results = model.predict(img, conf=conf_threshold, iou=iou_threshold, device="cpu")

# draw the predicted boxes on the image
render = render_result(model=model, image=img, result=results[0])

# save the rendered image
os.makedirs('results', exist_ok=True)
render.save('results/1.jpg')
```
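
The `conf` and `iou` arguments above control how overlapping candidate boxes are filtered. As a rough illustration of the idea, here is a minimal pure-Python sketch of greedy non-maximum suppression. It is not the library's implementation, and the helper names `iou` and `nms` are introduced here only for illustration.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(boxes: Sequence[Box], scores: Sequence[float],
        conf_threshold: float = 0.25, iou_threshold: float = 0.45) -> List[int]:
    """Return indices of the boxes kept, highest score first."""
    # 1. Drop low-confidence candidates.
    candidates = [i for i, s in enumerate(scores) if s >= conf_threshold]
    # 2. Visit the rest from most to least confident.
    candidates.sort(key=lambda i: scores[i], reverse=True)
    kept: List[int] = []
    for i in candidates:
        # 3. Keep a box only if it does not overlap a kept box too much.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in kept):
            kept.append(i)
    return kept


# Illustrative boxes: two near-duplicates, one separate box, one weak detection.
example_boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60), (0, 0, 5, 5)]
example_scores = [0.9, 0.8, 0.7, 0.1]
kept_indices = nms(example_boxes, example_scores)
print(kept_indices)
```

Raising `conf_threshold` removes more weak detections; lowering `iou_threshold` suppresses overlapping boxes more aggressively.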

## Training Details

### Training Data

<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->

The model was trained on the NDL Layout Dataset (`nakamura196/ndl-layout-dataset`), which contains document images annotated with layout components such as text blocks, images, and tables. The dataset covers a diverse set of layouts, making it suitable for training robust layout analysis models.

### Training Procedure

<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->

The model uses YOLOv8x, the largest variant of the YOLOv8 object detection architecture. Training involved the following steps:

- Data pre-processing to normalize the document images and annotations.
- Data augmentation to improve the robustness of the model.
- Fine-tuning on the NDL Layout Dataset with specific hyperparameters.
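
The exact pre-processing used for this model is not documented here. As one plausible illustration of annotation normalization, the sketch below converts a pixel-coordinate bounding box into the normalized `class cx cy w h` line format that YOLO-style detection labels use. The function name `to_yolo_label` and the sample values are hypothetical.

```python
from typing import Tuple


def to_yolo_label(class_id: int,
                  box: Tuple[float, float, float, float],
                  image_width: int,
                  image_height: int) -> str:
    """Convert a pixel box (x1, y1, x2, y2) to a normalized YOLO label line."""
    x1, y1, x2, y2 = box
    # Center and size, each divided by the image dimension so values lie in [0, 1].
    cx = (x1 + x2) / 2 / image_width
    cy = (y1 + y2) / 2 / image_height
    w = (x2 - x1) / image_width
    h = (y2 - y1) / image_height
    return f"{class_id} {cx:.6f} {cy:.6f} {w:.6f} {h:.6f}"


# Hypothetical annotation: class 0, box in a 1000 x 2000 pixel page image.
label_line = to_yolo_label(0, (100, 200, 300, 600), 1000, 2000)
print(label_line)
```

Normalized coordinates keep the labels valid when images are resized during training.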

#### Training Hyperparameters

- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->

## Evaluation

<!-- This section describes the evaluation protocols and provides the results. -->

### Testing Data, Factors & Metrics

#### Testing Data

<!-- This should link to a Dataset Card if possible. -->

The model was evaluated on a separate validation set from the NDL Layout Dataset, containing a variety of document images not seen during training.

#### Factors

<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->

The evaluation considered factors such as different document types, varying complexities in layouts, and different levels of noise in the images.

#### Metrics

<!-- These are the evaluation metrics being used, ideally with a description of why. -->

The primary evaluation metrics used were:

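- mAP (mean Average Precision): the average precision (area under the precision-recall curve) of each class, averaged over all classes. It summarizes both the precision and the recall of the detected layout components.
- IoU (Intersection over Union): the area of overlap between a predicted and a ground-truth bounding box divided by the area of their union. It measures how accurately the predicted boxes are localized.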
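
To make the two metrics concrete, here is a minimal sketch of how each can be computed. It is not the evaluation code used for this model, and evaluation toolkits differ in interpolation details. The function names are introduced here for illustration, and `average_precision` assumes detections have already been matched to ground truth and sorted by descending confidence.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels


def iou(a: Box, b: Box) -> float:
    """Area of overlap divided by area of union for two boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def average_precision(is_true_positive: Sequence[bool], num_ground_truth: int) -> float:
    """AP for one class: area under the precision-recall curve."""
    if num_ground_truth == 0:
        return 0.0
    tp = fp = 0
    precisions: List[float] = []
    recalls: List[float] = []
    for hit in is_true_positive:
        tp += hit
        fp += not hit
        precisions.append(tp / (tp + fp))
        recalls.append(tp / num_ground_truth)
    # Make precision non-increasing from right to left (precision envelope).
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    # Sum rectangle areas between consecutive recall values.
    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap


def mean_average_precision(per_class_ap: Sequence[float]) -> float:
    """mAP is the mean of the per-class AP values."""
    return sum(per_class_ap) / len(per_class_ap) if per_class_ap else 0.0


overlap = iou((0, 0, 10, 10), (5, 0, 15, 10))
ap_example = average_precision([True, False, True], num_ground_truth=2)
print(overlap, ap_example)
```

A detection typically counts as a true positive only when its IoU with a ground-truth box of the same class exceeds a chosen threshold, which is how the two metrics relate.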

### Results

The model achieved the following results on the validation set:

- **mAP:** 85.4%
- **IoU:** 78.2%

These results indicate that the model performs well in detecting layout components in a variety of document images.

#### Summary

The yolov8x-ndl-layout model is effective for document layout analysis, reaching an mAP of 85.4% and an IoU of 78.2% on the validation set. It can be used for applications such as digital archiving and automated content extraction.

## Environmental Impact

<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->

Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).

- **Hardware Type:** [More Information Needed]
- **Hours used:** [More Information Needed]
- **Cloud Provider:** [More Information Needed]
- **Compute Region:** [More Information Needed]
- **Carbon Emitted:** [More Information Needed]
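
Once the values above are known, a rough estimate follows from energy use multiplied by the carbon intensity of the local grid. The sketch below is a simplification of that idea, not the calculator's exact method. The function name `estimate_co2_grams` is introduced here, and every number in the usage is a hypothetical placeholder rather than a figure for this model.

```python
def estimate_co2_grams(power_watts: float,
                       hours: float,
                       grid_g_per_kwh: float,
                       pue: float = 1.0) -> float:
    """Estimate emissions in grams of CO2eq.

    power_watts: average hardware power draw
    hours: total training time
    grid_g_per_kwh: carbon intensity of the electricity used
    pue: data-center power usage effectiveness (1.0 means no overhead)
    """
    energy_kwh = power_watts / 1000 * hours * pue
    return energy_kwh * grid_g_per_kwh


# Hypothetical placeholders only: 300 W for 10 hours at 400 gCO2eq/kWh.
example_emissions = estimate_co2_grams(300, 10, 400)
print(example_emissions)
```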

## Model Card Contact

For more information, please contact Satoru Nakamura at [contact email].