YOLOv8X trained on the full DocLayNet dataset with a 1024x1024 image size and a batch size of 42.

This repository contains the YOLOv8X model trained on the entire DocLayNet dataset, comprising ~41GB of annotated document layout images. Training was conducted on a single A100 GPU with 80GB of memory. The batch size was set to 42, and images were resized to 1024x1024 pixels while retaining the default image-augmentation hyperparameters.
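
For reference, a comparable run can be launched through the Ultralytics Python API. This is a minimal sketch, not the exact command used for this checkpoint: doclaynet.yaml stands in for your own DocLayNet dataset config in YOLO format and is not part of this repository, and the epoch count is not stated in this card.

from ultralytics import YOLO

# Start from the pretrained YOLOv8X checkpoint
model = YOLO("yolov8x.pt")

# doclaynet.yaml is a placeholder for a DocLayNet dataset config in YOLO format
model.train(
    data="doclaynet.yaml",
    imgsz=1024,   # images resized to 1024x1024, as in this run
    batch=42,     # batch size used on the 80GB A100
    epochs=100,   # assumed; the card does not state the epoch count
    device=0,
)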

Dataset Classes

The model was trained on all eleven class labels available in the DocLayNet dataset (the snippet after this list shows how to confirm the index-to-label mapping used by the checkpoint):

  • Caption
  • Footnote
  • Formula
  • List-item
  • Page-footer
  • Page-header
  • Picture
  • Section-header
  • Table
  • Text
  • Title
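
To check the index-to-label mapping this particular checkpoint uses, the class names can be read from a prediction result. A minimal sketch, reusing the best.onnx weights and a placeholder image path as in the inference guide below:

from ultralytics import YOLO

model = YOLO("best.onnx")
results = model("<path_to_image>", imgsz=1024)
# Dictionary mapping class indices to label strings; the exact order
# depends on the dataset YAML used during training
print(results[0].names)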

Benchmark Results

The performance of the trained model was evaluated on the validation set, yielding the following metrics:

| Class          | Images | Instances | Box(P) | Box(R) | mAP50 | mAP50-95 |
|----------------|--------|-----------|--------|--------|-------|----------|
| all            | 6476   | 98604     | 0.905  | 0.866  | 0.925 | 0.759    |
| Caption        | 6476   | 1763      | 0.921  | 0.868  | 0.949 | 0.878    |
| Footnote       | 6476   | 312       | 0.888  | 0.779  | 0.839 | 0.637    |
| Formula        | 6476   | 1894      | 0.893  | 0.839  | 0.914 | 0.748    |
| List-item      | 6476   | 13320     | 0.905  | 0.915  | 0.94  | 0.807    |
| Page-footer    | 6476   | 5571      | 0.94   | 0.941  | 0.974 | 0.651    |
| Page-header    | 6476   | 6683      | 0.952  | 0.862  | 0.957 | 0.702    |
| Picture        | 6476   | 1565      | 0.834  | 0.827  | 0.88  | 0.81     |
| Section-header | 6476   | 15744     | 0.919  | 0.902  | 0.962 | 0.635    |
| Table          | 6476   | 2269      | 0.87   | 0.873  | 0.92  | 0.865    |
| Text           | 6476   | 49185     | 0.937  | 0.923  | 0.967 | 0.833    |
| Title          | 6476   | 298       | 0.898  | 0.792  | 0.873 | 0.779    |

These results demonstrate the model's capability in detecting various elements of document layouts with high precision and recall.
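
The table follows the format of the Ultralytics validation report. A sketch of how such numbers could be re-computed, assuming the DocLayNet validation split has been prepared in YOLO format (doclaynet.yaml below is a placeholder, not a file from this repository):

from ultralytics import YOLO

model = YOLO("best.onnx")
metrics = model.val(data="doclaynet.yaml", imgsz=1024, split="val")
print(metrics.box.map50)  # mAP50
print(metrics.box.map)    # mAP50-95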

Quick Guide to Running Inference

from ultralytics import YOLO
from PIL import Image

# Load the exported ONNX weights through the Ultralytics API
onnx_model = YOLO("best.onnx")

# Run inference at the training resolution
results = onnx_model("<path_to_image>", imgsz=1024)

for i, r in enumerate(results):
    # Plot the detections onto the image (BGR numpy array)
    im_bgr = r.plot()
    # Convert to an RGB PIL image if you need it downstream
    im_rgb = Image.fromarray(im_bgr[..., ::-1])

    # Display the annotated result (in supported environments)
    r.show()

    # Save the annotated result to disk
    r.save(filename=f'results{i}.jpg')
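
Beyond saving annotated images, the individual detections can be read from each result object; the attributes below are standard Ultralytics Results/Boxes fields:

# Inspect individual detections from the first result
r = results[0]
for box in r.boxes:
    cls_id = int(box.cls)                  # class index
    label = r.names[cls_id]                # e.g. 'Table', 'Text'
    conf = float(box.conf)                 # confidence score
    x1, y1, x2, y2 = box.xyxy[0].tolist()  # bounding box in pixel coordinates
    print(f"{label}: {conf:.2f} at ({x1:.0f}, {y1:.0f}, {x2:.0f}, {y2:.0f})")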