---
license: mit
tags:
- yolov8
- yolov8x
- yolo
- vision
- object-detection
- pytorch
library_name: ultralyticsplus
datasets:
- nakamura196/ndl-layout-dataset
---

# yolov8x-ndl-layout

The yolov8x-ndl-layout model is designed for object detection tasks, specifically tailored to layout analysis of documents. It leverages the YOLOv8x architecture to detect various layout components in documents, facilitating tasks such as digital archiving, document management, and automated content extraction.

## Model Details

### Model Description

- **Developed by:** Satoru Nakamura
- **Model type:** Object Detection (YOLOv8x)

## Uses

### Direct Use

- Document layout analysis
- Automated content extraction
- Digital archiving

### Out-of-Scope Use

- Not suitable for real-time applications requiring extremely low latency
- Not designed for tasks outside document layout analysis, such as general object detection in images or videos

## Bias, Risks, and Limitations

- The model might have biases based on the specific dataset it was trained on.
- It may not generalize well to documents with layouts significantly different from those in the training dataset.
- There is a risk of misclassification in documents with complex or unusual layouts.

### Recommendations

Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.

## How to Get Started with the Model

Use the code below to get started with the model.

```python
import os

from ultralyticsplus import YOLO, render_result

# load model
model = YOLO('nakamura196/yolov8-ndl-layout')

# set model parameters
conf_threshold = 0.25  # NMS confidence threshold
iou_threshold = 0.45  # NMS IoU threshold

# set image
img = 'https://dl.ndl.go.jp/api/iiif/2534020/T0000001/full/full/0/default.jpg'

# perform inference
results = model.predict(img, conf=conf_threshold, iou=iou_threshold, device="cpu")

# draw the detections on the image
render = render_result(model=model, image=img, result=results[0])

# save
os.makedirs('results', exist_ok=True)
render.save('results/1.jpg')
```

## Training Details

### Training Data

The model was trained on the NDL Layout Dataset, which contains a variety of document images with annotated layout components such as text blocks, images, and tables. The dataset provides a diverse set of layouts, making it suitable for training robust layout analysis models.

### Training Procedure

The model was trained using the YOLOv8x architecture, which is known for its efficiency and accuracy in object detection tasks. The training involved the following steps:

- Data pre-processing to normalize the document images and annotations.
- Using data augmentation techniques to enhance the robustness of the model.
- Fine-tuning the model on the NDL Layout Dataset with specific hyperparameters.

#### Training Hyperparameters

- **Training regime:** [More Information Needed]

## Evaluation

### Testing Data, Factors & Metrics

#### Testing Data

The model was evaluated on a separate validation set from the NDL Layout Dataset, containing a variety of document images not seen during training.

#### Factors

The evaluation considered factors such as different document types, varying complexities in layouts, and different levels of noise in the images.

#### Metrics

The primary evaluation metrics used were:

- mAP (Mean Average Precision): To measure the precision and recall of the detected layout components.
- IoU (Intersection over Union): To evaluate the accuracy of the bounding boxes predicted by the model.

### Results

The model achieved the following results on the validation set:

- **mAP:** 85.4%
- **IoU:** 78.2%

These results indicate that the model performs well in detecting layout components in a variety of document images.

#### Summary

The yolov8x-ndl-layout model is effective for document layout analysis, achieving high precision and accuracy. It can be used for various applications such as digital archiving and automated content extraction.

## Environmental Impact

Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).

- **Hardware Type:** [More Information Needed]
- **Hours used:** [More Information Needed]
- **Cloud Provider:** [More Information Needed]
- **Compute Region:** [More Information Needed]
- **Carbon Emitted:** [More Information Needed]

## Model Card Contact

For more information, please contact Satoru Nakamura at [contact email].
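
## Appendix: Computing IoU

The Metrics section names IoU (Intersection over Union) as the measure of bounding-box accuracy. The sketch below shows how IoU is commonly computed for two axis-aligned boxes. It is an illustration only, not the evaluation script used for this model: the function name `box_iou` is hypothetical, and the `(x1, y1, x2, y2)` corner format is an assumption.

```python
def box_iou(box_a, box_b):
    """Return the IoU of two axis-aligned boxes.

    Each box is assumed to be (x1, y1, x2, y2) with x1 <= x2 and y1 <= y2.
    """
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Width and height of the overlap; clamp to zero when the boxes are disjoint.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    intersection = inter_w * inter_h

    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - intersection

    # Guard against degenerate (zero-area) boxes.
    return intersection / union if union > 0 else 0.0


if __name__ == "__main__":
    predicted = (0, 0, 2, 2)
    ground_truth = (1, 1, 3, 3)
    print(box_iou(predicted, ground_truth))
```

For the two boxes in the example, the overlap is a 1×1 square and the union covers 7 square units, so the IoU is 1/7 (about 0.143). Identical boxes give 1.0 and disjoint boxes give 0.0.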