# 360LayoutAnalysis

[Chinese](./README.md)

## I. Background

In today's digital era, **Document Layout Analysis** is one of the key steps in information extraction and document understanding. Also known as document image analysis, it is the process of identifying and extracting text, images, tables, and other elements from scanned document images. The technology is widely used in automated document processing, electronic data exchange, historical document digitization, and other fields.

Traditional document layout analysis models often struggle to accurately distinguish paragraphs from other layout elements, which limits further processing and use of document information. Advances in deep learning and pattern recognition have brought new opportunities: training on suitable datasets can improve a model's understanding of document structure. High-quality annotated datasets are fundamental to training effective models, and detailed annotation is essential in layout analysis, particularly the annotation of **paragraphs**, because it directly affects semantic understanding and information extraction from the text.

Our team has constructed multiple Chinese document datasets with paragraph annotations across various scenarios to ensure the models generalize well. For example, in the **academic paper** scenario, previous open-source datasets such as CDLA (a Chinese document layout analysis dataset) lacked paragraph annotations; in the **research report** scenario, our dataset fills a gap in this area. Using these annotated datasets, we have trained several new Chinese document layout analysis models. These models are designed to identify paragraph boundaries in documents and to accurately distinguish between text, images, tables, formulas, and other elements.

In this release, we have open-sourced the layout analysis model weights and the corresponding label systems for both the academic paper and research report scenarios.

## II. Usage

- Weights download link: [🤗LINK](https://huggingface.co/qihoo360)

- Usage:

  The open-source weights were trained with `yolov8`. Run prediction as follows:

  ```python
  from ultralytics import YOLO
  
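  image_path = ''  # Path to the image to be predicted
  model_path = ''  # Path to the downloaded weights
  model = YOLO(model_path)

  # save=True writes the annotated image to disk; conf=0.5 drops detections below 0.5 confidence
  results = model(image_path, save=True, conf=0.5, save_crop=False, line_width=2)
  print(results)  # A list with one Results object per input image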
  ```
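
Printing the raw result is mostly useful for a quick check. For downstream processing it can help to turn the detections into plain records. The following is a minimal sketch, not part of the released code: the helper names `to_records` and `predict_layout` are hypothetical, and it assumes the standard `ultralytics` result attributes (`boxes.xyxy`, `boxes.cls`, `boxes.conf`, `names`).

```python
from typing import Any, Dict, List, Mapping, Sequence


def to_records(
    boxes_xyxy: Sequence[Sequence[float]],
    class_ids: Sequence[float],
    scores: Sequence[float],
    names: Mapping[int, str],
) -> List[Dict[str, Any]]:
    """Pair each box with its label name and confidence score."""
    records = []
    for (x1, y1, x2, y2), cls_id, score in zip(boxes_xyxy, class_ids, scores):
        records.append({
            "label": names[int(cls_id)],  # class index -> label name
            "score": float(score),
            "box": [float(x1), float(y1), float(x2), float(y2)],  # pixel xyxy
        })
    return records


def predict_layout(model_path: str, image_path: str, conf: float = 0.5) -> List[Dict[str, Any]]:
    """Hypothetical wrapper: run the model on one image and return records."""
    from ultralytics import YOLO  # imported lazily so to_records works without it

    model = YOLO(model_path)
    result = model(image_path, conf=conf)[0]  # one Results object per image
    boxes = result.boxes
    return to_records(
        boxes.xyxy.tolist(), boxes.cls.tolist(), boxes.conf.tolist(), result.names
    )
```

Calling `predict_layout(model_path, image_path)` would then return a list of dictionaries such as `{"label": ..., "score": ..., "box": [x1, y1, x2, y2]}`, with label names taken from whatever the loaded weights define.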

## III. Layout Analysis

### 3.1 Academic Paper Scenario

- Label Categories

  | Label          | Description           |
  | -------------- | --------------------- |
  | Text           | Main Text (Paragraph) |
  | Title          | Title                 |
  | Figure         | Image                 |
  | Figure caption | Image Caption         |
  | Table          | Table                 |
  | Table caption  | Table Caption         |
  | Header         | Header                |
  | Footer         | Footer                |
  | Reference      | Reference             |
  | Equation       | Equation              |

- Example

<div align="center">
    <img src="./case/paper/1.jpg" width="50%" height="50%">
    <img src="./case/paper/2.jpg" width="50%" height="50%">
</div>

### 3.2 Research Report Scenario

- Label Categories

  | Label          | Description           |
  | -------------- | --------------------- |
  | Text           | Main Text (Paragraph) |
  | Title          | Title                 |
  | Figure         | Image                 |
  | Figure caption | Image Caption         |
  | Table          | Table                 |
  | Table caption  | Table Caption         |
  | Header         | Header                |
  | Footer         | Footer                |
  | Toc            | Table of Contents     |

- Example

<div align="center">
    <img src="./case/report/1.jpg" width="50%" height="50%">
    <img src="./case/report/2.jpg" width="50%" height="50%">
</div>
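
One possible use of these label systems is to drop page furniture and keep the remaining regions in a rough reading order. The sketch below is an illustration only: it assumes detections have already been converted into plain dictionaries with a `label` and a `box` (`[x1, y1, x2, y2]`) key, which is a hypothetical format rather than something the model returns directly, and it assumes the label strings match the tables above up to letter case. The top-to-bottom sort is a simplification that suits single-column pages; multi-column layouts would need a more careful ordering.

```python
from typing import Any, Dict, Iterable, List

# Labels treated as page furniture (assumed to match the label tables, case-insensitively)
PAGE_FURNITURE = frozenset({"header", "footer"})


def main_content(
    records: Iterable[Dict[str, Any]],
    drop: Iterable[str] = PAGE_FURNITURE,
) -> List[Dict[str, Any]]:
    """Remove unwanted labels, then sort by top edge and left edge."""
    drop_set = {name.lower() for name in drop}
    kept = [r for r in records if r["label"].lower() not in drop_set]
    # box is [x1, y1, x2, y2]; sort by y1 first, then x1
    return sorted(kept, key=lambda r: (r["box"][1], r["box"][0]))
```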

## License

This project uses certain datasets and checkpoints that are subject to their respective original licenses. Users must comply with all terms and conditions of these original licenses. The content of this project itself, including the source code of this repository, is licensed under the [Apache License 2.0](./LICENSE.txt).

The 360LayoutAnalysis open-source model supports commercial use. To use this model or its derivative models for commercial purposes, please apply by email ([360ailab-nlp@360.cn](mailto:360ailab-nlp@360.cn)) and refer to the specific license agreement in ["360LayoutAnalysis Model Open Source Model License"](./360LayoutAnalysis开源模型许可证.txt).