---
library_name: transformers
license: apache-2.0
datasets:
- ds4sd/DocLayNet
pipeline_tag: image-segmentation
---

# Model Card for Model ID

We present the model cmarkea/detr-layout-detection, which allows extracting different layouts (Text, Picture, Caption, Footnote, etc.) from an image of a document.
This is a fine-tuning of the model [detr-resnet-50](https://huggingface.co/facebook/detr-resnet-50) on the [DocLayNet](https://huggingface.co/datasets/ds4sd/DocLayNet)
dataset. This model can jointly predict masks and bounding boxes for documentary objects. It is ideal for processing documentary corpora to be ingested into an
ODQA system.

## Model Details

### Model Description


### Direct Use

```python
from transformers import AutoImageProcessor
from transformers.models.detr import DetrForSegmentation

img_proc = AutoImageProcessor.from_pretrained(
    "ArkeaIAF/detr-layout-detection"
)
model = DetrForSegmentation.from_pretrained(
    "ArkeaIAF/detr-layout-detection"
)

with torch.inference_mode():
    input_ids = img_proc(img, return_tensors='pt')
    output = model(**input_ids)

threshold=0.4

segmentation_mask = img_proc.post_process_segmentation(
    out_seg,
    threshold=threshold,
    target_sizes=[img.size[::-1]]
)

bbox_pred = img_proc.post_process_object_detection(
    output,
    threshold=threshold,
    target_sizes=[img.size[::-1]]
)
```

### Citation

```
@online{DeDetrLay,
  AUTHOR = {Cyrile Delestre},
  URL = {https://huggingface.co/cmarkea/detr-base-layout-detection},
  YEAR = {2024},
  KEYWORDS = {Image Processing ; Transformers ; Layout},
}
```