---
library_name: transformers
license: apache-2.0
datasets:
- ds4sd/DocLayNet
pipeline_tag: image-segmentation
---

# Model Card for cmarkea/detr-layout-detection

We present the model cmarkea/detr-layout-detection, which extracts the different layout elements (Text, Picture, Caption, Footnote, etc.) from an image of a document page. It is a fine-tuned version of [detr-resnet-50](https://huggingface.co/facebook/detr-resnet-50) trained on the [DocLayNet](https://huggingface.co/datasets/ds4sd/DocLayNet) dataset. The model jointly predicts segmentation masks and bounding boxes for document objects. This makes it well suited to preprocessing document corpora before they are ingested into an open-domain question answering (ODQA) system.

## Model Details

### Model Description

### Direct Use

```python
import torch
from PIL import Image
from transformers import AutoImageProcessor
from transformers.models.detr import DetrForSegmentation

img_proc = AutoImageProcessor.from_pretrained(
    "cmarkea/detr-layout-detection"
)
model = DetrForSegmentation.from_pretrained(
    "cmarkea/detr-layout-detection"
)

# Placeholder path: replace with your own document page image.
img = Image.open("document_page.png").convert("RGB")

with torch.inference_mode():
    inputs = img_proc(img, return_tensors="pt")
    output = model(**inputs)

threshold = 0.4

# PIL gives (width, height); post-processing expects (height, width).
# Note: recent transformers versions mark post_process_segmentation as
# deprecated in favor of post_process_semantic_segmentation.
segmentation_mask = img_proc.post_process_segmentation(
    output,
    threshold=threshold,
    target_sizes=[img.size[::-1]]
)

bbox_pred = img_proc.post_process_object_detection(
    output,
    threshold=threshold,
    target_sizes=[img.size[::-1]]
)
```

### Citation

```
@online{DeDetrLay,
  AUTHOR = {Cyrile Delestre},
  URL = {https://huggingface.co/cmarkea/detr-base-layout-detection},
  YEAR = {2024},
  KEYWORDS = {Image Processing ; Transformers ; Layout},
}
```
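
### Reading the predictions

In the Direct Use example, `post_process_object_detection` returns a list with one dictionary per image, holding `scores`, `labels` and `boxes` tensors, with boxes expressed as `(x0, y0, x1, y1)` pixel coordinates in the target size. The following is a minimal sketch, under stated assumptions, of one way to turn such a dictionary into readable records before ingestion. The helper `detections_to_records` and its naive top-to-bottom, left-to-right ordering are hypothetical illustrations, not part of the model or of the transformers API. The label mapping in the demo is also made up; the real one is in `model.config.id2label`.

```python
from typing import Any, Dict, List, Mapping


def _as_list(values: Any) -> List[Any]:
    """Convert a torch tensor (or anything with .tolist()) to a plain list."""
    return values.tolist() if hasattr(values, "tolist") else list(values)


def detections_to_records(
    detection: Mapping[str, Any],
    id2label: Mapping[int, str],
) -> List[Dict[str, Any]]:
    """Hypothetical helper: turn one post-processed detection dict into records.

    Assumes `detection` holds "scores", "labels" and "boxes", with boxes given
    as (x0, y0, x1, y1) in pixels, as produced by post_process_object_detection.
    """
    scores = _as_list(detection["scores"])
    labels = _as_list(detection["labels"])
    boxes = _as_list(detection["boxes"])
    records = [
        {
            # Fall back to the raw id if the label is missing from the mapping.
            "label": id2label.get(int(label), str(label)),
            "score": float(score),
            "box": tuple(float(v) for v in box),
        }
        for score, label, box in zip(scores, labels, boxes)
    ]
    # Naive reading order (assumption): sort by top edge, then by left edge.
    records.sort(key=lambda r: (r["box"][1], r["box"][0]))
    return records


if __name__ == "__main__":
    # Synthetic detections with a made-up label mapping, for illustration only.
    fake = {
        "scores": [0.91, 0.88],
        "labels": [1, 0],
        "boxes": [[50.0, 400.0, 500.0, 600.0], [50.0, 40.0, 500.0, 120.0]],
    }
    for rec in detections_to_records(fake, {0: "Title", 1: "Text"}):
        print(rec)
```

With the Direct Use variables, the call would be `detections_to_records(bbox_pred[0], model.config.id2label)`. Sorting by the top edge is only a rough reading order. Multi-column pages would need a column-aware heuristic.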