| { | |
| "model_id": "Aryn/deformable-detr-DocLayNet", | |
| "downloads": 17314, | |
| "tags": [ | |
| "transformers", | |
| "safetensors", | |
| "deformable_detr", | |
| "object-detection", | |
| "vision", | |
| "dataset:DocLayNet", | |
| "arxiv:2206.01062", | |
| "arxiv:2010.04159", | |
| "license:apache-2.0", | |
| "endpoints_compatible", | |
| "region:us" | |
| ], | |
| "description": "---\nlicense: apache-2.0\ntags:\n- object-detection\n- vision\ndatasets:\n- DocLayNet\n---\n\n# Deformable DETR model trained on DocLayNet\n\nDeformable DETR (Deformable DEtection TRansformer) model trained on DocLayNet, a dataset of about 80k human-annotated document pages labeled with 11 layout classes. You can use this model in the serverless Aryn Partitioning Service. You can get started here.\n\n## Model description\n\nThe DETR model is an encoder-decoder transformer with a convolutional backbone. Two heads are added on top of the decoder outputs to perform object detection: a linear layer for the class labels and an MLP (multi-layer perceptron) for the bounding boxes. The model uses so-called object queries to detect objects in an image; each object query looks for a particular object in the image. In the original DETR, the number of object queries for COCO is set to 100.\n\nThe model is trained using a \"bipartite matching loss\": the predicted classes and bounding boxes of each of the N = 100 object queries are compared to the ground-truth annotations, padded up to the same length N (so if an image contains only 4 objects, the remaining 96 annotations have \"no object\" as class and \"no bounding box\" as bounding box). The Hungarian matching algorithm creates an optimal one-to-one mapping between the N queries and the N annotations. Next, standard cross-entropy (for the classes) and a linear combination of the L1 and generalized IoU losses (for the bounding boxes) are used to optimize the parameters of the model.\n\n## Intended uses & limitations\n\nYou can use the raw model for object detection. See the model hub for all available Deformable DETR models.\n\n### How to use\n\nHere is how to use this model:\n\n## Evaluation results\n\nThis model achieves 57.1 box mAP on DocLayNet.\n\n## Training data\n\nThe Deformable DETR model was trained on DocLayNet. DocLayNet was introduced in the paper DocLayNet: A Large Human-Annotated Dataset for Document-Layout Analysis by Pfitzmann et al. and first released in this repository.\n\n### BibTeX entry and citation info", | |
| "model_explanation_gemini": "Detects layout elements in document page images using a transformer-based model trained on DocLayNet, achieving 57.1 box mAP.\n\n**Features:**\n- Object detection for document layouts (11 classes)\n- Encoder-decoder transformer with a convolutional backbone\n- Uses object queries and a bipartite matching loss\n- Trained on DocLayNet (about 80k annotated pages)\n\n**Comparison:**\nUnlike the original DETR, this variant uses deformable attention, which attends to a small set of sampling points around each reference point; this speeds up training convergence and improves detection of small objects.", | |
| "release_year": "2022", | |
| "parameter_count": null, | |
| "is_fine_tuned": false, | |
| "category": "Vision", | |
| "api_enhanced": true | |
| } |
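The bipartite matching step described in the model card can be illustrated with a small, self-contained Python sketch. It is not this model's training code. To stay dependency-free, it finds the optimal one-to-one assignment by brute force over permutations (the same optimum the Hungarian algorithm finds, but only practical for a handful of queries), and its matching cost uses only the class probability and the L1 box distance, omitting the generalized IoU term. The helper names `match` and `l1`, the `(cx, cy, w, h)` box format, and the cost weights `w_class=1.0` and `w_l1=5.0` are illustrative assumptions. Instead of padding the targets with explicit "no object" entries, each real target is assigned to a distinct query, and queries left unmatched play the role of the padded "no object" annotations.

```python
from itertools import permutations
from typing import List, Sequence, Tuple

# Hypothetical box format: (cx, cy, w, h), normalized to [0, 1].
Box = Tuple[float, float, float, float]


def l1(a: Box, b: Box) -> float:
    """L1 distance between two boxes."""
    return sum(abs(x - y) for x, y in zip(a, b))


def match(
    pred_probs: Sequence[Sequence[float]],  # [N][num_classes] class probabilities per query
    pred_boxes: Sequence[Box],              # [N] predicted boxes
    gt_labels: Sequence[int],               # [M] ground-truth class ids, M <= N
    gt_boxes: Sequence[Box],                # [M] ground-truth boxes
    w_class: float = 1.0,                   # illustrative weight, not the model's setting
    w_l1: float = 5.0,                      # illustrative weight, not the model's setting
) -> List[Tuple[int, int]]:
    """Return sorted (query, target) pairs that minimize the total matching cost.

    Brute force over permutations: exact, but only feasible for tiny N.
    Queries that end up unmatched correspond to "no object" annotations.
    """
    n, m = len(pred_boxes), len(gt_labels)
    if m > n:
        raise ValueError("more targets than queries")

    def cost(q: int, t: int) -> float:
        # Low cost when query q is confident in target t's class and its box is close.
        return -w_class * pred_probs[q][gt_labels[t]] + w_l1 * l1(pred_boxes[q], gt_boxes[t])

    best_total, best_perm = float("inf"), ()
    # perm[t] is the query assigned to target t; all queries in perm are distinct.
    for perm in permutations(range(n), m):
        total = sum(cost(q, t) for t, q in enumerate(perm))
        if total < best_total:
            best_total, best_perm = total, perm
    return sorted((q, t) for t, q in enumerate(best_perm))


# Toy example: 3 queries, 2 ground-truth objects, 3 classes.
pred_probs = [
    [0.1, 0.8, 0.1],  # query 0 favors class 1
    [0.7, 0.2, 0.1],  # query 1 favors class 0
    [0.3, 0.3, 0.4],  # query 2 is unsure
]
pred_boxes = [(0.5, 0.5, 0.2, 0.2), (0.2, 0.2, 0.1, 0.1), (0.9, 0.9, 0.1, 0.1)]
gt_labels = [0, 1]
gt_boxes = [(0.2, 0.2, 0.1, 0.1), (0.5, 0.5, 0.2, 0.2)]

print(match(pred_probs, pred_boxes, gt_labels, gt_boxes))  # → [(0, 1), (1, 0)]
```

In the toy example, query 2 is left unmatched and would be trained toward the "no object" class. DETR-style implementations typically solve the same assignment with a polynomial-time Hungarian-algorithm solver such as `scipy.optimize.linear_sum_assignment` rather than enumerating permutations, which grows factorially with N.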