DiT for object detection

#1
by SaraAmd - opened

Could you please share a demo or an example of how to use this model for object detection? I need it for this task on my own custom dataset, but the code in their repository throws an error. I hope it can at least be used through the Hugging Face library.

Can you share the error?

Provide more information

The DiT model card (https://huggingface.co/docs/transformers/model_doc/dit) refers to three use cases: image classification, layout analysis, and table detection.
However, the resources section of the model card contains only one notebook, a working example of image classification.
In addition, the only code snippet in the dit-base model card (shown below) stops at the raw logits rather than demonstrating the complete pipeline for each use case.
A working example of the complete pipeline for each use case would be very helpful.
Thank you in advance.

```python
import torch
from PIL import Image
from transformers import BeitImageProcessor, BeitForMaskedImageModeling

image = Image.open('path_to_your_document_image').convert('RGB')

processor = BeitImageProcessor.from_pretrained("microsoft/dit-base")
model = BeitForMaskedImageModeling.from_pretrained("microsoft/dit-base")

num_patches = (model.config.image_size // model.config.patch_size) ** 2
pixel_values = processor(images=image, return_tensors="pt").pixel_values
# create random boolean mask of shape (batch_size, num_patches)
bool_masked_pos = torch.randint(low=0, high=2, size=(1, num_patches)).bool()

# forward pass: returns the masked image modeling loss and the raw logits
outputs = model(pixel_values, bool_masked_pos=bool_masked_pos)
loss, logits = outputs.loss, outputs.logits
```
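For the image classification use case only, here is a minimal sketch of what the step after the logits might look like. It assumes the fine-tuned checkpoint `microsoft/dit-base-finetuned-rvlcdip` rather than the `microsoft/dit-base` checkpoint above. The helper names `top_k_labels` and `classify_document` are made up for this illustration and are not part of transformers. It does not cover layout analysis or table detection.

```python
import math


def top_k_labels(logits, id2label, k=3):
    """Map a flat list of classification logits to (label, probability) pairs.

    Hypothetical helper for illustration: applies a softmax in plain Python
    and returns the k most likely labels, highest probability first.
    """
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    return [(id2label[i], probs[i]) for i in ranked]


def classify_document(image_path, checkpoint="microsoft/dit-base-finetuned-rvlcdip", k=3):
    """Hypothetical wrapper: run the assumed classification checkpoint on one image."""
    # Heavy dependencies are imported here so the helper above stays usable without them.
    import torch
    from PIL import Image
    from transformers import AutoImageProcessor, AutoModelForImageClassification

    image = Image.open(image_path).convert("RGB")
    processor = AutoImageProcessor.from_pretrained(checkpoint)
    model = AutoModelForImageClassification.from_pretrained(checkpoint)

    inputs = processor(images=image, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits  # shape (1, num_labels)

    # id2label in the model config maps class indices to class names
    return top_k_labels(logits[0].tolist(), model.config.id2label, k=k)
```

Calling `classify_document('path_to_your_document_image')` should then return a short list of (label, probability) pairs instead of raw logits.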
