rf-detr-mobile-gui-detection

Mobile GUI grounding model built on top of roboflow/rf-detr-medium

Object Detection · DETR · Mobile GUI Grounding

rf-detr-mobile-gui-detection is a mobile GUI grounding model built on top of roboflow/rf-detr-medium using the RfDetrForObjectDetection architecture. RF-DETR is an end-to-end object detection model that combines ideas from LW-DETR and Deformable DETR: a DINOv2-with-registers-style ViT backbone, an RF-DETR windowing pattern for efficient attention, a multi-scale projector between the encoder and decoder, and a multi-scale deformable DETR decoder for fast convergence and strong accuracy-latency tradeoffs.

Note

Paper: RF-DETR: Neural Architecture Search for Real-Time Detection Transformers (https://huggingface.co/papers/2511.09554)

Metrics Loss Map

[Image: metrics_loss_map]

Per Class Metrics

[Image: per_class_metrics]
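Per-class detection metrics such as average precision are conventionally built on intersection over union (IoU) between predicted and ground-truth boxes. This card does not state how its metrics were computed, so the following is only a minimal sketch of IoU, assuming boxes in (x_min, y_min, x_max, y_max) pixel format. The function name `box_iou` is hypothetical and not part of this model or of Transformers.

```python
def box_iou(box_a, box_b):
    """IoU of two boxes given as (x_min, y_min, x_max, y_max)."""
    ax0, ay0, ax1, ay1 = box_a
    bx0, by0, bx1, by1 = box_b

    # Width and height of the overlap region, clamped at zero for disjoint boxes
    inter_w = max(0.0, min(ax1, bx1) - max(ax0, bx0))
    inter_h = max(0.0, min(ay1, by1) - max(ay0, by0))
    inter = inter_w * inter_h

    area_a = max(0.0, ax1 - ax0) * max(0.0, ay1 - ay0)
    area_b = max(0.0, bx1 - bx0) * max(0.0, by1 - by0)
    union = area_a + area_b - inter

    # Degenerate (zero-area) inputs yield 0.0 instead of dividing by zero
    return inter / union if union > 0 else 0.0


# Two 10x10 boxes that overlap on half their width
print(box_iou([0, 0, 10, 10], [5, 0, 15, 10]))
```

A prediction is usually counted as correct for its class when its IoU with a ground-truth box of that class exceeds a chosen cutoff; the cutoff used for the figures above is not given here.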

Quick Start with Transformers

pip install torch==2.8.0 --index-url https://download.pytorch.org/whl/cu128
pip install torchvision==0.23.0 transformers==5.9.0 accelerate gradio==6.19.0

import gradio as gr
import torch
from PIL import Image, ImageDraw

from transformers import AutoImageProcessor, RfDetrForObjectDetection

# Load model and processor
model_name = "prithivMLmods/rf-detr-mobile-gui-detection"

processor = AutoImageProcessor.from_pretrained(model_name)
model = RfDetrForObjectDetection.from_pretrained(model_name)

# Detection threshold
THRESHOLD = 0.35


def detect_gui(image):
    # Gradio passes None when no image has been uploaded
    if image is None:
        return None, []

    image = Image.fromarray(image).convert("RGB")

    inputs = processor(images=image, return_tensors="pt")

    with torch.no_grad():
        outputs = model(**inputs)

    # PIL reports (width, height); post-processing expects (height, width)
    target_sizes = torch.tensor([image.size[::-1]])
    results = processor.post_process_object_detection(
        outputs,
        target_sizes=target_sizes,
        threshold=THRESHOLD,
    )[0]

    draw = ImageDraw.Draw(image)

    detections = []

    for score, label, box in zip(
        results["scores"],
        results["labels"],
        results["boxes"],
    ):
        box = [round(x, 2) for x in box.tolist()]
        label_name = model.config.id2label[label.item()]
        confidence = round(score.item(), 3)

        # Draw bounding box
        draw.rectangle(box, outline="red", width=3)

        # Draw label
        draw.text(
            (box[0] + 4, max(0, box[1] - 16)),
            f"{label_name} {confidence:.2f}",
            fill="red",
        )

        detections.append(
            {
                "Label": label_name,
                "Confidence": confidence,
                "Bounding Box": box,
            }
        )

    return image, detections


demo = gr.Interface(
    fn=detect_gui,
    inputs=gr.Image(type="numpy", label="Upload Mobile UI Screenshot"),
    outputs=[
        gr.Image(type="pil", label="Detected GUI Elements"),
        gr.JSON(label="Detections"),
    ],
    title="RF-DETR Mobile GUI Detection",
    description="Upload a mobile UI screenshot to detect GUI elements using RF-DETR.",
)

if __name__ == "__main__":
    demo.launch()
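For GUI grounding, a downstream agent often needs a point to tap, not a box. The sketch below assumes detections shaped like the dictionaries returned by `detect_gui` (keys `Label`, `Confidence`, `Bounding Box`, with boxes as x_min, y_min, x_max, y_max in pixels) and reduces them to box centers. The helper `to_tap_targets`, its `min_confidence` parameter, and the sample labels are hypothetical illustrations, not part of the model or the demo.

```python
def to_tap_targets(detections, min_confidence=0.5):
    """Return (label, confidence, (cx, cy)) tuples, highest confidence first."""
    targets = []
    for det in detections:
        # Apply a stricter cutoff than the detection threshold if desired
        if det["Confidence"] < min_confidence:
            continue
        x_min, y_min, x_max, y_max = det["Bounding Box"]
        center = ((x_min + x_max) / 2, (y_min + y_max) / 2)
        targets.append((det["Label"], det["Confidence"], center))

    targets.sort(key=lambda t: t[1], reverse=True)
    return targets


# Made-up detections in the same shape as detect_gui's JSON output
sample = [
    {"Label": "text", "Confidence": 0.91, "Bounding Box": [10.0, 20.0, 110.0, 60.0]},
    {"Label": "image", "Confidence": 0.42, "Bounding Box": [0.0, 100.0, 200.0, 300.0]},
    {"Label": "group", "Confidence": 0.77, "Bounding Box": [0.0, 0.0, 360.0, 640.0]},
]

for label, confidence, (cx, cy) in to_tap_targets(sample):
    print(f"{label}: {confidence:.2f} at ({cx:.1f}, {cy:.1f})")
```

Keeping this step separate from `detect_gui` lets the Gradio demo keep returning full boxes while a caller picks its own confidence cutoff.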

[Image: demo screenshot]

Acknowledgements

  • roboflow/rf-detr-medium: RF-DETR is an end-to-end object detection model that combines ideas from LW-DETR and Deformable DETR: a DINOv2-with-registers-style ViT backbone (with an RF-DETR windowing pattern for efficient attention), a multi-scale projector between the encoder and decoder, and a multi-scale deformable DETR decoder for fast convergence and strong accuracy-latency tradeoffs.

  • Mobile UI Design Detection [dataset] by mrtoy: an object detection dataset for detecting elements in mobile UI designs. The target objects include text, images, and groups. The dataset contains mobile UI images annotated with bounding boxes and class labels.

Model size: 33.4M params (Safetensors, tensor type F32)