RT-DETRv2 Trashify Box Detector

This model is a fine-tuned version of PekingU/rtdetr_v2_r50vd, trained on mrdbourke/trashify_manual_labelled_images, a custom object detection dataset of waste items and their surroundings.

Model Details

Developed by: Rahul Kate
Model Type: Object Detection (Transformer-Based DETR)
Base Model: RT-DETRv2 (ResNet-50 backbone)
Language(s): Python (PyTorch / Hugging Face Transformers)
Finetuning Date: June 2026

Intended Use

Primary Use Case

The model is designed to detect trash items, disposal bins, and hands or robotic arms interacting with waste, either in real time or in static images. It is intended for integration into automated waste sorting facilities, smart recycling infrastructure, or robotic pickup pipelines.

Out-of-Scope Use Cases

This model should not be used in safety-critical applications without human verification, nor for general open-world object detection outside its predefined classes.

Factors & Class Vocabulary

The model detects 7 target classes specialized for waste interaction:

bin
not_bin
hand
not_hand
trash
not_trash
trash_arm

Metrics & Performance Summary

The model was evaluated after completing 10 full training epochs using torchmetrics.detection.mean_ap.MeanAveragePrecision.

Overall Performance

mAP (Mean Average Precision @ [0.5:0.95]): 0.8721 (87.21%)
mAP @ 50 (IoU threshold 0.50): 0.9976 (99.76%)
mAP @ 75 (IoU threshold 0.75): 0.9629 (96.29%)
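
The IoU thresholds above decide when a predicted box counts as a match for a ground-truth box. As a rough illustration (not the torchmetrics implementation), the sketch below computes IoU for two boxes in (x_min, y_min, x_max, y_max) format and lists the ten thresholds that mAP @ [0.5:0.95] averages over. The function name `box_iou` and the sample boxes are made up for this example.

```python
def box_iou(box_a, box_b):
    """Intersection over Union for two (x_min, y_min, x_max, y_max) boxes."""
    # Corners of the overlapping rectangle (if any)
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# COCO-style thresholds: 0.50, 0.55, ..., 0.95
IOU_THRESHOLDS = [round(0.5 + 0.05 * i, 2) for i in range(10)]

# Hypothetical boxes, offset by 10 px in each direction
predicted = (10, 10, 110, 110)
ground_truth = (20, 20, 120, 120)

iou = box_iou(predicted, ground_truth)
matched_at = [t for t in IOU_THRESHOLDS if iou >= t]
print(f"IoU = {iou:.3f}, counts as a match at thresholds {matched_at}")
```

A box that is only slightly misaligned still matches at IoU 0.50 but fails the stricter thresholds, which is why mAP @ 50 is typically higher than mAP @ [0.5:0.95].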

Scale-Based Precision

mAP Small: 0.4000
mAP Medium: 0.8557
mAP Large: 0.8854
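
The small, medium, and large figures follow the COCO convention of bucketing objects by pixel area, with cut-offs at 32² and 96² pixels. The sketch below shows that bucketing for a single box. The helper name `size_bucket` is hypothetical, and treating the boundaries as half-open is a simplification made here.

```python
SMALL_MAX_AREA = 32 ** 2    # 1,024 px^2
MEDIUM_MAX_AREA = 96 ** 2   # 9,216 px^2


def size_bucket(box):
    """Return the COCO-style size bucket for an (x_min, y_min, x_max, y_max) box."""
    x_min, y_min, x_max, y_max = box
    area = max(0, x_max - x_min) * max(0, y_max - y_min)
    if area < SMALL_MAX_AREA:
        return "small"
    if area < MEDIUM_MAX_AREA:
        return "medium"
    return "large"


# Hypothetical boxes, in pixels
for box in [(0, 0, 20, 20), (0, 0, 50, 50), (0, 0, 200, 150)]:
    print(box, "->", size_bucket(box))
```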

Per-Class Evaluation Breakdown

Class Name	mAP	mAR @ 100
bin	0.9146	0.9576
hand	0.8838	0.9190
not_bin	0.7510	0.7833
not_hand	-1.0000 *	-1.0000 *
not_trash	0.7627	0.7951
trash	0.9204	0.9400
trash_arm	1.0000	1.0000

* Note on -1.0000 metrics: The randomly drawn evaluation split contained no ground-truth instances of the not_hand class, so the metric reports -1 as a placeholder for that class. This reflects a limitation of the data split, not a model failure.
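
As a quick consistency check, averaging the per-class mAP values from the table while skipping the -1 placeholder reproduces the overall mAP of 0.8721. The snippet is illustrative arithmetic on the reported numbers, not part of the evaluation code.

```python
# Per-class mAP values copied from the table above; -1.0 marks a class
# with no ground-truth boxes in the evaluation split.
per_class_map = {
    "bin": 0.9146,
    "hand": 0.8838,
    "not_bin": 0.7510,
    "not_hand": -1.0,
    "not_trash": 0.7627,
    "trash": 0.9204,
    "trash_arm": 1.0000,
}

# Drop placeholder entries before averaging
valid = {name: value for name, value in per_class_map.items() if value >= 0}
mean_map = sum(valid.values()) / len(valid)

print(f"{len(valid)} classes evaluated, mean mAP = {mean_map:.4f}")
```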

Training Hyperparameters & Logistics

Dataset Split: 1,128 Train rows | 46 Validation rows | 180 Test rows
Batch Size: 8 (per device)
Epochs: 10
Learning Rate (Head/Other Modules): 1e-4 (separate optimizer parameter group, set via a CustomTrainer)
Learning Rate (Backbone): 1e-5
Weight Decay: 1e-4
Optimizer: AdamW
Mixed Precision: FP16 Enabled
Gradient Clipping Max Norm: 0.1
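
The two learning rates above imply that backbone parameters and all other parameters are placed in separate optimizer groups. The original CustomTrainer is not shown in this card, so the following is only a minimal sketch of one way to build such groups. The helper name `build_param_groups`, the `"backbone"` substring match, and the sample parameter names are all assumptions.

```python
def build_param_groups(named_params, backbone_lr=1e-5, head_lr=1e-4,
                       backbone_key="backbone"):
    """Split (name, parameter) pairs into backbone and non-backbone groups."""
    backbone, other = [], []
    for name, param in named_params:
        # Skip frozen parameters; objects without the attribute are kept
        if not getattr(param, "requires_grad", True):
            continue
        # Assumption: backbone parameters contain "backbone" in their name
        (backbone if backbone_key in name else other).append(param)
    return [
        {"params": backbone, "lr": backbone_lr},
        {"params": other, "lr": head_lr},
    ]


# Stand-in names and values so the sketch runs without PyTorch installed
demo_params = [
    ("model.backbone.conv1.weight", "w0"),
    ("model.decoder.layers.0.weight", "w1"),
    ("class_embed.bias", "w2"),
]
groups = build_param_groups(demo_params)
for group in groups:
    print(f"lr={group['lr']}: {len(group['params'])} parameter(s)")

# With PyTorch, the same groups could be passed to AdamW, for example:
# optimizer = torch.optim.AdamW(
#     build_param_groups(model.named_parameters()), weight_decay=1e-4
# )
```

Giving the pretrained backbone a learning rate ten times lower than the rest of the model is a common way to limit how far its weights move during fine-tuning.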

How to Use

from transformers import AutoImageProcessor, AutoModelForObjectDetection
from PIL import Image
import torch

# Use a GPU when available, otherwise fall back to CPU
device = "cuda" if torch.cuda.is_available() else "cpu"

# Load model and processor directly from Hugging Face Hub
model_id = "RahulKate-173/rt_detrv2_finetuned_trashify_box_detector_v2"
processor = AutoImageProcessor.from_pretrained(model_id)
model = AutoModelForObjectDetection.from_pretrained(model_id).to(device)

# Load the input image (placeholder path; replace with your own file)
your_image = Image.open("path/to/image.jpg").convert("RGB")

# Inference setup
inputs = processor(images=your_image, return_tensors="pt").to(device)
with torch.no_grad():
    outputs = model(**inputs)

# Post-process predictions; target_sizes expects (height, width),
# while PIL's .size is (width, height), hence the reversal
results = processor.post_process_object_detection(
    outputs,
    threshold=0.3,
    target_sizes=[your_image.size[::-1]],
)[0]

for score, label, box in zip(results["scores"], results["labels"], results["boxes"]):
    print(f"Detected {model.config.id2label[label.item()]} with confidence {score.item():.2f}")
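
For downstream use, it can be convenient to turn the post-processed output into plain Python records and keep only selected classes. The helper below is a hypothetical sketch, not part of the Transformers API. It expects plain lists, so tensors from the snippet above would first need `.tolist()`.

```python
def detections_to_records(scores, labels, boxes, id2label, keep=None):
    """Convert parallel score/label/box lists into sorted dict records.

    keep: optional set of class names to retain; None keeps every class.
    """
    records = []
    for score, label, box in zip(scores, labels, boxes):
        name = id2label[int(label)]
        if keep is not None and name not in keep:
            continue
        records.append({
            "label": name,
            "score": round(float(score), 4),
            "box": [round(float(v), 1) for v in box],  # x_min, y_min, x_max, y_max
        })
    # Highest-confidence detections first
    return sorted(records, key=lambda r: r["score"], reverse=True)


# Possible usage with the results from the snippet above:
# records = detections_to_records(
#     results["scores"].tolist(),
#     results["labels"].tolist(),
#     results["boxes"].tolist(),
#     model.config.id2label,
#     keep={"bin", "hand", "trash"},
# )
```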

Model Size: 42.9M params
Tensor Type: F32
Weights Format: Safetensors
