RT-DETRv2 Trashify Box Detector

This model is a fine-tuned version of PekingU/rtdetr_v2_r50vd, trained on mrdbourke/trashify_manual_labelled_images, a custom object detection dataset covering waste items and their environment.

Model Details

  • Developed by: Rahul Kate
  • Model Type: Object Detection (Transformer-Based DETR)
  • Base Model: RT-DETRv2 (ResNet-50 backbone)
  • Language(s): Python (PyTorch / Hugging Face Transformers)
  • Finetuning Date: June 2026

Intended Use

Primary Use Case

The model is designed to detect trash items, disposal bins, and hands or robotic arms interacting with waste, either in real time or in static imagery. It is intended for integration into automated waste sorting facilities, smart recycling infrastructure, or robotic pickup pipelines.

Out-of-Scope Use Cases

This model should not be used in safety-critical applications without human verification, nor for general open-world object detection outside its predefined classes.


Factors & Class Vocabulary

The model detects 7 target classes specific to waste interaction:

  1. bin
  2. not_bin
  3. hand
  4. not_hand
  5. trash
  6. not_trash
  7. trash_arm

Metrics & Performance Summary

After 10 full training epochs, the model was evaluated with torchmetrics.detection.mean_ap.MeanAveragePrecision.

Overall Performance

  • mAP (Mean Average Precision @ [0.5:0.95]): 0.8721 (87.21%)
  • mAP @ 50 (IoU threshold 0.50): 0.9976 (99.76%)
  • mAP @ 75 (IoU threshold 0.75): 0.9629 (96.29%)

Scale-Based Precision

  • mAP Small: 0.4000
  • mAP Medium: 0.8557
  • mAP Large: 0.8854
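
The small, medium, and large figures group objects by box area. The sketch below assumes the standard COCO size convention that torchmetrics follows (small: area below 32² pixels, medium: between 32² and 96², large: above 96²); the helper name is hypothetical and the handling of boxes exactly on a boundary is a simplification.

```python
# COCO-style area limits in square pixels (assumed convention)
SMALL_MAX_AREA = 32 ** 2    # 1024
MEDIUM_MAX_AREA = 96 ** 2   # 9216


def size_bucket(box):
    """Return 'small', 'medium', or 'large' for an (x_min, y_min, x_max, y_max) box."""
    x_min, y_min, x_max, y_max = box
    area = max(0.0, x_max - x_min) * max(0.0, y_max - y_min)
    if area < SMALL_MAX_AREA:
        return "small"
    if area < MEDIUM_MAX_AREA:
        return "medium"
    return "large"


# Illustrative boxes, one per bucket
examples = [(0, 0, 20, 20), (0, 0, 50, 50), (0, 0, 100, 100)]
buckets = [size_bucket(b) for b in examples]
print(buckets)
```

Counting the ground-truth boxes that fall into each bucket is one way to judge how much weight each scale-based score deserves.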

Per-Class Evaluation Breakdown

Class Name   mAP (Mean Average Precision)   mAR @ 100 (Mean Average Recall)
bin          0.9146                         0.9576
hand         0.8838                         0.9190
not_bin      0.7510                         0.7833
not_hand     -1.0000 *                      -1.0000 *
not_trash    0.7627                         0.7951
trash        0.9204                         0.9400
trash_arm    1.0000                         1.0000

* Note on -1.0000 metrics: The randomly generated evaluation split contained no ground-truth instances of the not_hand class, so torchmetrics reports -1 as a placeholder for that class. This reflects a constraint of the data split, not a model failure; performance on not_hand is simply unmeasured.
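
When aggregating the per-class figures, the -1 placeholder has to be skipped rather than averaged in. The sketch below uses the per-class mAP values from the table above; the helper name is hypothetical.

```python
# Per-class mAP values copied from the table above (-1.0 marks "not evaluated")
per_class_map = {
    "bin": 0.9146,
    "hand": 0.8838,
    "not_bin": 0.7510,
    "not_hand": -1.0,
    "not_trash": 0.7627,
    "trash": 0.9204,
    "trash_arm": 1.0000,
}


def mean_over_evaluated(per_class):
    """Average only the classes that have a real score; report the skipped ones."""
    valid = {name: value for name, value in per_class.items() if value >= 0}
    skipped = sorted(set(per_class) - set(valid))
    mean = sum(valid.values()) / len(valid) if valid else float("nan")
    return mean, skipped


mean_map, skipped = mean_over_evaluated(per_class_map)
print(f"mean mAP over {len(per_class_map) - len(skipped)} classes: {mean_map:.4f}")
print("classes without ground truth:", skipped)
```

The mean over the six evaluated classes comes out at roughly 0.872, which is consistent with the overall mAP reported above.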


Training Hyperparameters & Logistics

  • Dataset Split: 1,128 Train rows | 46 Validation rows | 180 Test rows
  • Batch Size: 8 (per device)
  • Epochs: 10
  • Learning Rate (Head/Other Modules): 1e-4 (set through a CustomTrainer that splits parameters into separate learning-rate groups)
  • Learning Rate (Backbone): 1e-5
  • Weight Decay: 1e-4
  • Optimizer: AdamW
  • Mixed Precision: FP16 Enabled
  • Gradient Clipping Max Norm: 0.1
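
The two learning rates above imply that the optimizer receives separate parameter groups. The CustomTrainer itself is not shown in this card, so the following is only a sketch of one way to build such groups. It assumes backbone parameters can be recognised by the substring "backbone" in their names; the function name and the demo parameter names are hypothetical.

```python
def build_param_groups(named_params, head_lr=1e-4, backbone_lr=1e-5, weight_decay=1e-4):
    """Split (name, parameter) pairs into backbone and non-backbone optimizer groups."""
    backbone_params, other_params = [], []
    for name, param in named_params:
        # Skip frozen parameters; objects without the attribute are kept
        if not getattr(param, "requires_grad", True):
            continue
        # Assumption: backbone parameters carry "backbone" in their name
        if "backbone" in name:
            backbone_params.append(param)
        else:
            other_params.append(param)
    return [
        {"params": backbone_params, "lr": backbone_lr, "weight_decay": weight_decay},
        {"params": other_params, "lr": head_lr, "weight_decay": weight_decay},
    ]


# Demo with placeholder names and values standing in for real tensors
demo_params = [
    ("model.backbone.conv1.weight", "w0"),
    ("model.decoder.layers.0.weight", "w1"),
    ("class_embed.0.bias", "w2"),
]
groups = build_param_groups(demo_params)
for group in groups:
    print(group["lr"], len(group["params"]))
```

With a real model, the same list can be passed to the optimizer, for example `torch.optim.AdamW(build_param_groups(model.named_parameters()))`, since PyTorch optimizers accept per-group `lr` and `weight_decay` values.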

How to Use

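import torch
from PIL import Image
from transformers import AutoImageProcessor, AutoModelForObjectDetection

# Use a GPU when one is available, otherwise fall back to the CPU
device = "cuda" if torch.cuda.is_available() else "cpu"

# Load model and processor directly from Hugging Face Hub
model_id = "RahulKate-173/rt_detrv2_finetuned_trashify_box_detector_v2"
processor = AutoImageProcessor.from_pretrained(model_id)
model = AutoModelForObjectDetection.from_pretrained(model_id).to(device)

# Load the image to analyse (placeholder path; replace with your own file)
your_image = Image.open("path/to/your_image.jpg").convert("RGB")

# Inference setup
inputs = processor(images=your_image, return_tensors="pt").to(device)
with torch.no_grad():
    outputs = model(**inputs)

# Post-process predictions
# target_sizes expects (height, width); PIL's .size is (width, height), hence the reversal
results = processor.post_process_object_detection(
    outputs,
    threshold=0.3,
    target_sizes=[your_image.size[::-1]],
)[0]

# Boxes are returned as (x_min, y_min, x_max, y_max) in pixels
for score, label, box in zip(results["scores"], results["labels"], results["boxes"]):
    name = model.config.id2label[label.item()]
    coords = [round(v, 1) for v in box.tolist()]
    print(f"Detected {name} with confidence {score.item():.2f} at {coords}")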