Update README.md
README.md CHANGED

@@ -1,3 +1,79 @@

The previous README, which contained only the `license: apache-2.0` front matter, is replaced by the full model card below.

---
license: apache-2.0
language: en
library_name: pytorch
pipeline_tag: object-detection
tags:
- rtdetr
- object-detection
- knowledge-distillation
- taco-dataset
- dinov3
- vit
---

# RT-DisDINOv3-ViT: A Distilled RT-DETR-L Model

This model is an **RT-DETR-L** whose backbone and encoder have been pre-trained using knowledge distillation from a **DINOv3 ViT-Base** teacher model. The distillation was performed on feature maps extracted from images of the [TACO (Trash Annotations in Context)](https://tacodataset.org/) dataset using the `lightly-train` framework.

This pre-trained checkpoint contains the distilled backbone and encoder weights and is intended as a starting point for fine-tuning on downstream object detection tasks.

This work is part of the **RT-DisDINOv3** project. For full details on the training pipeline, baseline comparisons, and analysis, please visit the [main GitHub repository](https://github.com/your-username/your-repo-name). <!--- <<< TODO: Add your GitHub repo link here -->

## How to Use

You can load these distilled weights into the original RT-DETR-L model's backbone and encoder before fine-tuning:

```python
import torch
from torch.hub import load_state_dict_from_url

# 1. Load the original RT-DETR-L model architecture
#    (make sure the 'lyuwenyu/RT-DETR' repository is available to torch.hub,
#     cloned locally or installed)
rtdetr_l = torch.hub.load('lyuwenyu/RT-DETR', 'rtdetrv2_l', pretrained=True)
model = rtdetr_l.model

# 2. Load the distilled weights from this Hugging Face Hub repository
MODEL_URL = "https://huggingface.co/hnamt/RT-DisDINOv3-ViT-Base/resolve/main/distilled_rtdetr_vit_teacher_BEST.pth"
distilled_state_dict = load_state_dict_from_url(MODEL_URL, map_location='cpu')['model']

# 3. Load the weights into the model's backbone and encoder.
#    `strict=False` loads only the matching keys (backbone + encoder) and leaves
#    the remaining parameters (e.g. the detection head) at their current values.
model.load_state_dict(distilled_state_dict, strict=False)

print("Successfully loaded and applied distilled knowledge from ViT teacher!")

# The model is now ready for fine-tuning on your own dataset. For example:
# optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
# model.train()
# ... your fine-tuning loop ...
```
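
Because the checkpoint only covers the backbone and encoder, it can be useful to confirm which parameters were actually matched. In PyTorch, `load_state_dict(..., strict=False)` returns the missing and unexpected keys, so a quick sanity check (a minimal sketch continuing from the snippet above) might look like this:

```python
# Continuing from the snippet above: inspect what strict=False actually loaded.
result = model.load_state_dict(distilled_state_dict, strict=False)

# Model parameters that were NOT found in the checkpoint
# (expected: the detection head, which is trained during fine-tuning).
print(f"Missing keys:    {len(result.missing_keys)}")

# Checkpoint entries that did NOT match any model parameter
# (ideally empty, or limited to distillation-only modules).
print(f"Unexpected keys: {len(result.unexpected_keys)}")
```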

## Training Details

- **Student Model**: RT-DETR-L (`rtdetrv2_l` from [lyuwenyu/RT-DETR](https://github.com/lyuwenyu/RT-DETR)).
- **Teacher Model**: DINOv3 ViT-Base (`dinov3/vitb16` via Lightly).
- **Dataset for Distillation**: TACO dataset images.
- **Distillation Procedure**: The student model's backbone and encoder were trained to minimize the Mean Squared Error (MSE) between their output feature maps and those of the teacher model, orchestrated by the `lightly-train` library; a sketch of this objective is shown below.
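
The actual distillation is orchestrated by `lightly-train`; the snippet below is only a minimal PyTorch sketch of the feature-map MSE objective described above, not the project's training code. The tensor shapes and the learnable `project` layer are illustrative assumptions; during distillation the teacher is kept frozen and only the student backbone/encoder (plus the projection) are updated.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def feature_distillation_loss(student_feats: torch.Tensor,
                              teacher_feats: torch.Tensor,
                              project: nn.Module) -> torch.Tensor:
    """MSE between projected student feature maps and frozen teacher features.

    student_feats: (B, C_s, H, W)  feature map from the RT-DETR backbone/encoder.
    teacher_feats: (B, N, C_t)     patch tokens from the DINOv3 ViT-Base teacher.
    project:       learnable layer mapping C_s -> C_t (illustrative assumption).
    """
    b, c_s, h, w = student_feats.shape

    # Flatten the spatial grid so the student matches the teacher's token layout.
    student_tokens = student_feats.flatten(2).transpose(1, 2)   # (B, H*W, C_s)
    student_tokens = project(student_tokens)                    # (B, H*W, C_t)

    # Resample teacher tokens to the student's number of positions if they differ.
    teacher_tokens = F.interpolate(
        teacher_feats.transpose(1, 2), size=h * w, mode="linear"
    ).transpose(1, 2)                                           # (B, H*W, C_t)

    # The teacher is frozen, so no gradients flow into it.
    return F.mse_loss(student_tokens, teacher_tokens.detach())

# Example usage (dimensions are illustrative: 256 encoder channels, 768 = ViT-Base dim):
# loss = feature_distillation_loss(student_out, teacher_out, nn.Linear(256, 768))
```
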
## Evaluation Results

After the distillation pre-training, the model was fine-tuned on the TACO dataset. In our experiments, this particular teacher did not yield an improvement over the baseline.

| Model                     | mAP@50-95 | mAP@50 | Latency (ms) | Notes                                     |
| ------------------------- | :-------: | :----: | :----------: | ----------------------------------------- |
| RT-DETR-L (Baseline)      |   2.80%   | 4.60%  |    50.05     | Fine-tuned from COCO pre-trained weights.  |
| **RT-DisDINOv3 (w/ ViT)** |   2.80%   | 4.20%  |    49.80     | No performance improvement observed.       |
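
The numbers above follow the usual COCO-style protocol (mAP@50-95 averages AP over IoU thresholds 0.50:0.95). If you want to compute comparable metrics for your own fine-tuned model, a generic sketch using `torchmetrics` (an assumption for illustration, not necessarily the project's evaluation script) could look like this:

```python
import torch
from torchmetrics.detection.mean_ap import MeanAveragePrecision

# COCO-style mAP: "map" averages over IoU 0.50:0.95, "map_50" is mAP@50.
metric = MeanAveragePrecision(box_format="xyxy", iou_type="bbox")

# Dummy predictions/targets for a single image; replace with your model's outputs.
preds = [{
    "boxes": torch.tensor([[25.0, 30.0, 200.0, 240.0]]),
    "scores": torch.tensor([0.82]),
    "labels": torch.tensor([3]),
}]
targets = [{
    "boxes": torch.tensor([[20.0, 28.0, 210.0, 250.0]]),
    "labels": torch.tensor([3]),
}]

metric.update(preds, targets)
results = metric.compute()
print(results["map"], results["map_50"])   # mAP@50-95 and mAP@50
```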

## License

The weights in this repository are released under the Apache 2.0 License. Please be aware that the models used for training (RT-DETR, DINOv3) have their own licenses.