hnamt committed · Commit 19426cf · verified · 1 Parent(s): b722f8e

Update README.md

Files changed (1)
  1. README.md +79 -3
README.md CHANGED
@@ -1,3 +1,79 @@
- ---
- license: apache-2.0
- ---
+ ---
+ license: apache-2.0
+ language: en
+ library_name: pytorch
+ pipeline_tag: object-detection
+ tags:
+ - rtdetr
+ - object-detection
+
+ ---
+ license: apache-2.0
+ language: en
+ library_name: pytorch
+ pipeline_tag: object-detection
+ tags:
+ - rtdetr
+ - object-detection
+ - knowledge-distillation
+ - taco-dataset
+ - dinov3
+ - vit
+ ---
+
+ # RT-DisDINOv3-ViT: A Distilled RT-DETR-L Model
+
+ This model is an **RT-DETR-L** whose backbone and encoder were pre-trained by knowledge distillation from a **DINOv3 ViT-Base** teacher model. Distillation was run on images from the [TACO (Trash Annotations in Context)](https://tacodataset.org/) dataset using the `lightly-train` framework, with the student trained to match the teacher's feature maps.
+
+ This pre-trained checkpoint holds the distilled backbone and encoder weights and is intended as a starting point for fine-tuning on downstream object detection tasks.
+
+ This work is part of the **RT-DisDINOv3** project. For full details on the training pipeline, baseline comparisons, and analysis, please visit the [main GitHub repository](https://github.com/your-username/your-repo-name). <!--- <<< TODO: Add your GitHub repo link here -->
+
+ ## How to Use
+
+ You can load these distilled weights into the backbone and encoder of the original RT-DETR-L model before fine-tuning.
+
+ ```python
+ import torch
+ from torch.hub import load_state_dict_from_url
+
+ # 1. Load the original RT-DETR-L model architecture
+ # torch.hub fetches the RT-DETR repository from GitHub on first use; its Python dependencies must already be installed.
+ rtdetr_l = torch.hub.load('lyuwenyu/RT-DETR', 'rtdetrv2_l', pretrained=True)
+ model = rtdetr_l.model
+
+ # 2. Load the distilled weights from this Hugging Face Hub repository
+ MODEL_URL = "https://huggingface.co/hnamt/RT-DisDINOv3-ViT-Base/resolve/main/distilled_rtdetr_vit_teacher_BEST.pth"
+ distilled_state_dict = load_state_dict_from_url(MODEL_URL, map_location='cpu')['model']
+
+ # 3. Load the weights into the model's backbone and encoder
+ # `strict=False` tolerates keys absent from the checkpoint, so only the matching keys (backbone + encoder) are loaded.
+ load_result = model.load_state_dict(distilled_state_dict, strict=False)
+
+ print(f"Loaded distilled weights; {len(load_result.missing_keys)} model keys were not in the checkpoint.")
+
+ # Now the 'model' is ready for fine-tuning on your own dataset.
+ # For example:
+ # optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
+ # model.train()
+ # ... your fine-tuning loop ...
+ ```
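
Because `strict=False` skips non-matching keys without complaint, it can help to check which checkpoint entries actually line up with the model before loading. The sketch below is a framework-free illustration of that check. `compare_state_dict_keys` is a hypothetical helper (not part of PyTorch or RT-DETR), and the key names and shapes are invented for the example. With real tensors, the two mappings could be built as `{k: tuple(v.shape) for k, v in model.state_dict().items()}`.

```python
from typing import Dict, List, Mapping, Tuple

Shape = Tuple[int, ...]


def compare_state_dict_keys(
    model_shapes: Mapping[str, Shape],
    checkpoint_shapes: Mapping[str, Shape],
) -> Dict[str, List[str]]:
    """Sort parameter names into four groups by comparing names and shapes."""
    report: Dict[str, List[str]] = {
        "matched": [],         # same name and shape: these would be loaded
        "shape_mismatch": [],  # same name, different shape
        "missing": [],         # in the model but not in the checkpoint
        "unexpected": [],      # in the checkpoint but not in the model
    }
    for name, shape in model_shapes.items():
        if name not in checkpoint_shapes:
            report["missing"].append(name)
        elif checkpoint_shapes[name] == shape:
            report["matched"].append(name)
        else:
            report["shape_mismatch"].append(name)
    for name in checkpoint_shapes:
        if name not in model_shapes:
            report["unexpected"].append(name)
    return report


# Invented key names and shapes, for illustration only.
model_shapes = {
    "backbone.conv1.weight": (64, 3, 7, 7),
    "encoder.proj.weight": (256, 512),
    "decoder.head.weight": (80, 256),
}
checkpoint_shapes = {
    "backbone.conv1.weight": (64, 3, 7, 7),
    "encoder.proj.weight": (256, 512),
    "teacher_head.weight": (768, 256),
}

report = compare_state_dict_keys(model_shapes, checkpoint_shapes)
for group, names in report.items():
    print(f"{group}: {names}")
```

In PyTorch itself, `load_state_dict(..., strict=False)` returns an object exposing `missing_keys` and `unexpected_keys`, but a shape mismatch on a shared key still raises an error, so a pre-check of this kind may be worth running first.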
+
+ ## Training Details
+
+ - **Student Model**: RT-DETR-L (`rtdetrv2_l` from [lyuwenyu/RT-DETR](https://github.com/lyuwenyu/RT-DETR)).
+ - **Teacher Model**: DINOv3 ViT-Base (`dinov3/vitb16` via Lightly).
+ - **Dataset for Distillation**: TACO dataset images.
+ - **Distillation Procedure**: The student model's backbone and encoder were trained to minimize the Mean Squared Error (MSE) between their output feature maps and those of the teacher model, orchestrated by the `lightly-train` library.
+
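
The MSE objective named in the distillation procedure can be illustrated in a few lines. This is a minimal sketch on flattened feature values, not the `lightly-train` implementation. `mse_feature_loss` is a hypothetical name, and the sketch assumes the student and teacher features already contain the same number of elements (in practice a projection layer may be needed when their dimensions differ).

```python
from typing import Sequence


def mse_feature_loss(student: Sequence[float], teacher: Sequence[float]) -> float:
    """Mean squared error between two flattened feature maps of equal length."""
    if len(student) != len(teacher):
        raise ValueError("student and teacher features must have the same length")
    if len(student) == 0:
        raise ValueError("feature maps must not be empty")
    squared_errors = [(s - t) ** 2 for s, t in zip(student, teacher)]
    return sum(squared_errors) / len(squared_errors)


# Toy values standing in for flattened feature maps.
student_features = [0.5, 1.0, -2.0, 3.0]
teacher_features = [0.5, 2.0, -1.0, 1.0]
loss = mse_feature_loss(student_features, teacher_features)
print(loss)  # 1.5
```

Minimizing this quantity pulls each student feature value toward the corresponding teacher value; a loss of zero would mean the two feature maps are identical.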
+ ## Evaluation Results
+
+ After distillation pre-training, the model was fine-tuned on the TACO dataset. In our experiments, this teacher did not improve on the baseline: mAP@50-95 was unchanged at 2.80%, and mAP@50 fell from 4.60% to 4.20%.
+
+ | Model | mAP@50-95 | mAP@50 | Speed (ms) | Notes |
+ | ----------------------------- | :-------: | :----: | :--------: | ----------------------------------- |
+ | RT-DETR-L (Baseline) | 2.80% | 4.60% | 50.05 | Fine-tuned from COCO pre-trained weights. |
+ | **RT-DisDINOv3 (w/ ViT)** | 2.80% | 4.20% | 49.80 | No performance improvement observed. |
+
+ ## License
+ The weights in this repository are released under the Apache 2.0 License. Please be aware that the models used for training (RT-DETR, DINOv3) have their own licenses.