Model Zoo
Common Settings
- All COCO models were trained on coco_2017_train and evaluated on coco_2017_val.
- All models were trained using distributed training.
- Most models were trained with the 50-epoch setting (~51 COCO epochs) and a multi-step LR scheduler, which is the common setting in DETR-like methods.
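A multi-step LR scheduler keeps the learning rate constant and multiplies it by a fixed factor each time training reaches a milestone. The plain-Python sketch below illustrates that shape. The function name `multistep_lr` and the example values (base LR 1e-4, a single 10x drop at iteration 300000) are hypothetical and chosen only for illustration, not taken from any detrex config. The milestone semantics follow PyTorch's `torch.optim.lr_scheduler.MultiStepLR`, where the decay applies once the step count reaches a milestone.

```python
from bisect import bisect_right
from typing import Sequence


def multistep_lr(base_lr: float, milestones: Sequence[int], gamma: float, step: int) -> float:
    """Return the learning rate at `step` under a multi-step schedule.

    The rate starts at `base_lr` and is multiplied by `gamma` once for every
    milestone that `step` has reached. `milestones` must be sorted ascending.
    """
    # Count milestones already reached; a milestone equal to `step` counts,
    # so the drop takes effect exactly at the milestone.
    num_decays = bisect_right(milestones, step)
    return base_lr * (gamma ** num_decays)


if __name__ == "__main__":
    # Hypothetical schedule: base LR 1e-4, one 10x drop at iteration 300000.
    for it in (0, 299999, 300000, 374999):
        print(it, multistep_lr(1e-4, [300000], 0.1, it))
```

In practice the training framework supplies the scheduler; the sketch only shows how the learning rate evolves across milestones.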
COCO Object Detection Baselines
Here we provide our pretrained baselines trained with detrex; more pretrained weights will be released in future versions. We also provide converted pretrained weights for users, which are marked as (converted).
DETR
Name | Backbone | Pretrained | Epochs | box AP | Download |
---|---|---|---|---|---|
DETR-R50 (converted) | R-50 | IN1k | 500 | 42.0 | model |
DETR-R50-DC5 (converted) | R-50 | IN1k | 500 | 43.4 | model |
DETR-R101 (converted) | R-101 | IN1k | 500 | 43.5 | model |
DETR-R101-DC5 (converted) | R-101 | IN1k | 500 | 44.9 | model |
Deformable-DETR
Name | Backbone | Pretrained | Epochs | box AP | Download |
---|---|---|---|---|---|
Deformable-DETR + Box Refinement | R-50 | IN1k | 50 | 47.0 | model |
Deformable-DETR + Box Refinement + Two Stage | R-50 | IN1k | 50 | 48.2 | model |
Anchor-DETR
Name | Backbone | Pretrained | Epochs | box AP | Download |
---|---|---|---|---|---|
Anchor-DETR-R50 | R-50 | IN1k | 50 | 41.9 | model |
Anchor-DETR-R50 (converted) | R-50 | IN1k | 50 | 42.2 | model |
Anchor-DETR-R50-DC5 (converted) | R-50 | IN1k | 50 | 44.2 | model |
Anchor-DETR-R101 (converted) | R-101 | IN1k | 50 | 43.5 | model |
Anchor-DETR-R101-DC5 (converted) | R-101 | IN1k | 50 | 45.1 | model |
Conditional-DETR
Name | Backbone | Pretrained | Epochs | box AP | Download |
---|---|---|---|---|---|
Conditional-DETR-R50 | R-50 | IN1k | 50 | 41.6 | model |
Conditional-DETR-R50-DC5 (converted) | R-50-DC5 | IN1k | 50 | 43.8 | model |
Conditional-DETR-R101 (converted) | R-101 | IN1k | 50 | 43.0 | model |
Conditional-DETR-R101-DC5 (converted) | R-101-DC5 | IN1k | 50 | 45.1 | model |
DAB-DETR
Name | Backbone | Pretrained | Epochs | box AP | Download |
---|---|---|---|---|---|
DAB-DETR-R50 | R-50 | IN1k | 50 | 43.3 | model |
DAB-DETR-R50-3patterns (converted) | R-50 | IN1k | 50 | 42.8 | model |
DAB-DETR-R50-DC5 (converted) | R-50 | IN1k | 50 | 44.6 | model |
DAB-DETR-R50-DC5-3patterns (converted) | R-50 | IN1k | 50 | 45.7 | model |
DAB-DETR-R101 | R-101 | IN1k | 50 | 44.0 | model |
DAB-DETR-R101-DC5 (converted) | R-101 | IN1k | 50 | 45.7 | model |
DAB-DETR-Swin-T | Swin-Tiny-224 | IN1k | 50 | 45.2 | model |
DAB-Deformable-DETR-R50 | R-50 | IN1k | 50 | 49.0 | model |
DAB-Deformable-DETR-R50-Two-Stage | R-50 | IN1k | 50 | 49.7 | model |
DN-DETR
Name | Backbone | Pretrained | Epochs | box AP | Download |
---|---|---|---|---|---|
DN-DETR-R50 | R-50 | IN1k | 50 | 44.7 | model |
DN-DETR-R50-DC5 (converted) | R-50 | IN1k | 50 | 46.3 | model |
DINO
Pretrained DINO with ResNet Backbone
Name | Backbone | Pretrained | Epochs | Denoising Queries | box AP | Download |
---|---|---|---|---|---|---|
DINO-R50-4scale | R-50 | IN1k | 12 | 100 | 49.2 | model |
DINO-R50-4scale (hacked trainer) | R-50 | IN1k | 12 | 100 | 49.4 | model |
DINO-R50-4scale with EMA | R-50 | IN1k | 12 | 100 | 49.4 | model |
DINO-R50-5scale | R-50 | IN1k | 12 | 100 | 49.6 | model |
DINO-R50-4scale | R-50 | IN1k | 12 | 300 | 49.5 | model |
DINO-R50-4scale | R-50 | IN1k | 24 | 100 | 50.6 | model |
DINO-R101-4scale | R-101 | IN1k | 12 | 100 | 50.0 | model |
Pretrained DINO with Swin-Transformer Backbone
Name | Backbone | Pretrained | Epochs | Denoising Queries | box AP | Download |
---|---|---|---|---|---|---|
DINO-Swin-T-224-4scale | Swin-Tiny-224 | IN1k | 12 | 100 | 51.3 | model |
DINO-Swin-T-224-4scale | Swin-Tiny-224 | IN22k to IN1k | 12 | 100 | 52.5 | model |
DINO-Swin-S-224-4scale | Swin-Small-224 | IN1k | 12 | 100 | 53.0 | model |
DINO-Swin-B-384-4scale | Swin-Base-384 | IN22k to IN1k | 12 | 100 | 55.8 | model |
DINO-Swin-L-224-4scale | Swin-Large-224 | IN22k to IN1k | 12 | 100 | 56.9 | model |
DINO-Swin-L-384-4scale | Swin-Large-384 | IN22k to IN1k | 12 | 100 | 56.9 | model |
DINO-Swin-L-384-5scale | Swin-Large-384 | IN22k to IN1k | 12 | 100 | 57.5 | model |
DINO-Swin-L-384-4scale | Swin-Large-384 | IN22k to IN1k | 36 | 100 | 58.1 | model |
DINO-Swin-L-384-5scale | Swin-Large-384 | IN22k to IN1k | 36 | 100 | 58.5 | model |
Pretrained DINO with FocalNet Backbone
Name | Backbone | Pretrained | Epochs | Denoising Queries | box AP | Download |
---|---|---|---|---|---|---|
DINO-FocalNet-Large-4scale | FocalNet-384-LRF-3Level | IN22k | 12 | 100 | 57.5 | model |
DINO-FocalNet-Large-4scale | FocalNet-384-LRF-4Level | IN22k | 12 | 100 | 58.0 | model |
DINO-FocalNet-Large-5scale | FocalNet-384-LRF-4Level | IN22k | 12 | 100 | 58.5 | model |
Pretrained DINO with ViTDet Backbone
Name | Backbone | Pretrained | Epochs | Denoising Queries | box AP | Download |
---|---|---|---|---|---|---|
DINO-ViTDet-Base-4scale | ViT | IN1k, MAE | 12 | 100 | 50.2 | model |
DINO-ViTDet-Base-4scale | ViT | IN1k, MAE | 50 | 100 | 55.0 | model |
DINO-ViTDet-Large-4scale | ViT | IN1k, MAE | 12 | 100 | 52.9 | model |
DINO-ViTDet-Large-4scale | ViT | IN1k, MAE | 50 | 100 | 57.5 | model |
H-Deformable-DETR
Name | Backbone | Pretrained | Queries | Epochs | box AP | Download |
---|---|---|---|---|---|---|
H-Deformable-DETR-R50 + tricks (detrex) | R-50 | IN1k | 300 | 12 | 49.1 | model |
H-Deformable-DETR-R50 + tricks (converted) | R-50 | IN1k | 300 | 12 | 48.9 | model |
H-Deformable-DETR-R50 + tricks (converted) | R-50 | IN1k | 300 | 36 | 50.3 | model |
H-Deformable-DETR-Swin-T + tricks (converted) | Swin-Tiny | IN1k | 300 | 12 | 50.6 | model |
H-Deformable-DETR-Swin-T + tricks (converted) | Swin-Tiny | IN1k | 300 | 36 | 53.5 | model |
H-Deformable-DETR-Swin-L + tricks (converted) | Swin-Large | IN22k | 300 | 12 | 56.2 | model |
H-Deformable-DETR-Swin-L + tricks (converted) | Swin-Large | IN22k | 300 | 36 | 57.5 | model |
H-Deformable-DETR-Swin-L + tricks (converted) | Swin-Large | IN22k | 900 | 12 | 56.4 | model |
DETA
Name | Backbone | Pretrained | Epochs | box AP | Download |
---|---|---|---|---|---|
Improved-Deformable-DETR-R50 (converted) | R-50 | IN1k | 50 | 49.8 | model |
DETA-R50-5scale (bs=8, 180000 iterations) | R-50 | IN1k | 12 | 50.0 | model |
DETA-R50-5scale (with hacked train engine) | R-50 | IN1k | 12 | 49.9 | model |
DETA-R50-5scale-12ep (no frozen backbone) | R-50 | IN1k | 12 | 50.2 | model |
DETA-R50-5scale (converted) | R-50 | IN1k | 12 | 50.1 | model |
DETA-Swin-Large-finetune (converted) | Swin-Large-384 | Objects365 | 24 | 62.9 | model |
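Some rows, such as DETA-R50-5scale (bs=8, 180000 iterations), describe training length in iterations rather than epochs. Below is a minimal sketch of the conversion. It assumes that one epoch is one pass over the 118,287 images of coco_2017_train and that bs is the total batch size across all GPUs. The helper names are hypothetical. Data loaders that drop images without annotations see slightly fewer images per epoch.

```python
import math

# Number of images in coco_2017_train before any filtering (assumption: one
# epoch = one pass over all of them).
COCO_2017_TRAIN_IMAGES = 118_287


def iterations_to_epochs(iterations: int, batch_size: int,
                         num_images: int = COCO_2017_TRAIN_IMAGES) -> float:
    """Convert an iteration count at a given total batch size into epochs."""
    if batch_size <= 0 or num_images <= 0:
        raise ValueError("batch_size and num_images must be positive")
    images_seen = iterations * batch_size
    return images_seen / num_images


def epochs_to_iterations(epochs: float, batch_size: int,
                         num_images: int = COCO_2017_TRAIN_IMAGES) -> int:
    """Smallest iteration count covering at least `epochs` passes over the data."""
    if batch_size <= 0 or num_images <= 0:
        raise ValueError("batch_size and num_images must be positive")
    return math.ceil(epochs * num_images / batch_size)


if __name__ == "__main__":
    # DETA-R50-5scale row: bs=8, 180000 iterations.
    print(f"{iterations_to_epochs(180_000, 8):.2f} epochs")
    # Iterations needed for 12 full epochs at bs=8.
    print(epochs_to_iterations(12, 8), "iterations")
```

Under these assumptions, 180000 iterations at bs=8 corresponds to roughly 12.2 passes over coco_2017_train, which is consistent with the 12-epoch label in that row.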