# Model Zoo

## Common Settings

- All COCO models were trained on `coco_2017_train` and evaluated on `coco_2017_val`.
- All models were trained using distributed training.
- Most models were trained with the 50-epoch setting (~51 COCO epochs) and a multi-step LR scheduler, which is the common setting for DETR-like methods.

## COCO Object Detection Baselines

Here we provide our pretrained baselines trained with **detrex**. More pretrained weights will be released in future versions. We also provide converted pretrained weights, which are marked `(converted)`.

### DETR

Name | Backbone | Pretrained | Epochs | box AP | Download |
---|---|---|---|---|---|
DETR-R50 (converted) | R-50 | IN1k | 500 | 42.0 | model |
DETR-R50-DC5 (converted) | R-50 | IN1k | 500 | 43.4 | model |
DETR-R101 (converted) | R-101 | IN1k | 500 | 43.5 | model |
DETR-R101-DC5 (converted) | R-101 | IN1k | 500 | 44.9 | model |

### Deformable-DETR

Name | Backbone | Pretrained | Epochs | box AP | Download |
---|---|---|---|---|---|
Deformable-DETR + Box Refinement | R-50 | IN1k | 50 | 47.0 | model |
Deformable-DETR + Box Refinement + Two-Stage | R-50 | IN1k | 50 | 48.2 | model |

### Anchor-DETR

Name | Backbone | Pretrained | Epochs | box AP | Download |
---|---|---|---|---|---|
Anchor-DETR-R50 | R-50 | IN1k | 50 | 41.9 | model |
Anchor-DETR-R50 (converted) | R-50 | IN1k | 50 | 42.2 | model |
Anchor-DETR-R50-DC5 (converted) | R-50 | IN1k | 50 | 44.2 | model |
Anchor-DETR-R101 (converted) | R-101 | IN1k | 50 | 43.5 | model |
Anchor-DETR-R101-DC5 (converted) | R-101 | IN1k | 50 | 45.1 | model |

### Conditional-DETR

Name | Backbone | Pretrained | Epochs | box AP | Download |
---|---|---|---|---|---|
Conditional-DETR-R50 | R-50 | IN1k | 50 | 41.6 | model |
Conditional-DETR-R50-DC5 (converted) | R-50-DC5 | IN1k | 50 | 43.8 | model |
Conditional-DETR-R101 (converted) | R-101 | IN1k | 50 | 43.0 | model |
Conditional-DETR-R101-DC5 (converted) | R-101-DC5 | IN1k | 50 | 45.1 | model |

### DAB-DETR

Name | Backbone | Pretrained | Epochs | box AP | Download |
---|---|---|---|---|---|
DAB-DETR-R50 | R-50 | IN1k | 50 | 43.3 | model |
DAB-DETR-R50-3patterns (converted) | R-50 | IN1k | 50 | 42.8 | model |
DAB-DETR-R50-DC5 (converted) | R-50 | IN1k | 50 | 44.6 | model |
DAB-DETR-R50-DC5-3patterns (converted) | R-50 | IN1k | 50 | 45.7 | model |
DAB-DETR-R101 | R-101 | IN1k | 50 | 44.0 | model |
DAB-DETR-R101-DC5 (converted) | R-101 | IN1k | 50 | 45.7 | model |
DAB-DETR-Swin-T | Swin-Tiny-224 | IN1k | 50 | 45.2 | model |
DAB-Deformable-DETR-R50 | R-50 | IN1k | 50 | 49.0 | model |
DAB-Deformable-DETR-R50-Two-Stage | R-50 | IN1k | 50 | 49.7 | model |

### DN-DETR

Name | Backbone | Pretrained | Epochs | box AP | Download |
---|---|---|---|---|---|
DN-DETR-R50 | R-50 | IN1k | 50 | 44.7 | model |
DN-DETR-R50-DC5 (converted) | R-50 | IN1k | 50 | 46.3 | model |

### DINO

Name | Backbone | Pretrained | Epochs | Denoising Queries | box AP | Download |
---|---|---|---|---|---|---|
DINO-R50-4scale | R-50 | IN1k | 12 | 100 | 49.2 | model |
DINO-R50-4scale (hacked trainer) | R-50 | IN1k | 12 | 100 | 49.4 | model |
DINO-R50-4scale with EMA | R-50 | IN1k | 12 | 100 | 49.4 | model |
DINO-R50-5scale | R-50 | IN1k | 12 | 100 | 49.6 | model |
DINO-R50-4scale | R-50 | IN1k | 12 | 300 | 49.5 | model |
DINO-R50-4scale | R-50 | IN1k | 24 | 100 | 50.6 | model |
DINO-R101-4scale | R-101 | IN1k | 12 | 100 | 50.0 | model |

Name | Backbone | Pretrained | Epochs | Denoising Queries | box AP | Download |
---|---|---|---|---|---|---|
DINO-Swin-T-224-4scale | Swin-Tiny-224 | IN1k | 12 | 100 | 51.3 | model |
DINO-Swin-T-224-4scale | Swin-Tiny-224 | IN22k to IN1k | 12 | 100 | 52.5 | model |
DINO-Swin-S-224-4scale | Swin-Small-224 | IN1k | 12 | 100 | 53.0 | model |
DINO-Swin-B-384-4scale | Swin-Base-384 | IN22k to IN1k | 12 | 100 | 55.8 | model |
DINO-Swin-L-224-4scale | Swin-Large-224 | IN22k to IN1k | 12 | 100 | 56.9 | model |
DINO-Swin-L-384-4scale | Swin-Large-384 | IN22k to IN1k | 12 | 100 | 56.9 | model |
DINO-Swin-L-384-5scale | Swin-Large-384 | IN22k to IN1k | 12 | 100 | 57.5 | model |
DINO-Swin-L-384-4scale | Swin-Large-384 | IN22k to IN1k | 36 | 100 | 58.1 | model |
DINO-Swin-L-384-5scale | Swin-Large-384 | IN22k to IN1k | 36 | 100 | 58.5 | model |

Name | Backbone | Pretrained | Epochs | Denoising Queries | box AP | Download |
---|---|---|---|---|---|---|
DINO-FocalNet-Large-4scale | FocalNet-384-LRF-3Level | IN22k | 12 | 100 | 57.5 | model |
DINO-FocalNet-Large-4scale | FocalNet-384-LRF-4Level | IN22k | 12 | 100 | 58.0 | model |
DINO-FocalNet-Large-5scale | FocalNet-384-LRF-4Level | IN22k | 12 | 100 | 58.5 | model |

Name | Backbone | Pretrained | Epochs | Denoising Queries | box AP | Download |
---|---|---|---|---|---|---|
DINO-ViTDet-Base-4scale | ViT | IN1k, MAE | 12 | 100 | 50.2 | model |
DINO-ViTDet-Base-4scale | ViT | IN1k, MAE | 50 | 100 | 55.0 | model |
DINO-ViTDet-Large-4scale | ViT | IN1k, MAE | 12 | 100 | 52.9 | model |
DINO-ViTDet-Large-4scale | ViT | IN1k, MAE | 50 | 100 | 57.5 | model |

### H-Deformable-DETR

Name | Backbone | Pretrained | Queries | Epochs | box AP | Download |
---|---|---|---|---|---|---|
H-Deformable-DETR-R50 + tricks (detrex) | R-50 | IN1k | 300 | 12 | 49.1 | model |
H-Deformable-DETR-R50 + tricks (converted) | R-50 | IN1k | 300 | 12 | 48.9 | model |
H-Deformable-DETR-R50 + tricks (converted) | R-50 | IN1k | 300 | 36 | 50.3 | model |
H-Deformable-DETR-Swin-T + tricks (converted) | Swin-Tiny | IN1k | 300 | 12 | 50.6 | model |
H-Deformable-DETR-Swin-T + tricks (converted) | Swin-Tiny | IN1k | 300 | 36 | 53.5 | model |
H-Deformable-DETR-Swin-L + tricks (converted) | Swin-Large | IN22k | 300 | 12 | 56.2 | model |
H-Deformable-DETR-Swin-L + tricks (converted) | Swin-Large | IN22k | 300 | 36 | 57.5 | model |
H-Deformable-DETR-Swin-L + tricks (converted) | Swin-Large | IN22k | 900 | 12 | 56.4 | model |

### DETA

Name | Backbone | Pretrained | Epochs | box AP | Download |
---|---|---|---|---|---|
Improved-Deformable-DETR-R50 (converted) | R-50 | IN1k | 50 | 49.8 | model |
DETA-R50-5scale (bs=8, 180000 iterations) | R-50 | IN1k | 12 | 50.0 | model |
DETA-R50-5scale (with hacked train engine) | R-50 | IN1k | 12 | 49.9 | model |
DETA-R50-5scale-12ep (no frozen backbone) | R-50 | IN1k | 12 | 50.2 | model |
DETA-R50-5scale (converted) | R-50 | IN1k | 12 | 50.1 | model |
DETA-Swin-Large-finetune (converted) | Swin-Large-384 | Objects365 | 24 | 62.9 | model |
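
Epoch counts in these tables are approximate when a schedule is defined in iterations. For example, the DETA-R50-5scale row lists a 12-epoch schedule run with bs=8 for 180000 iterations. The minimal Python sketch below converts between iterations and COCO epochs. It assumes that `coco_2017_train` has 118,287 images (the official train2017 split) and treats `bs` as the total batch size across all GPUs. Both of these are assumptions, and the function names are hypothetical helpers, not part of detrex.

```python
import math

# Assumption: size of the official COCO train2017 split (`coco_2017_train`).
# If your data loader drops images without annotations, pass the smaller count instead.
COCO_TRAIN2017_IMAGES = 118_287


def iterations_to_epochs(iterations: int, batch_size: int,
                         num_images: int = COCO_TRAIN2017_IMAGES) -> float:
    """Return how many passes over the dataset an iteration-based schedule makes."""
    if iterations < 0 or batch_size <= 0 or num_images <= 0:
        raise ValueError("need iterations >= 0, batch_size > 0 and num_images > 0")
    # Assumption: each iteration consumes `batch_size` images in total (summed over GPUs).
    return iterations * batch_size / num_images


def epochs_to_iterations(epochs: float, batch_size: int,
                         num_images: int = COCO_TRAIN2017_IMAGES) -> int:
    """Return the smallest iteration count that covers `epochs` passes over the dataset."""
    if epochs < 0 or batch_size <= 0 or num_images <= 0:
        raise ValueError("need epochs >= 0, batch_size > 0 and num_images > 0")
    return math.ceil(epochs * num_images / batch_size)


if __name__ == "__main__":
    # DETA-R50-5scale row: bs=8, 180000 iterations, listed as 12 epochs.
    print(f"{iterations_to_epochs(180_000, 8):.2f} COCO epochs")  # 12.17 COCO epochs
    # Iterations needed for exactly 12 epochs at the same total batch size.
    print(epochs_to_iterations(12, 8))  # 177431
```

Under these assumptions, 180000 iterations at bs=8 come to roughly 12.17 passes over `coco_2017_train`, slightly more than the nominal 12 epochs. The "~51 COCO epochs" figure in the common settings is likely a rounding of the same kind.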