Download Pretrained Backbone Weights
Here we collect download links for the backbone models, making it easier for users to obtain pretrained weights for the built-in backbones. This document will be kept up to date. Most of the included models are borrowed from their original sources; many thanks to the authors for their excellent work on these backbones.
ResNet
We have already provided a tutorial on using torchvision's pretrained ResNet models here: Download TorchVision ResNet Models.
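If you prefer to fetch the torchvision weights programmatically instead of following the links in that tutorial, a minimal sketch (assuming torchvision >= 0.13, which provides the `ResNet50_Weights` enum) looks like this:

```python
import torch
from torchvision.models import resnet50, ResNet50_Weights

# Download (and locally cache) the ImageNet-pretrained ResNet-50 from
# torchvision, then dump its state dict for later conversion or loading.
model = resnet50(weights=ResNet50_Weights.IMAGENET1K_V1)
torch.save(model.state_dict(), "r50_torchvision.pth")
```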
Swin-Transformer
The download links below are borrowed from the official implementation of Swin-Transformer.
Swin-Tiny
Name | Pretrain | Resolution | Acc@1 | Acc@5 | 22K Model | 1K Model |
---|---|---|---|---|---|---|
Swin-Tiny | ImageNet-1K | 224x224 | 81.2 | 95.5 | - | download |
Swin-Tiny | ImageNet-22K | 224x224 | 80.9 | 96.0 | download | download |
Using Swin-Tiny Backbone in Config
```python
from detectron2.config import LazyCall as L
from detectron2.modeling.backbone import SwinTransformer

# `model` and `train` are the objects defined in the base config this snippet modifies.
# modify backbone config
model.backbone = L(SwinTransformer)(
    pretrain_img_size=224,
    embed_dim=96,
    depths=(2, 2, 6, 2),
    num_heads=(3, 6, 12, 24),
    drop_path_rate=0.1,
    window_size=7,
    out_indices=(1, 2, 3),
)

# setup init checkpoint path
# ImageNet-1K pretrained weights:
# train.init_checkpoint = "/path/to/swin_tiny_patch4_window7_224.pth"
# ImageNet-22K pretrained, ImageNet-1K fine-tuned weights:
train.init_checkpoint = "/path/to/swin_tiny_patch4_window7_224_22kto1k_finetune.pth"
```
Swin-Small
Name | Pretrain | Resolution | Acc@1 | Acc@5 | 22K Model | 1K Model |
---|---|---|---|---|---|---|
Swin-Small | ImageNet-1K | 224x224 | 83.2 | 96.2 | - | download |
Swin-Small | ImageNet-22K | 224x224 | 83.2 | 97.0 | download | download |
Using Swin-Small Backbone in Config
```python
from detectron2.config import LazyCall as L
from detectron2.modeling.backbone import SwinTransformer

# modify backbone config
model.backbone = L(SwinTransformer)(
    pretrain_img_size=224,
    embed_dim=96,
    depths=(2, 2, 18, 2),
    num_heads=(3, 6, 12, 24),
    drop_path_rate=0.2,
    window_size=7,
    out_indices=(1, 2, 3),
)

# setup init checkpoint path
# train.init_checkpoint = "/path/to/swin_small_patch4_window7_224.pth"
train.init_checkpoint = "/path/to/swin_small_patch4_window7_224_22kto1k_finetune.pth"
```
Swin-Base
Name | Pretrain | Resolution | Acc@1 | Acc@5 | 22K Model | 1K Model |
---|---|---|---|---|---|---|
Swin-Base | ImageNet-1K | 224x224 | 83.5 | 96.5 | - | download |
Swin-Base | ImageNet-1K | 384x384 | 84.5 | 97.0 | - | download |
Swin-Base | ImageNet-22K | 224x224 | 85.2 | 97.5 | download | download |
Swin-Base | ImageNet-22K | 384x384 | 86.4 | 98.0 | download | download |
Using Swin-Base-224 Backbone in Config
```python
from detectron2.config import LazyCall as L
from detectron2.modeling.backbone import SwinTransformer

# modify backbone config
model.backbone = L(SwinTransformer)(
    pretrain_img_size=224,
    embed_dim=128,
    depths=(2, 2, 18, 2),
    num_heads=(4, 8, 16, 32),
    window_size=7,
    out_indices=(1, 2, 3),
)

# setup init checkpoint path
# train.init_checkpoint = "/path/to/swin_base_patch4_window7_224.pth"
train.init_checkpoint = "/path/to/swin_base_patch4_window7_224_22kto1k.pth"
```
Using Swin-Base-384 Backbone in Config
```python
from detectron2.config import LazyCall as L
from detectron2.modeling.backbone import SwinTransformer

# modify backbone config
model.backbone = L(SwinTransformer)(
    pretrain_img_size=384,
    embed_dim=128,
    depths=(2, 2, 18, 2),
    num_heads=(4, 8, 16, 32),
    window_size=12,
    out_indices=(1, 2, 3),
)

# setup init checkpoint path
# train.init_checkpoint = "/path/to/swin_base_patch4_window12_384.pth"
train.init_checkpoint = "/path/to/swin_base_patch4_window12_384_22kto1k.pth"
```
Swin-Large
Name | Pretrain | Resolution | Acc@1 | Acc@5 | 22K Model | 1K Model |
---|---|---|---|---|---|---|
Swin-Large | ImageNet-22K | 224x224 | 86.3 | 97.9 | download | download |
Swin-Large | ImageNet-22K | 384x384 | 87.3 | 98.2 | download | download |
Using Swin-Large-224 Backbone in Config
```python
from detectron2.config import LazyCall as L
from detectron2.modeling.backbone import SwinTransformer

# modify backbone config
model.backbone = L(SwinTransformer)(
    pretrain_img_size=224,
    embed_dim=192,
    depths=(2, 2, 18, 2),
    num_heads=(6, 12, 24, 48),
    window_size=7,
    out_indices=(1, 2, 3),
)

# setup init checkpoint path
train.init_checkpoint = "/path/to/swin_large_patch4_window7_224_22kto1k.pth"
```
Using Swin-Large-384 Backbone in Config
```python
from detectron2.config import LazyCall as L
from detectron2.modeling.backbone import SwinTransformer

# modify backbone config
model.backbone = L(SwinTransformer)(
    pretrain_img_size=384,
    embed_dim=192,
    depths=(2, 2, 18, 2),
    num_heads=(6, 12, 24, 48),
    window_size=12,
    out_indices=(1, 2, 3),
)

# setup init checkpoint path
train.init_checkpoint = "/path/to/swin_large_patch4_window12_384_22kto1k.pth"
```
ViTDet
The download links below are borrowed from the official implementation of MAE.
 | ViT-Base | ViT-Large | ViT-Huge |
---|---|---|---|
Pretrained Checkpoint | download | download | download |
Using ViTDet Backbone in Config
```python
import torch.nn as nn
from functools import partial

from detectron2.config import LazyCall as L
from detectron2.layers import ShapeSpec
from detectron2.modeling import ViT, SimpleFeaturePyramid
from detectron2.modeling.backbone.fpn import LastLevelMaxPool

from .dino_r50 import model

# ViT-Base hyper-parameters
embed_dim, depth, num_heads, dp = 768, 12, 12, 0.1

# Creates a Simple Feature Pyramid from the ViT backbone
model.backbone = L(SimpleFeaturePyramid)(
    net=L(ViT)(  # Single-scale ViT backbone
        img_size=1024,
        patch_size=16,
        embed_dim=embed_dim,
        depth=depth,
        num_heads=num_heads,
        drop_path_rate=dp,
        window_size=14,
        mlp_ratio=4,
        qkv_bias=True,
        norm_layer=partial(nn.LayerNorm, eps=1e-6),
        # blocks 2, 5, 8, 11 use global attention; all others use windowed attention
        window_block_indexes=[0, 1, 3, 4, 6, 7, 9, 10],
        residual_block_indexes=[],
        use_rel_pos=True,
        out_feature="last_feat",
    ),
    in_feature="${.net.out_feature}",
    out_channels=256,
    scale_factors=(2.0, 1.0, 0.5),  # (4.0, 2.0, 1.0, 0.5) in ViTDet
    top_block=L(LastLevelMaxPool)(),
    norm="LN",
    square_pad=1024,
)

# setup init checkpoint path
train.init_checkpoint = "/path/to/mae_pretrain_vit_base.pth"
```
Please refer to the DINO project for more details on using the ViT backbone.
FocalNet
The download links below are borrowed from the official implementation of FocalNet.
Model | Depth | Dim | Kernels | #Params. (M) | Download |
---|---|---|---|---|---|
FocalNet-L | [2, 2, 18, 2] | 192 | [5, 7, 9] | 207 | download |
FocalNet-L | [2, 2, 18, 2] | 192 | [3, 5, 7, 9] | 207 | download |
FocalNet-XL | [2, 2, 18, 2] | 256 | [5, 7, 9] | 366 | download |
FocalNet-XL | [2, 2, 18, 2] | 256 | [3, 5, 7, 9] | 366 | download |
FocalNet-H | [2, 2, 18, 2] | 352 | [5, 7, 9] | 687 | download |
FocalNet-H | [2, 2, 18, 2] | 352 | [3, 5, 7, 9] | 687 | download |
Using FocalNet Backbone in Config
```python
from detectron2.config import LazyCall as L
from detrex.modeling.backbone import FocalNet  # the FocalNet backbone is provided by detrex

# focalnet-large-4scale baseline
model.backbone = L(FocalNet)(
    embed_dim=192,
    depths=(2, 2, 18, 2),
    focal_levels=(3, 3, 3, 3),
    focal_windows=(5, 5, 5, 5),
    use_conv_embed=True,
    use_postln=True,
    use_postln_in_modulation=False,
    use_layerscale=True,
    normalize_modulator=False,
    out_indices=(1, 2, 3),
)
```
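The "Kernels" column in the table maps directly onto focal_windows and focal_levels: in the official FocalNet implementation the depth-wise kernel at focal level k has size focal_window + focal_factor * k, with focal_factor = 2. The helper below is hypothetical, written only to illustrate that relationship:

```python
def focal_kernels(focal_window: int, focal_level: int, focal_factor: int = 2):
    """Kernel sizes across focal levels, per the official FocalNet code."""
    return [focal_window + focal_factor * k for k in range(focal_level)]

print(focal_kernels(5, 3))  # [5, 7, 9]   -> the (5, 5, 5, 5) / (3, 3, 3, 3) config above
print(focal_kernels(3, 4))  # [3, 5, 7, 9] -> the alternative checkpoints in the table
```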