2023-11-09 22:35:08,809 - mmseg - INFO - Multi-processing start method is `None`
2023-11-09 22:35:08,822 - mmseg - INFO - OpenCV num_threads is `128`
2023-11-09 22:35:08,822 - mmseg - INFO - OMP num threads is 1
2023-11-09 22:35:08,907 - mmseg - INFO - Environment info:
------------------------------------------------------------
sys.platform: linux
Python: 3.8.15 (default, Nov  4 2022, 20:59:55) [GCC 11.2.0]
CUDA available: True
GPU 0,1,2,3,4,5,6,7: NVIDIA A100-SXM4-80GB
CUDA_HOME: /mnt/petrelfs/wangwenhai/miniconda3/envs/mmdetseg
NVCC: Cuda compilation tools, release 11.7, V11.7.99
GCC: gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-44)
PyTorch: 1.13.0
PyTorch compiling details: PyTorch built with:
  - GCC 9.3
  - C++ Version: 201402
  - Intel(R) oneAPI Math Kernel Library Version 2021.4-Product Build 20210904 for Intel(R) 64 architecture applications
  - Intel(R) MKL-DNN v2.6.0 (Git Hash 52b5f107dd9cf10910aaa19cb47f3abf9b349815)
  - OpenMP 201511 (a.k.a. OpenMP 4.5)
  - LAPACK is enabled (usually provided by MKL)
  - NNPACK is enabled
  - CPU capability usage: AVX2
  - CUDA Runtime 11.7
  - NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_37,code=compute_37
  - CuDNN 8.5
  - Magma 2.6.1
  - Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=11.7, CUDNN_VERSION=8.5.0, CXX_COMPILER=/opt/rh/devtoolset-9/root/usr/bin/c++, CXX_FLAGS= -fabi-version=11 -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_KINETO -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -DEDGE_PROFILER_USE_KINETO -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Werror=non-virtual-dtor -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wunused-local-typedefs -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Werror=cast-function-type -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=1.13.0, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, USE_ROCM=OFF, 

TorchVision: 0.14.0
OpenCV: 4.8.0
MMCV: 1.7.0
MMCV Compiler: GCC 7.3
MMCV CUDA Compiler: 11.7
MMSegmentation: 0.27.0+
------------------------------------------------------------

2023-11-09 22:35:08,907 - mmseg - INFO - Distributed training: True
2023-11-09 22:35:09,165 - mmseg - INFO - Config:
checkpoint = 'https://download.openmmlab.com/mmsegmentation/v0.5/pretrain/segmenter/vit_base_p16_384_20220308-96dfe169.pth'
backbone_norm_cfg = dict(type='LN', eps=1e-06, requires_grad=True)
model = dict(
    type='EncoderDecoder',
    pretrained=
    './pretrained/intern_vit_6b_224px.pth',
    backbone=dict(
        type='InternViT6B',
        pretrain_size=224,
        img_size=504,
        patch_size=14,
        embed_dim=3200,
        depth=48,
        num_heads=25,
        mlp_ratio=4.0,
        qkv_bias=False,
        drop_path_rate=0.4,
        init_values=0.1,
        with_cp=True,
        use_flash_attn=True,
        qk_normalization=True,
        layerscale_no_force_fp32=True,
        freeze_vit=False,
        out_indices=[47]),
    decode_head=dict(
        type='FCNHead',
        in_channels=3200,
        channels=3200,
        num_convs=0,
        dropout_ratio=0.0,
        concat_input=False,
        num_classes=150,
        with_norm=True,
        loss_decode=dict(
            type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0)),
    test_cfg=dict(mode='slide', crop_size=(504, 504), stride=(322, 322)))
dataset_type = 'ADE20KDataset'
data_root = 'data/ade/ADEChallengeData2016'
img_norm_cfg = dict(
    mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
crop_size = (504, 504)
train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='LoadAnnotations', reduce_zero_label=True),
    dict(type='Resize', img_scale=(2016, 504), ratio_range=(0.5, 2.0)),
    dict(type='RandomCrop', crop_size=(504, 504), cat_max_ratio=0.75),
    dict(type='RandomFlip', prob=0.5),
    dict(type='PhotoMetricDistortion'),
    dict(
        type='Normalize',
        mean=[123.675, 116.28, 103.53],
        std=[58.395, 57.12, 57.375],
        to_rgb=True),
    dict(type='Pad', size=(504, 504), pad_val=0, seg_pad_val=255),
    dict(type='DefaultFormatBundle'),
    dict(type='Collect', keys=['img', 'gt_semantic_seg'])
]
test_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(
        type='MultiScaleFlipAug',
        img_scale=(2016, 504),
        flip=False,
        transforms=[
            dict(type='Resize', keep_ratio=True),
            dict(type='ResizeToMultiple', size_divisor=14),
            dict(type='RandomFlip'),
            dict(
                type='Normalize',
                mean=[123.675, 116.28, 103.53],
                std=[58.395, 57.12, 57.375],
                to_rgb=True),
            dict(type='ImageToTensor', keys=['img']),
            dict(type='Collect', keys=['img'])
        ])
]
data = dict(
    samples_per_gpu=2,
    workers_per_gpu=4,
    train=dict(
        type='ADE20KDataset',
        data_root='data/ade/ADEChallengeData2016',
        img_dir='images/training',
        ann_dir='annotations/training',
        max_image_num=1263,
        pipeline=[
            dict(type='LoadImageFromFile'),
            dict(type='LoadAnnotations', reduce_zero_label=True),
            dict(type='Resize', img_scale=(2016, 504), ratio_range=(0.5, 2.0)),
            dict(type='RandomCrop', crop_size=(504, 504), cat_max_ratio=0.75),
            dict(type='RandomFlip', prob=0.5),
            dict(type='PhotoMetricDistortion'),
            dict(
                type='Normalize',
                mean=[123.675, 116.28, 103.53],
                std=[58.395, 57.12, 57.375],
                to_rgb=True),
            dict(type='Pad', size=(504, 504), pad_val=0, seg_pad_val=255),
            dict(type='DefaultFormatBundle'),
            dict(type='Collect', keys=['img', 'gt_semantic_seg'])
        ]),
    val=dict(
        type='ADE20KDataset',
        data_root='data/ade/ADEChallengeData2016',
        img_dir='images/validation',
        ann_dir='annotations/validation',
        pipeline=[
            dict(type='LoadImageFromFile'),
            dict(
                type='MultiScaleFlipAug',
                img_scale=(2016, 504),
                flip=False,
                transforms=[
                    dict(type='Resize', keep_ratio=True),
                    dict(type='ResizeToMultiple', size_divisor=14),
                    dict(type='RandomFlip'),
                    dict(
                        type='Normalize',
                        mean=[123.675, 116.28, 103.53],
                        std=[58.395, 57.12, 57.375],
                        to_rgb=True),
                    dict(type='ImageToTensor', keys=['img']),
                    dict(type='Collect', keys=['img'])
                ])
        ]),
    test=dict(
        type='ADE20KDataset',
        data_root='data/ade/ADEChallengeData2016',
        img_dir='images/validation',
        ann_dir='annotations/validation',
        pipeline=[
            dict(type='LoadImageFromFile'),
            dict(
                type='MultiScaleFlipAug',
                img_scale=(2016, 504),
                flip=False,
                transforms=[
                    dict(type='Resize', keep_ratio=True),
                    dict(type='ResizeToMultiple', size_divisor=14),
                    dict(type='RandomFlip'),
                    dict(
                        type='Normalize',
                        mean=[123.675, 116.28, 103.53],
                        std=[58.395, 57.12, 57.375],
                        to_rgb=True),
                    dict(type='ImageToTensor', keys=['img']),
                    dict(type='Collect', keys=['img'])
                ])
        ]))
log_config = dict(
    interval=50,
    hooks=[
        dict(type='TextLoggerHook', by_epoch=False),
        dict(type='TensorboardLoggerHook')
    ])
dist_params = dict(backend='nccl')
log_level = 'INFO'
load_from = None
resume_from = None
workflow = [('train', 1)]
cudnn_benchmark = True
optimizer = dict(
    type='AdamW',
    lr=4e-05,
    betas=(0.9, 0.999),
    weight_decay=0.05,
    constructor='CustomLayerDecayOptimizerConstructor',
    paramwise_cfg=dict(num_layers=48, layer_decay_rate=0.95))
optimizer_config = dict()
lr_config = dict(
    policy='poly',
    warmup='linear',
    warmup_iters=100,
    warmup_ratio=1e-06,
    power=1.0,
    min_lr=0.0,
    by_epoch=False)
runner = dict(type='IterBasedRunner', max_iters=5000)
checkpoint_config = dict(
    by_epoch=False, interval=1000, deepspeed=True, max_keep_ckpts=2)
evaluation = dict(
    interval=1000, metric='mIoU', pre_eval=True, save_best='auto')
deepspeed = True
deepspeed_config = 'zero_configs/adam_zero1_bf16.json'
pretrained = './pretrained/intern_vit_6b_224px.pth'
custom_hooks = [dict(type='ToBFloat16Hook', priority=49)]
work_dir = './work_dirs/segmenter_linear_intern_vit_6b_504_5k_ade20k_bs16_lr4e-5_1of16'
gpu_ids = range(0, 8)
auto_resume = False
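
Note on the config above: several tensor shapes in the parameter listing that follows can be recomputed from the backbone settings. The sketch below does that arithmetic. The variable names are illustrative only. Two assumptions are not confirmed by this log: that the position embedding holds one class token plus one token per patch (inferred from the `backbone.cls_token` and `backbone.pos_embed` entries), and that the DeepSpeed config uses no gradient accumulation (its contents are not shown here).

```python
# Values copied from the config dump above.
img_size, patch_size = 504, 14
embed_dim, num_heads, mlp_ratio = 3200, 25, 4.0
num_gpus, samples_per_gpu = 8, 2  # gpu_ids = range(0, 8), samples_per_gpu=2

# Patch grid and token count.
patches_per_side = img_size // patch_size   # 504 / 14 = 36
num_patches = patches_per_side ** 2         # 36 * 36 = 1296
num_tokens = num_patches + 1                # assumed: one extra class token

# Per-block layer widths.
head_dim = embed_dim // num_heads           # width of each attention head
qkv_out = 3 * embed_dim                     # fused q, k, v projection rows
mlp_hidden = int(embed_dim * mlp_ratio)     # hidden width of mlp.fc1

# Global batch size, assuming no gradient accumulation.
global_batch = num_gpus * samples_per_gpu

print(num_tokens, head_dim, qkv_out, mlp_hidden, global_batch)
```

Under these assumptions the results line up with the log: 1297 tokens matches `backbone.pos_embed - torch.Size([1, 1297, 3200])`, 9600 and 12800 match the `attn.qkv.weight` and `mlp.fc1.weight` row counts, and a global batch of 16 agrees with the `bs16` tag in `work_dir`.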

2023-11-09 22:35:13,653 - mmseg - INFO - Set random seed to 15419458, deterministic: False
2023-11-09 22:36:35,693 - mmseg - INFO - <All keys matched successfully>
2023-11-09 22:37:00,605 - mmseg - INFO - initialize FCNHead with init_cfg {'type': 'Normal', 'std': 0.01, 'override': {'name': 'conv_seg'}}
Name of parameter - Initialization information

backbone.pos_embed - torch.Size([1, 1297, 3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.cls_token - torch.Size([1, 1, 3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.patch_embed.proj.weight - torch.Size([3200, 3, 14, 14]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.patch_embed.proj.bias - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.0.norm1.weight - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.0.attn.qkv.weight - torch.Size([9600, 3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.0.attn.proj.weight - torch.Size([3200, 3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.0.attn.proj.bias - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.0.attn.q_norm.weight - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.0.attn.k_norm.weight - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.0.ls1.gamma - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.0.norm2.weight - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.0.mlp.fc1.weight - torch.Size([12800, 3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.0.mlp.fc1.bias - torch.Size([12800]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.0.mlp.fc2.weight - torch.Size([3200, 12800]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.0.mlp.fc2.bias - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.0.ls2.gamma - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.1.norm1.weight - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.1.attn.qkv.weight - torch.Size([9600, 3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.1.attn.proj.weight - torch.Size([3200, 3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.1.attn.proj.bias - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.1.attn.q_norm.weight - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.1.attn.k_norm.weight - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.1.ls1.gamma - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.1.norm2.weight - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.1.mlp.fc1.weight - torch.Size([12800, 3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.1.mlp.fc1.bias - torch.Size([12800]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.1.mlp.fc2.weight - torch.Size([3200, 12800]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.1.mlp.fc2.bias - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.1.ls2.gamma - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.2.norm1.weight - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.2.attn.qkv.weight - torch.Size([9600, 3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.2.attn.proj.weight - torch.Size([3200, 3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.2.attn.proj.bias - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.2.attn.q_norm.weight - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.2.attn.k_norm.weight - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.2.ls1.gamma - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.2.norm2.weight - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.2.mlp.fc1.weight - torch.Size([12800, 3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.2.mlp.fc1.bias - torch.Size([12800]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.2.mlp.fc2.weight - torch.Size([3200, 12800]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.2.mlp.fc2.bias - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.2.ls2.gamma - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.3.norm1.weight - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.3.attn.qkv.weight - torch.Size([9600, 3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.3.attn.proj.weight - torch.Size([3200, 3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.3.attn.proj.bias - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.3.attn.q_norm.weight - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.3.attn.k_norm.weight - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.3.ls1.gamma - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.3.norm2.weight - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.3.mlp.fc1.weight - torch.Size([12800, 3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.3.mlp.fc1.bias - torch.Size([12800]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.3.mlp.fc2.weight - torch.Size([3200, 12800]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.3.mlp.fc2.bias - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.3.ls2.gamma - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.4.norm1.weight - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.4.attn.qkv.weight - torch.Size([9600, 3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.4.attn.proj.weight - torch.Size([3200, 3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.4.attn.proj.bias - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.4.attn.q_norm.weight - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.4.attn.k_norm.weight - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.4.ls1.gamma - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.4.norm2.weight - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.4.mlp.fc1.weight - torch.Size([12800, 3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.4.mlp.fc1.bias - torch.Size([12800]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.4.mlp.fc2.weight - torch.Size([3200, 12800]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.4.mlp.fc2.bias - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.4.ls2.gamma - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.5.norm1.weight - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.5.attn.qkv.weight - torch.Size([9600, 3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.5.attn.proj.weight - torch.Size([3200, 3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.5.attn.proj.bias - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.5.attn.q_norm.weight - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.5.attn.k_norm.weight - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.5.ls1.gamma - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.5.norm2.weight - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.5.mlp.fc1.weight - torch.Size([12800, 3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.5.mlp.fc1.bias - torch.Size([12800]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.5.mlp.fc2.weight - torch.Size([3200, 12800]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.5.mlp.fc2.bias - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.5.ls2.gamma - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.6.norm1.weight - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.6.attn.qkv.weight - torch.Size([9600, 3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.6.attn.proj.weight - torch.Size([3200, 3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.6.attn.proj.bias - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.6.attn.q_norm.weight - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.6.attn.k_norm.weight - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.6.ls1.gamma - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.6.norm2.weight - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.6.mlp.fc1.weight - torch.Size([12800, 3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.6.mlp.fc1.bias - torch.Size([12800]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.6.mlp.fc2.weight - torch.Size([3200, 12800]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.6.mlp.fc2.bias - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.6.ls2.gamma - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.7.norm1.weight - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.7.attn.qkv.weight - torch.Size([9600, 3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.7.attn.proj.weight - torch.Size([3200, 3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.7.attn.proj.bias - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.7.attn.q_norm.weight - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.7.attn.k_norm.weight - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.7.ls1.gamma - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.7.norm2.weight - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.7.mlp.fc1.weight - torch.Size([12800, 3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.7.mlp.fc1.bias - torch.Size([12800]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.7.mlp.fc2.weight - torch.Size([3200, 12800]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.7.mlp.fc2.bias - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.7.ls2.gamma - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.8.norm1.weight - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.8.attn.qkv.weight - torch.Size([9600, 3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.8.attn.proj.weight - torch.Size([3200, 3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.8.attn.proj.bias - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.8.attn.q_norm.weight - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.8.attn.k_norm.weight - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.8.ls1.gamma - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.8.norm2.weight - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.8.mlp.fc1.weight - torch.Size([12800, 3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.8.mlp.fc1.bias - torch.Size([12800]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.8.mlp.fc2.weight - torch.Size([3200, 12800]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.8.mlp.fc2.bias - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.8.ls2.gamma - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.9.norm1.weight - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.9.attn.qkv.weight - torch.Size([9600, 3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.9.attn.proj.weight - torch.Size([3200, 3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.9.attn.proj.bias - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.9.attn.q_norm.weight - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.9.attn.k_norm.weight - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.9.ls1.gamma - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.9.norm2.weight - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.9.mlp.fc1.weight - torch.Size([12800, 3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.9.mlp.fc1.bias - torch.Size([12800]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.9.mlp.fc2.weight - torch.Size([3200, 12800]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.9.mlp.fc2.bias - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.9.ls2.gamma - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.10.norm1.weight - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.10.attn.qkv.weight - torch.Size([9600, 3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.10.attn.proj.weight - torch.Size([3200, 3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.10.attn.proj.bias - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.10.attn.q_norm.weight - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.10.attn.k_norm.weight - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.10.ls1.gamma - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.10.norm2.weight - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.10.mlp.fc1.weight - torch.Size([12800, 3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.10.mlp.fc1.bias - torch.Size([12800]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.10.mlp.fc2.weight - torch.Size([3200, 12800]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.10.mlp.fc2.bias - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.10.ls2.gamma - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.11.norm1.weight - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.11.attn.qkv.weight - torch.Size([9600, 3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.11.attn.proj.weight - torch.Size([3200, 3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.11.attn.proj.bias - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.11.attn.q_norm.weight - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.11.attn.k_norm.weight - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.11.ls1.gamma - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.11.norm2.weight - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.11.mlp.fc1.weight - torch.Size([12800, 3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.11.mlp.fc1.bias - torch.Size([12800]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.11.mlp.fc2.weight - torch.Size([3200, 12800]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.11.mlp.fc2.bias - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.11.ls2.gamma - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.12.norm1.weight - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.12.attn.qkv.weight - torch.Size([9600, 3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.12.attn.proj.weight - torch.Size([3200, 3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.12.attn.proj.bias - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.12.attn.q_norm.weight - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.12.attn.k_norm.weight - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.12.ls1.gamma - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.12.norm2.weight - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.12.mlp.fc1.weight - torch.Size([12800, 3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.12.mlp.fc1.bias - torch.Size([12800]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.12.mlp.fc2.weight - torch.Size([3200, 12800]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.12.mlp.fc2.bias - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.12.ls2.gamma - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.13.norm1.weight - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.13.attn.qkv.weight - torch.Size([9600, 3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.13.attn.proj.weight - torch.Size([3200, 3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.13.attn.proj.bias - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.13.attn.q_norm.weight - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.13.attn.k_norm.weight - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.13.ls1.gamma - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.13.norm2.weight - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.13.mlp.fc1.weight - torch.Size([12800, 3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.13.mlp.fc1.bias - torch.Size([12800]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.13.mlp.fc2.weight - torch.Size([3200, 12800]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.13.mlp.fc2.bias - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.13.ls2.gamma - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.14.norm1.weight - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.14.attn.qkv.weight - torch.Size([9600, 3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.14.attn.proj.weight - torch.Size([3200, 3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.14.attn.proj.bias - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.14.attn.q_norm.weight - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.14.attn.k_norm.weight - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.14.ls1.gamma - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.14.norm2.weight - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.14.mlp.fc1.weight - torch.Size([12800, 3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.14.mlp.fc1.bias - torch.Size([12800]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.14.mlp.fc2.weight - torch.Size([3200, 12800]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.14.mlp.fc2.bias - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.14.ls2.gamma - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.15.norm1.weight - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.15.attn.qkv.weight - torch.Size([9600, 3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.15.attn.proj.weight - torch.Size([3200, 3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.15.attn.proj.bias - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.15.attn.q_norm.weight - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.15.attn.k_norm.weight - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.15.ls1.gamma - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.15.norm2.weight - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.15.mlp.fc1.weight - torch.Size([12800, 3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.15.mlp.fc1.bias - torch.Size([12800]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.15.mlp.fc2.weight - torch.Size([3200, 12800]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.15.mlp.fc2.bias - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.15.ls2.gamma - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.16.norm1.weight - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.16.attn.qkv.weight - torch.Size([9600, 3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.16.attn.proj.weight - torch.Size([3200, 3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.16.attn.proj.bias - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.16.attn.q_norm.weight - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.16.attn.k_norm.weight - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.16.ls1.gamma - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.16.norm2.weight - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.16.mlp.fc1.weight - torch.Size([12800, 3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.16.mlp.fc1.bias - torch.Size([12800]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.16.mlp.fc2.weight - torch.Size([3200, 12800]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.16.mlp.fc2.bias - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.16.ls2.gamma - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.17.norm1.weight - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.17.attn.qkv.weight - torch.Size([9600, 3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.17.attn.proj.weight - torch.Size([3200, 3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.17.attn.proj.bias - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.17.attn.q_norm.weight - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.17.attn.k_norm.weight - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.17.ls1.gamma - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.17.norm2.weight - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.17.mlp.fc1.weight - torch.Size([12800, 3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.17.mlp.fc1.bias - torch.Size([12800]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.17.mlp.fc2.weight - torch.Size([3200, 12800]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.17.mlp.fc2.bias - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.17.ls2.gamma - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.18.norm1.weight - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.18.attn.qkv.weight - torch.Size([9600, 3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.18.attn.proj.weight - torch.Size([3200, 3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.18.attn.proj.bias - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.18.attn.q_norm.weight - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.18.attn.k_norm.weight - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.18.ls1.gamma - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.18.norm2.weight - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.18.mlp.fc1.weight - torch.Size([12800, 3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.18.mlp.fc1.bias - torch.Size([12800]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.18.mlp.fc2.weight - torch.Size([3200, 12800]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.18.mlp.fc2.bias - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.18.ls2.gamma - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.19.norm1.weight - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.19.attn.qkv.weight - torch.Size([9600, 3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.19.attn.proj.weight - torch.Size([3200, 3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.19.attn.proj.bias - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.19.attn.q_norm.weight - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.19.attn.k_norm.weight - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.19.ls1.gamma - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.19.norm2.weight - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.19.mlp.fc1.weight - torch.Size([12800, 3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.19.mlp.fc1.bias - torch.Size([12800]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.19.mlp.fc2.weight - torch.Size([3200, 12800]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.19.mlp.fc2.bias - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.19.ls2.gamma - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.20.norm1.weight - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.20.attn.qkv.weight - torch.Size([9600, 3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.20.attn.proj.weight - torch.Size([3200, 3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.20.attn.proj.bias - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.20.attn.q_norm.weight - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.20.attn.k_norm.weight - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.20.ls1.gamma - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.20.norm2.weight - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.20.mlp.fc1.weight - torch.Size([12800, 3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.20.mlp.fc1.bias - torch.Size([12800]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.20.mlp.fc2.weight - torch.Size([3200, 12800]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.20.mlp.fc2.bias - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.20.ls2.gamma - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.21.norm1.weight - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.21.attn.qkv.weight - torch.Size([9600, 3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.21.attn.proj.weight - torch.Size([3200, 3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.21.attn.proj.bias - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.21.attn.q_norm.weight - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.21.attn.k_norm.weight - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.21.ls1.gamma - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.21.norm2.weight - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.21.mlp.fc1.weight - torch.Size([12800, 3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.21.mlp.fc1.bias - torch.Size([12800]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.21.mlp.fc2.weight - torch.Size([3200, 12800]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.21.mlp.fc2.bias - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.21.ls2.gamma - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.22.norm1.weight - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.22.attn.qkv.weight - torch.Size([9600, 3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.22.attn.proj.weight - torch.Size([3200, 3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.22.attn.proj.bias - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.22.attn.q_norm.weight - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.22.attn.k_norm.weight - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.22.ls1.gamma - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.22.norm2.weight - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.22.mlp.fc1.weight - torch.Size([12800, 3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.22.mlp.fc1.bias - torch.Size([12800]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.22.mlp.fc2.weight - torch.Size([3200, 12800]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.22.mlp.fc2.bias - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.22.ls2.gamma - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.23.norm1.weight - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.23.attn.qkv.weight - torch.Size([9600, 3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.23.attn.proj.weight - torch.Size([3200, 3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.23.attn.proj.bias - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.23.attn.q_norm.weight - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.23.attn.k_norm.weight - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.23.ls1.gamma - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.23.norm2.weight - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.23.mlp.fc1.weight - torch.Size([12800, 3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.23.mlp.fc1.bias - torch.Size([12800]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.23.mlp.fc2.weight - torch.Size([3200, 12800]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.23.mlp.fc2.bias - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.23.ls2.gamma - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.24.norm1.weight - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.24.attn.qkv.weight - torch.Size([9600, 3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.24.attn.proj.weight - torch.Size([3200, 3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.24.attn.proj.bias - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.24.attn.q_norm.weight - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.24.attn.k_norm.weight - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.24.ls1.gamma - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.24.norm2.weight - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.24.mlp.fc1.weight - torch.Size([12800, 3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.24.mlp.fc1.bias - torch.Size([12800]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.24.mlp.fc2.weight - torch.Size([3200, 12800]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.24.mlp.fc2.bias - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.24.ls2.gamma - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.25.norm1.weight - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.25.attn.qkv.weight - torch.Size([9600, 3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.25.attn.proj.weight - torch.Size([3200, 3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.25.attn.proj.bias - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.25.attn.q_norm.weight - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.25.attn.k_norm.weight - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.25.ls1.gamma - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.25.norm2.weight - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.25.mlp.fc1.weight - torch.Size([12800, 3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.25.mlp.fc1.bias - torch.Size([12800]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.25.mlp.fc2.weight - torch.Size([3200, 12800]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.25.mlp.fc2.bias - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.25.ls2.gamma - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.26.norm1.weight - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.26.attn.qkv.weight - torch.Size([9600, 3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.26.attn.proj.weight - torch.Size([3200, 3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.26.attn.proj.bias - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.26.attn.q_norm.weight - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.26.attn.k_norm.weight - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.26.ls1.gamma - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.26.norm2.weight - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.26.mlp.fc1.weight - torch.Size([12800, 3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.26.mlp.fc1.bias - torch.Size([12800]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.26.mlp.fc2.weight - torch.Size([3200, 12800]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.26.mlp.fc2.bias - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.26.ls2.gamma - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.27.norm1.weight - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.27.attn.qkv.weight - torch.Size([9600, 3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.27.attn.proj.weight - torch.Size([3200, 3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.27.attn.proj.bias - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.27.attn.q_norm.weight - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.27.attn.k_norm.weight - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.27.ls1.gamma - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.27.norm2.weight - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.27.mlp.fc1.weight - torch.Size([12800, 3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.27.mlp.fc1.bias - torch.Size([12800]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.27.mlp.fc2.weight - torch.Size([3200, 12800]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.27.mlp.fc2.bias - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.27.ls2.gamma - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.28.norm1.weight - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.28.attn.qkv.weight - torch.Size([9600, 3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.28.attn.proj.weight - torch.Size([3200, 3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.28.attn.proj.bias - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.28.attn.q_norm.weight - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.28.attn.k_norm.weight - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.28.ls1.gamma - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.28.norm2.weight - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.28.mlp.fc1.weight - torch.Size([12800, 3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.28.mlp.fc1.bias - torch.Size([12800]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.28.mlp.fc2.weight - torch.Size([3200, 12800]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.28.mlp.fc2.bias - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.28.ls2.gamma - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.29.norm1.weight - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.29.attn.qkv.weight - torch.Size([9600, 3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.29.attn.proj.weight - torch.Size([3200, 3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.29.attn.proj.bias - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.29.attn.q_norm.weight - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.29.attn.k_norm.weight - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.29.ls1.gamma - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.29.norm2.weight - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.29.mlp.fc1.weight - torch.Size([12800, 3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.29.mlp.fc1.bias - torch.Size([12800]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.29.mlp.fc2.weight - torch.Size([3200, 12800]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.29.mlp.fc2.bias - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.29.ls2.gamma - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.30.norm1.weight - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.30.attn.qkv.weight - torch.Size([9600, 3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.30.attn.proj.weight - torch.Size([3200, 3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.30.attn.proj.bias - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.30.attn.q_norm.weight - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.30.attn.k_norm.weight - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.30.ls1.gamma - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.30.norm2.weight - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.30.mlp.fc1.weight - torch.Size([12800, 3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.30.mlp.fc1.bias - torch.Size([12800]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.30.mlp.fc2.weight - torch.Size([3200, 12800]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.30.mlp.fc2.bias - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.30.ls2.gamma - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.31.norm1.weight - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.31.attn.qkv.weight - torch.Size([9600, 3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.31.attn.proj.weight - torch.Size([3200, 3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.31.attn.proj.bias - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.31.attn.q_norm.weight - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.31.attn.k_norm.weight - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.31.ls1.gamma - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.31.norm2.weight - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.31.mlp.fc1.weight - torch.Size([12800, 3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.31.mlp.fc1.bias - torch.Size([12800]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.31.mlp.fc2.weight - torch.Size([3200, 12800]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.31.mlp.fc2.bias - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.31.ls2.gamma - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.32.norm1.weight - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.32.attn.qkv.weight - torch.Size([9600, 3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.32.attn.proj.weight - torch.Size([3200, 3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.32.attn.proj.bias - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.32.attn.q_norm.weight - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.32.attn.k_norm.weight - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.32.ls1.gamma - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.32.norm2.weight - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.32.mlp.fc1.weight - torch.Size([12800, 3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.32.mlp.fc1.bias - torch.Size([12800]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.32.mlp.fc2.weight - torch.Size([3200, 12800]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.32.mlp.fc2.bias - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.32.ls2.gamma - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.33.norm1.weight - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.33.attn.qkv.weight - torch.Size([9600, 3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.33.attn.proj.weight - torch.Size([3200, 3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.33.attn.proj.bias - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.33.attn.q_norm.weight - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.33.attn.k_norm.weight - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.33.ls1.gamma - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.33.norm2.weight - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.33.mlp.fc1.weight - torch.Size([12800, 3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.33.mlp.fc1.bias - torch.Size([12800]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.33.mlp.fc2.weight - torch.Size([3200, 12800]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.33.mlp.fc2.bias - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.33.ls2.gamma - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.34.norm1.weight - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.34.attn.qkv.weight - torch.Size([9600, 3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.34.attn.proj.weight - torch.Size([3200, 3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.34.attn.proj.bias - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.34.attn.q_norm.weight - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.34.attn.k_norm.weight - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.34.ls1.gamma - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.34.norm2.weight - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.34.mlp.fc1.weight - torch.Size([12800, 3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.34.mlp.fc1.bias - torch.Size([12800]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.34.mlp.fc2.weight - torch.Size([3200, 12800]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.34.mlp.fc2.bias - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.34.ls2.gamma - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.35.norm1.weight - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.35.attn.qkv.weight - torch.Size([9600, 3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.35.attn.proj.weight - torch.Size([3200, 3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.35.attn.proj.bias - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.35.attn.q_norm.weight - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.35.attn.k_norm.weight - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.35.ls1.gamma - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.35.norm2.weight - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.35.mlp.fc1.weight - torch.Size([12800, 3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.35.mlp.fc1.bias - torch.Size([12800]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.35.mlp.fc2.weight - torch.Size([3200, 12800]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.35.mlp.fc2.bias - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.35.ls2.gamma - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.36.norm1.weight - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.36.attn.qkv.weight - torch.Size([9600, 3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.36.attn.proj.weight - torch.Size([3200, 3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.36.attn.proj.bias - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.36.attn.q_norm.weight - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.36.attn.k_norm.weight - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.36.ls1.gamma - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.36.norm2.weight - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.36.mlp.fc1.weight - torch.Size([12800, 3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.36.mlp.fc1.bias - torch.Size([12800]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.36.mlp.fc2.weight - torch.Size([3200, 12800]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.36.mlp.fc2.bias - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.36.ls2.gamma - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.37.norm1.weight - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.37.attn.qkv.weight - torch.Size([9600, 3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.37.attn.proj.weight - torch.Size([3200, 3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.37.attn.proj.bias - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.37.attn.q_norm.weight - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.37.attn.k_norm.weight - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.37.ls1.gamma - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.37.norm2.weight - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.37.mlp.fc1.weight - torch.Size([12800, 3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.37.mlp.fc1.bias - torch.Size([12800]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.37.mlp.fc2.weight - torch.Size([3200, 12800]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.37.mlp.fc2.bias - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.37.ls2.gamma - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.38.norm1.weight - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.38.attn.qkv.weight - torch.Size([9600, 3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.38.attn.proj.weight - torch.Size([3200, 3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.38.attn.proj.bias - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.38.attn.q_norm.weight - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.38.attn.k_norm.weight - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.38.ls1.gamma - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.38.norm2.weight - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.38.mlp.fc1.weight - torch.Size([12800, 3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.38.mlp.fc1.bias - torch.Size([12800]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.38.mlp.fc2.weight - torch.Size([3200, 12800]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.38.mlp.fc2.bias - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.38.ls2.gamma - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.39.norm1.weight - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.39.attn.qkv.weight - torch.Size([9600, 3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.39.attn.proj.weight - torch.Size([3200, 3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.39.attn.proj.bias - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.39.attn.q_norm.weight - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.39.attn.k_norm.weight - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.39.ls1.gamma - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.39.norm2.weight - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.39.mlp.fc1.weight - torch.Size([12800, 3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.39.mlp.fc1.bias - torch.Size([12800]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.39.mlp.fc2.weight - torch.Size([3200, 12800]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.39.mlp.fc2.bias - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.39.ls2.gamma - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.40.norm1.weight - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.40.attn.qkv.weight - torch.Size([9600, 3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.40.attn.proj.weight - torch.Size([3200, 3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.40.attn.proj.bias - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.40.attn.q_norm.weight - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.40.attn.k_norm.weight - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.40.ls1.gamma - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.40.norm2.weight - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.40.mlp.fc1.weight - torch.Size([12800, 3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.40.mlp.fc1.bias - torch.Size([12800]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.40.mlp.fc2.weight - torch.Size([3200, 12800]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.40.mlp.fc2.bias - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.40.ls2.gamma - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.41.norm1.weight - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.41.attn.qkv.weight - torch.Size([9600, 3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.41.attn.proj.weight - torch.Size([3200, 3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.41.attn.proj.bias - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.41.attn.q_norm.weight - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.41.attn.k_norm.weight - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.41.ls1.gamma - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.41.norm2.weight - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.41.mlp.fc1.weight - torch.Size([12800, 3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.41.mlp.fc1.bias - torch.Size([12800]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.41.mlp.fc2.weight - torch.Size([3200, 12800]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.41.mlp.fc2.bias - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.41.ls2.gamma - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.42.norm1.weight - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.42.attn.qkv.weight - torch.Size([9600, 3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.42.attn.proj.weight - torch.Size([3200, 3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.42.attn.proj.bias - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.42.attn.q_norm.weight - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.42.attn.k_norm.weight - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.42.ls1.gamma - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.42.norm2.weight - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.42.mlp.fc1.weight - torch.Size([12800, 3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.42.mlp.fc1.bias - torch.Size([12800]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.42.mlp.fc2.weight - torch.Size([3200, 12800]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.42.mlp.fc2.bias - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.42.ls2.gamma - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.43.norm1.weight - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.43.attn.qkv.weight - torch.Size([9600, 3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.43.attn.proj.weight - torch.Size([3200, 3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.43.attn.proj.bias - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.43.attn.q_norm.weight - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.43.attn.k_norm.weight - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.43.ls1.gamma - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.43.norm2.weight - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.43.mlp.fc1.weight - torch.Size([12800, 3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.43.mlp.fc1.bias - torch.Size([12800]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.43.mlp.fc2.weight - torch.Size([3200, 12800]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.43.mlp.fc2.bias - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.43.ls2.gamma - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.44.norm1.weight - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.44.attn.qkv.weight - torch.Size([9600, 3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.44.attn.proj.weight - torch.Size([3200, 3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.44.attn.proj.bias - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.44.attn.q_norm.weight - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.44.attn.k_norm.weight - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.44.ls1.gamma - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.44.norm2.weight - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.44.mlp.fc1.weight - torch.Size([12800, 3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.44.mlp.fc1.bias - torch.Size([12800]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.44.mlp.fc2.weight - torch.Size([3200, 12800]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.44.mlp.fc2.bias - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.44.ls2.gamma - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.45.norm1.weight - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.45.attn.qkv.weight - torch.Size([9600, 3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.45.attn.proj.weight - torch.Size([3200, 3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.45.attn.proj.bias - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.45.attn.q_norm.weight - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.45.attn.k_norm.weight - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.45.ls1.gamma - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.45.norm2.weight - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.45.mlp.fc1.weight - torch.Size([12800, 3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.45.mlp.fc1.bias - torch.Size([12800]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.45.mlp.fc2.weight - torch.Size([3200, 12800]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.45.mlp.fc2.bias - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.45.ls2.gamma - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.46.norm1.weight - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.46.attn.qkv.weight - torch.Size([9600, 3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.46.attn.proj.weight - torch.Size([3200, 3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.46.attn.proj.bias - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.46.attn.q_norm.weight - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.46.attn.k_norm.weight - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.46.ls1.gamma - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.46.norm2.weight - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.46.mlp.fc1.weight - torch.Size([12800, 3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.46.mlp.fc1.bias - torch.Size([12800]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.46.mlp.fc2.weight - torch.Size([3200, 12800]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.46.mlp.fc2.bias - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.46.ls2.gamma - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.47.norm1.weight - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.47.attn.qkv.weight - torch.Size([9600, 3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.47.attn.proj.weight - torch.Size([3200, 3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.47.attn.proj.bias - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.47.attn.q_norm.weight - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.47.attn.k_norm.weight - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.47.ls1.gamma - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.47.norm2.weight - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.47.mlp.fc1.weight - torch.Size([12800, 3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.47.mlp.fc1.bias - torch.Size([12800]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.47.mlp.fc2.weight - torch.Size([3200, 12800]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.47.mlp.fc2.bias - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

backbone.blocks.47.ls2.gamma - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

decode_head.conv_seg.weight - torch.Size([150, 3200, 1, 1]): 
NormalInit: mean=0, std=0.01, bias=0 

decode_head.conv_seg.bias - torch.Size([150]): 
NormalInit: mean=0, std=0.01, bias=0 

decode_head.norm.weight - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  

decode_head.norm.bias - torch.Size([3200]): 
The value is the same before and after calling `init_weights` of EncoderDecoder  
2023-11-09 22:37:00,614 - mmseg - INFO - EncoderDecoder(
  (backbone): InternViT6B(
    (patch_embed): PatchEmbed(
      (proj): Conv2d(3, 3200, kernel_size=(14, 14), stride=(14, 14))
      (norm): Identity()
    )
    (pos_drop): Identity()
    (blocks): ModuleList(
      (0): Block(
        (norm1): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
        (attn): Attention(
          (qkv): Linear(in_features=3200, out_features=9600, bias=False)
          (attn_drop): Dropout(p=0.0, inplace=False)
          (proj): Linear(in_features=3200, out_features=3200, bias=True)
          (proj_drop): Dropout(p=0.0, inplace=False)
          (inner_attn): FlashAttention()
          (q_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
          (k_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
        )
        (ls1): LayerScale()
        (drop_path1): Identity()
        (norm2): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
        (mlp): Mlp(
          (fc1): Linear(in_features=3200, out_features=12800, bias=True)
          (act): GELU(approximate='none')
          (drop1): Dropout(p=0.0, inplace=False)
          (fc2): Linear(in_features=12800, out_features=3200, bias=True)
          (drop2): Dropout(p=0.0, inplace=False)
        )
        (ls2): LayerScale()
        (drop_path2): Identity()
      )
      (1): Block(
        (norm1): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
        (attn): Attention(
          (qkv): Linear(in_features=3200, out_features=9600, bias=False)
          (attn_drop): Dropout(p=0.0, inplace=False)
          (proj): Linear(in_features=3200, out_features=3200, bias=True)
          (proj_drop): Dropout(p=0.0, inplace=False)
          (inner_attn): FlashAttention()
          (q_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
          (k_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
        )
        (ls1): LayerScale()
        (drop_path1): DropPath(drop_prob=0.009)
        (norm2): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
        (mlp): Mlp(
          (fc1): Linear(in_features=3200, out_features=12800, bias=True)
          (act): GELU(approximate='none')
          (drop1): Dropout(p=0.0, inplace=False)
          (fc2): Linear(in_features=12800, out_features=3200, bias=True)
          (drop2): Dropout(p=0.0, inplace=False)
        )
        (ls2): LayerScale()
        (drop_path2): DropPath(drop_prob=0.009)
      )
      (2): Block(
        (norm1): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
        (attn): Attention(
          (qkv): Linear(in_features=3200, out_features=9600, bias=False)
          (attn_drop): Dropout(p=0.0, inplace=False)
          (proj): Linear(in_features=3200, out_features=3200, bias=True)
          (proj_drop): Dropout(p=0.0, inplace=False)
          (inner_attn): FlashAttention()
          (q_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
          (k_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
        )
        (ls1): LayerScale()
        (drop_path1): DropPath(drop_prob=0.017)
        (norm2): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
        (mlp): Mlp(
          (fc1): Linear(in_features=3200, out_features=12800, bias=True)
          (act): GELU(approximate='none')
          (drop1): Dropout(p=0.0, inplace=False)
          (fc2): Linear(in_features=12800, out_features=3200, bias=True)
          (drop2): Dropout(p=0.0, inplace=False)
        )
        (ls2): LayerScale()
        (drop_path2): DropPath(drop_prob=0.017)
      )
      (3): Block(
        (norm1): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
        (attn): Attention(
          (qkv): Linear(in_features=3200, out_features=9600, bias=False)
          (attn_drop): Dropout(p=0.0, inplace=False)
          (proj): Linear(in_features=3200, out_features=3200, bias=True)
          (proj_drop): Dropout(p=0.0, inplace=False)
          (inner_attn): FlashAttention()
          (q_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
          (k_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
        )
        (ls1): LayerScale()
        (drop_path1): DropPath(drop_prob=0.026)
        (norm2): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
        (mlp): Mlp(
          (fc1): Linear(in_features=3200, out_features=12800, bias=True)
          (act): GELU(approximate='none')
          (drop1): Dropout(p=0.0, inplace=False)
          (fc2): Linear(in_features=12800, out_features=3200, bias=True)
          (drop2): Dropout(p=0.0, inplace=False)
        )
        (ls2): LayerScale()
        (drop_path2): DropPath(drop_prob=0.026)
      )
      (4): Block(
        (norm1): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
        (attn): Attention(
          (qkv): Linear(in_features=3200, out_features=9600, bias=False)
          (attn_drop): Dropout(p=0.0, inplace=False)
          (proj): Linear(in_features=3200, out_features=3200, bias=True)
          (proj_drop): Dropout(p=0.0, inplace=False)
          (inner_attn): FlashAttention()
          (q_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
          (k_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
        )
        (ls1): LayerScale()
        (drop_path1): DropPath(drop_prob=0.034)
        (norm2): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
        (mlp): Mlp(
          (fc1): Linear(in_features=3200, out_features=12800, bias=True)
          (act): GELU(approximate='none')
          (drop1): Dropout(p=0.0, inplace=False)
          (fc2): Linear(in_features=12800, out_features=3200, bias=True)
          (drop2): Dropout(p=0.0, inplace=False)
        )
        (ls2): LayerScale()
        (drop_path2): DropPath(drop_prob=0.034)
      )
      (5): Block(
        (norm1): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
        (attn): Attention(
          (qkv): Linear(in_features=3200, out_features=9600, bias=False)
          (attn_drop): Dropout(p=0.0, inplace=False)
          (proj): Linear(in_features=3200, out_features=3200, bias=True)
          (proj_drop): Dropout(p=0.0, inplace=False)
          (inner_attn): FlashAttention()
          (q_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
          (k_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
        )
        (ls1): LayerScale()
        (drop_path1): DropPath(drop_prob=0.043)
        (norm2): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
        (mlp): Mlp(
          (fc1): Linear(in_features=3200, out_features=12800, bias=True)
          (act): GELU(approximate='none')
          (drop1): Dropout(p=0.0, inplace=False)
          (fc2): Linear(in_features=12800, out_features=3200, bias=True)
          (drop2): Dropout(p=0.0, inplace=False)
        )
        (ls2): LayerScale()
        (drop_path2): DropPath(drop_prob=0.043)
      )
      (6): Block(
        (norm1): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
        (attn): Attention(
          (qkv): Linear(in_features=3200, out_features=9600, bias=False)
          (attn_drop): Dropout(p=0.0, inplace=False)
          (proj): Linear(in_features=3200, out_features=3200, bias=True)
          (proj_drop): Dropout(p=0.0, inplace=False)
          (inner_attn): FlashAttention()
          (q_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
          (k_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
        )
        (ls1): LayerScale()
        (drop_path1): DropPath(drop_prob=0.051)
        (norm2): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
        (mlp): Mlp(
          (fc1): Linear(in_features=3200, out_features=12800, bias=True)
          (act): GELU(approximate='none')
          (drop1): Dropout(p=0.0, inplace=False)
          (fc2): Linear(in_features=12800, out_features=3200, bias=True)
          (drop2): Dropout(p=0.0, inplace=False)
        )
        (ls2): LayerScale()
        (drop_path2): DropPath(drop_prob=0.051)
      )
      (7): Block(
        (norm1): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
        (attn): Attention(
          (qkv): Linear(in_features=3200, out_features=9600, bias=False)
          (attn_drop): Dropout(p=0.0, inplace=False)
          (proj): Linear(in_features=3200, out_features=3200, bias=True)
          (proj_drop): Dropout(p=0.0, inplace=False)
          (inner_attn): FlashAttention()
          (q_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
          (k_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
        )
        (ls1): LayerScale()
        (drop_path1): DropPath(drop_prob=0.060)
        (norm2): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
        (mlp): Mlp(
          (fc1): Linear(in_features=3200, out_features=12800, bias=True)
          (act): GELU(approximate='none')
          (drop1): Dropout(p=0.0, inplace=False)
          (fc2): Linear(in_features=12800, out_features=3200, bias=True)
          (drop2): Dropout(p=0.0, inplace=False)
        )
        (ls2): LayerScale()
        (drop_path2): DropPath(drop_prob=0.060)
      )
      (8): Block(
        (norm1): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
        (attn): Attention(
          (qkv): Linear(in_features=3200, out_features=9600, bias=False)
          (attn_drop): Dropout(p=0.0, inplace=False)
          (proj): Linear(in_features=3200, out_features=3200, bias=True)
          (proj_drop): Dropout(p=0.0, inplace=False)
          (inner_attn): FlashAttention()
          (q_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
          (k_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
        )
        (ls1): LayerScale()
        (drop_path1): DropPath(drop_prob=0.068)
        (norm2): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
        (mlp): Mlp(
          (fc1): Linear(in_features=3200, out_features=12800, bias=True)
          (act): GELU(approximate='none')
          (drop1): Dropout(p=0.0, inplace=False)
          (fc2): Linear(in_features=12800, out_features=3200, bias=True)
          (drop2): Dropout(p=0.0, inplace=False)
        )
        (ls2): LayerScale()
        (drop_path2): DropPath(drop_prob=0.068)
      )
      (9): Block(
        (norm1): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
        (attn): Attention(
          (qkv): Linear(in_features=3200, out_features=9600, bias=False)
          (attn_drop): Dropout(p=0.0, inplace=False)
          (proj): Linear(in_features=3200, out_features=3200, bias=True)
          (proj_drop): Dropout(p=0.0, inplace=False)
          (inner_attn): FlashAttention()
          (q_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
          (k_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
        )
        (ls1): LayerScale()
        (drop_path1): DropPath(drop_prob=0.077)
        (norm2): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
        (mlp): Mlp(
          (fc1): Linear(in_features=3200, out_features=12800, bias=True)
          (act): GELU(approximate='none')
          (drop1): Dropout(p=0.0, inplace=False)
          (fc2): Linear(in_features=12800, out_features=3200, bias=True)
          (drop2): Dropout(p=0.0, inplace=False)
        )
        (ls2): LayerScale()
        (drop_path2): DropPath(drop_prob=0.077)
      )
      (10): Block(
        (norm1): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
        (attn): Attention(
          (qkv): Linear(in_features=3200, out_features=9600, bias=False)
          (attn_drop): Dropout(p=0.0, inplace=False)
          (proj): Linear(in_features=3200, out_features=3200, bias=True)
          (proj_drop): Dropout(p=0.0, inplace=False)
          (inner_attn): FlashAttention()
          (q_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
          (k_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
        )
        (ls1): LayerScale()
        (drop_path1): DropPath(drop_prob=0.085)
        (norm2): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
        (mlp): Mlp(
          (fc1): Linear(in_features=3200, out_features=12800, bias=True)
          (act): GELU(approximate='none')
          (drop1): Dropout(p=0.0, inplace=False)
          (fc2): Linear(in_features=12800, out_features=3200, bias=True)
          (drop2): Dropout(p=0.0, inplace=False)
        )
        (ls2): LayerScale()
        (drop_path2): DropPath(drop_prob=0.085)
      )
      (11): Block(
        (norm1): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
        (attn): Attention(
          (qkv): Linear(in_features=3200, out_features=9600, bias=False)
          (attn_drop): Dropout(p=0.0, inplace=False)
          (proj): Linear(in_features=3200, out_features=3200, bias=True)
          (proj_drop): Dropout(p=0.0, inplace=False)
          (inner_attn): FlashAttention()
          (q_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
          (k_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
        )
        (ls1): LayerScale()
        (drop_path1): DropPath(drop_prob=0.094)
        (norm2): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
        (mlp): Mlp(
          (fc1): Linear(in_features=3200, out_features=12800, bias=True)
          (act): GELU(approximate='none')
          (drop1): Dropout(p=0.0, inplace=False)
          (fc2): Linear(in_features=12800, out_features=3200, bias=True)
          (drop2): Dropout(p=0.0, inplace=False)
        )
        (ls2): LayerScale()
        (drop_path2): DropPath(drop_prob=0.094)
      )
      (12): Block(
        (norm1): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
        (attn): Attention(
          (qkv): Linear(in_features=3200, out_features=9600, bias=False)
          (attn_drop): Dropout(p=0.0, inplace=False)
          (proj): Linear(in_features=3200, out_features=3200, bias=True)
          (proj_drop): Dropout(p=0.0, inplace=False)
          (inner_attn): FlashAttention()
          (q_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
          (k_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
        )
        (ls1): LayerScale()
        (drop_path1): DropPath(drop_prob=0.102)
        (norm2): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
        (mlp): Mlp(
          (fc1): Linear(in_features=3200, out_features=12800, bias=True)
          (act): GELU(approximate='none')
          (drop1): Dropout(p=0.0, inplace=False)
          (fc2): Linear(in_features=12800, out_features=3200, bias=True)
          (drop2): Dropout(p=0.0, inplace=False)
        )
        (ls2): LayerScale()
        (drop_path2): DropPath(drop_prob=0.102)
      )
      (13): Block(
        (norm1): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
        (attn): Attention(
          (qkv): Linear(in_features=3200, out_features=9600, bias=False)
          (attn_drop): Dropout(p=0.0, inplace=False)
          (proj): Linear(in_features=3200, out_features=3200, bias=True)
          (proj_drop): Dropout(p=0.0, inplace=False)
          (inner_attn): FlashAttention()
          (q_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
          (k_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
        )
        (ls1): LayerScale()
        (drop_path1): DropPath(drop_prob=0.111)
        (norm2): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
        (mlp): Mlp(
          (fc1): Linear(in_features=3200, out_features=12800, bias=True)
          (act): GELU(approximate='none')
          (drop1): Dropout(p=0.0, inplace=False)
          (fc2): Linear(in_features=12800, out_features=3200, bias=True)
          (drop2): Dropout(p=0.0, inplace=False)
        )
        (ls2): LayerScale()
        (drop_path2): DropPath(drop_prob=0.111)
      )
      (14): Block(
        (norm1): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
        (attn): Attention(
          (qkv): Linear(in_features=3200, out_features=9600, bias=False)
          (attn_drop): Dropout(p=0.0, inplace=False)
          (proj): Linear(in_features=3200, out_features=3200, bias=True)
          (proj_drop): Dropout(p=0.0, inplace=False)
          (inner_attn): FlashAttention()
          (q_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
          (k_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
        )
        (ls1): LayerScale()
        (drop_path1): DropPath(drop_prob=0.119)
        (norm2): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
        (mlp): Mlp(
          (fc1): Linear(in_features=3200, out_features=12800, bias=True)
          (act): GELU(approximate='none')
          (drop1): Dropout(p=0.0, inplace=False)
          (fc2): Linear(in_features=12800, out_features=3200, bias=True)
          (drop2): Dropout(p=0.0, inplace=False)
        )
        (ls2): LayerScale()
        (drop_path2): DropPath(drop_prob=0.119)
      )
      (15): Block(
        (norm1): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
        (attn): Attention(
          (qkv): Linear(in_features=3200, out_features=9600, bias=False)
          (attn_drop): Dropout(p=0.0, inplace=False)
          (proj): Linear(in_features=3200, out_features=3200, bias=True)
          (proj_drop): Dropout(p=0.0, inplace=False)
          (inner_attn): FlashAttention()
          (q_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
          (k_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
        )
        (ls1): LayerScale()
        (drop_path1): DropPath(drop_prob=0.128)
        (norm2): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
        (mlp): Mlp(
          (fc1): Linear(in_features=3200, out_features=12800, bias=True)
          (act): GELU(approximate='none')
          (drop1): Dropout(p=0.0, inplace=False)
          (fc2): Linear(in_features=12800, out_features=3200, bias=True)
          (drop2): Dropout(p=0.0, inplace=False)
        )
        (ls2): LayerScale()
        (drop_path2): DropPath(drop_prob=0.128)
      )
      (16): Block(
        (norm1): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
        (attn): Attention(
          (qkv): Linear(in_features=3200, out_features=9600, bias=False)
          (attn_drop): Dropout(p=0.0, inplace=False)
          (proj): Linear(in_features=3200, out_features=3200, bias=True)
          (proj_drop): Dropout(p=0.0, inplace=False)
          (inner_attn): FlashAttention()
          (q_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
          (k_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
        )
        (ls1): LayerScale()
        (drop_path1): DropPath(drop_prob=0.136)
        (norm2): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
        (mlp): Mlp(
          (fc1): Linear(in_features=3200, out_features=12800, bias=True)
          (act): GELU(approximate='none')
          (drop1): Dropout(p=0.0, inplace=False)
          (fc2): Linear(in_features=12800, out_features=3200, bias=True)
          (drop2): Dropout(p=0.0, inplace=False)
        )
        (ls2): LayerScale()
        (drop_path2): DropPath(drop_prob=0.136)
      )
      (17): Block(
        (norm1): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
        (attn): Attention(
          (qkv): Linear(in_features=3200, out_features=9600, bias=False)
          (attn_drop): Dropout(p=0.0, inplace=False)
          (proj): Linear(in_features=3200, out_features=3200, bias=True)
          (proj_drop): Dropout(p=0.0, inplace=False)
          (inner_attn): FlashAttention()
          (q_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
          (k_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
        )
        (ls1): LayerScale()
        (drop_path1): DropPath(drop_prob=0.145)
        (norm2): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
        (mlp): Mlp(
          (fc1): Linear(in_features=3200, out_features=12800, bias=True)
          (act): GELU(approximate='none')
          (drop1): Dropout(p=0.0, inplace=False)
          (fc2): Linear(in_features=12800, out_features=3200, bias=True)
          (drop2): Dropout(p=0.0, inplace=False)
        )
        (ls2): LayerScale()
        (drop_path2): DropPath(drop_prob=0.145)
      )
      (18): Block(
        (norm1): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
        (attn): Attention(
          (qkv): Linear(in_features=3200, out_features=9600, bias=False)
          (attn_drop): Dropout(p=0.0, inplace=False)
          (proj): Linear(in_features=3200, out_features=3200, bias=True)
          (proj_drop): Dropout(p=0.0, inplace=False)
          (inner_attn): FlashAttention()
          (q_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
          (k_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
        )
        (ls1): LayerScale()
        (drop_path1): DropPath(drop_prob=0.153)
        (norm2): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
        (mlp): Mlp(
          (fc1): Linear(in_features=3200, out_features=12800, bias=True)
          (act): GELU(approximate='none')
          (drop1): Dropout(p=0.0, inplace=False)
          (fc2): Linear(in_features=12800, out_features=3200, bias=True)
          (drop2): Dropout(p=0.0, inplace=False)
        )
        (ls2): LayerScale()
        (drop_path2): DropPath(drop_prob=0.153)
      )
      (19): Block(
        (norm1): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
        (attn): Attention(
          (qkv): Linear(in_features=3200, out_features=9600, bias=False)
          (attn_drop): Dropout(p=0.0, inplace=False)
          (proj): Linear(in_features=3200, out_features=3200, bias=True)
          (proj_drop): Dropout(p=0.0, inplace=False)
          (inner_attn): FlashAttention()
          (q_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
          (k_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
        )
        (ls1): LayerScale()
        (drop_path1): DropPath(drop_prob=0.162)
        (norm2): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
        (mlp): Mlp(
          (fc1): Linear(in_features=3200, out_features=12800, bias=True)
          (act): GELU(approximate='none')
          (drop1): Dropout(p=0.0, inplace=False)
          (fc2): Linear(in_features=12800, out_features=3200, bias=True)
          (drop2): Dropout(p=0.0, inplace=False)
        )
        (ls2): LayerScale()
        (drop_path2): DropPath(drop_prob=0.162)
      )
      (20): Block(
        (norm1): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
        (attn): Attention(
          (qkv): Linear(in_features=3200, out_features=9600, bias=False)
          (attn_drop): Dropout(p=0.0, inplace=False)
          (proj): Linear(in_features=3200, out_features=3200, bias=True)
          (proj_drop): Dropout(p=0.0, inplace=False)
          (inner_attn): FlashAttention()
          (q_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
          (k_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
        )
        (ls1): LayerScale()
        (drop_path1): DropPath(drop_prob=0.170)
        (norm2): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
        (mlp): Mlp(
          (fc1): Linear(in_features=3200, out_features=12800, bias=True)
          (act): GELU(approximate='none')
          (drop1): Dropout(p=0.0, inplace=False)
          (fc2): Linear(in_features=12800, out_features=3200, bias=True)
          (drop2): Dropout(p=0.0, inplace=False)
        )
        (ls2): LayerScale()
        (drop_path2): DropPath(drop_prob=0.170)
      )
      (21): Block(
        (norm1): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
        (attn): Attention(
          (qkv): Linear(in_features=3200, out_features=9600, bias=False)
          (attn_drop): Dropout(p=0.0, inplace=False)
          (proj): Linear(in_features=3200, out_features=3200, bias=True)
          (proj_drop): Dropout(p=0.0, inplace=False)
          (inner_attn): FlashAttention()
          (q_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
          (k_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
        )
        (ls1): LayerScale()
        (drop_path1): DropPath(drop_prob=0.179)
        (norm2): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
        (mlp): Mlp(
          (fc1): Linear(in_features=3200, out_features=12800, bias=True)
          (act): GELU(approximate='none')
          (drop1): Dropout(p=0.0, inplace=False)
          (fc2): Linear(in_features=12800, out_features=3200, bias=True)
          (drop2): Dropout(p=0.0, inplace=False)
        )
        (ls2): LayerScale()
        (drop_path2): DropPath(drop_prob=0.179)
      )
      (22): Block(
        (norm1): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
        (attn): Attention(
          (qkv): Linear(in_features=3200, out_features=9600, bias=False)
          (attn_drop): Dropout(p=0.0, inplace=False)
          (proj): Linear(in_features=3200, out_features=3200, bias=True)
          (proj_drop): Dropout(p=0.0, inplace=False)
          (inner_attn): FlashAttention()
          (q_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
          (k_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
        )
        (ls1): LayerScale()
        (drop_path1): DropPath(drop_prob=0.187)
        (norm2): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
        (mlp): Mlp(
          (fc1): Linear(in_features=3200, out_features=12800, bias=True)
          (act): GELU(approximate='none')
          (drop1): Dropout(p=0.0, inplace=False)
          (fc2): Linear(in_features=12800, out_features=3200, bias=True)
          (drop2): Dropout(p=0.0, inplace=False)
        )
        (ls2): LayerScale()
        (drop_path2): DropPath(drop_prob=0.187)
      )
      (23): Block(
        (norm1): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
        (attn): Attention(
          (qkv): Linear(in_features=3200, out_features=9600, bias=False)
          (attn_drop): Dropout(p=0.0, inplace=False)
          (proj): Linear(in_features=3200, out_features=3200, bias=True)
          (proj_drop): Dropout(p=0.0, inplace=False)
          (inner_attn): FlashAttention()
          (q_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
          (k_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
        )
        (ls1): LayerScale()
        (drop_path1): DropPath(drop_prob=0.196)
        (norm2): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
        (mlp): Mlp(
          (fc1): Linear(in_features=3200, out_features=12800, bias=True)
          (act): GELU(approximate='none')
          (drop1): Dropout(p=0.0, inplace=False)
          (fc2): Linear(in_features=12800, out_features=3200, bias=True)
          (drop2): Dropout(p=0.0, inplace=False)
        )
        (ls2): LayerScale()
        (drop_path2): DropPath(drop_prob=0.196)
      )
      (24): Block(
        (norm1): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
        (attn): Attention(
          (qkv): Linear(in_features=3200, out_features=9600, bias=False)
          (attn_drop): Dropout(p=0.0, inplace=False)
          (proj): Linear(in_features=3200, out_features=3200, bias=True)
          (proj_drop): Dropout(p=0.0, inplace=False)
          (inner_attn): FlashAttention()
          (q_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
          (k_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
        )
        (ls1): LayerScale()
        (drop_path1): DropPath(drop_prob=0.204)
        (norm2): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
        (mlp): Mlp(
          (fc1): Linear(in_features=3200, out_features=12800, bias=True)
          (act): GELU(approximate='none')
          (drop1): Dropout(p=0.0, inplace=False)
          (fc2): Linear(in_features=12800, out_features=3200, bias=True)
          (drop2): Dropout(p=0.0, inplace=False)
        )
        (ls2): LayerScale()
        (drop_path2): DropPath(drop_prob=0.204)
      )
      (25): Block(
        (norm1): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
        (attn): Attention(
          (qkv): Linear(in_features=3200, out_features=9600, bias=False)
          (attn_drop): Dropout(p=0.0, inplace=False)
          (proj): Linear(in_features=3200, out_features=3200, bias=True)
          (proj_drop): Dropout(p=0.0, inplace=False)
          (inner_attn): FlashAttention()
          (q_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
          (k_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
        )
        (ls1): LayerScale()
        (drop_path1): DropPath(drop_prob=0.213)
        (norm2): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
        (mlp): Mlp(
          (fc1): Linear(in_features=3200, out_features=12800, bias=True)
          (act): GELU(approximate='none')
          (drop1): Dropout(p=0.0, inplace=False)
          (fc2): Linear(in_features=12800, out_features=3200, bias=True)
          (drop2): Dropout(p=0.0, inplace=False)
        )
        (ls2): LayerScale()
        (drop_path2): DropPath(drop_prob=0.213)
      )
      (26): Block(
        (norm1): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
        (attn): Attention(
          (qkv): Linear(in_features=3200, out_features=9600, bias=False)
          (attn_drop): Dropout(p=0.0, inplace=False)
          (proj): Linear(in_features=3200, out_features=3200, bias=True)
          (proj_drop): Dropout(p=0.0, inplace=False)
          (inner_attn): FlashAttention()
          (q_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
          (k_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
        )
        (ls1): LayerScale()
        (drop_path1): DropPath(drop_prob=0.221)
        (norm2): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
        (mlp): Mlp(
          (fc1): Linear(in_features=3200, out_features=12800, bias=True)
          (act): GELU(approximate='none')
          (drop1): Dropout(p=0.0, inplace=False)
          (fc2): Linear(in_features=12800, out_features=3200, bias=True)
          (drop2): Dropout(p=0.0, inplace=False)
        )
        (ls2): LayerScale()
        (drop_path2): DropPath(drop_prob=0.221)
      )
      (27): Block(
        (norm1): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
        (attn): Attention(
          (qkv): Linear(in_features=3200, out_features=9600, bias=False)
          (attn_drop): Dropout(p=0.0, inplace=False)
          (proj): Linear(in_features=3200, out_features=3200, bias=True)
          (proj_drop): Dropout(p=0.0, inplace=False)
          (inner_attn): FlashAttention()
          (q_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
          (k_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
        )
        (ls1): LayerScale()
        (drop_path1): DropPath(drop_prob=0.230)
        (norm2): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
        (mlp): Mlp(
          (fc1): Linear(in_features=3200, out_features=12800, bias=True)
          (act): GELU(approximate='none')
          (drop1): Dropout(p=0.0, inplace=False)
          (fc2): Linear(in_features=12800, out_features=3200, bias=True)
          (drop2): Dropout(p=0.0, inplace=False)
        )
        (ls2): LayerScale()
        (drop_path2): DropPath(drop_prob=0.230)
      )
      (28): Block(
        (norm1): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
        (attn): Attention(
          (qkv): Linear(in_features=3200, out_features=9600, bias=False)
          (attn_drop): Dropout(p=0.0, inplace=False)
          (proj): Linear(in_features=3200, out_features=3200, bias=True)
          (proj_drop): Dropout(p=0.0, inplace=False)
          (inner_attn): FlashAttention()
          (q_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
          (k_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
        )
        (ls1): LayerScale()
        (drop_path1): DropPath(drop_prob=0.238)
        (norm2): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
        (mlp): Mlp(
          (fc1): Linear(in_features=3200, out_features=12800, bias=True)
          (act): GELU(approximate='none')
          (drop1): Dropout(p=0.0, inplace=False)
          (fc2): Linear(in_features=12800, out_features=3200, bias=True)
          (drop2): Dropout(p=0.0, inplace=False)
        )
        (ls2): LayerScale()
        (drop_path2): DropPath(drop_prob=0.238)
      )
      (29): Block(
        (norm1): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
        (attn): Attention(
          (qkv): Linear(in_features=3200, out_features=9600, bias=False)
          (attn_drop): Dropout(p=0.0, inplace=False)
          (proj): Linear(in_features=3200, out_features=3200, bias=True)
          (proj_drop): Dropout(p=0.0, inplace=False)
          (inner_attn): FlashAttention()
          (q_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
          (k_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
        )
        (ls1): LayerScale()
        (drop_path1): DropPath(drop_prob=0.247)
        (norm2): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
        (mlp): Mlp(
          (fc1): Linear(in_features=3200, out_features=12800, bias=True)
          (act): GELU(approximate='none')
          (drop1): Dropout(p=0.0, inplace=False)
          (fc2): Linear(in_features=12800, out_features=3200, bias=True)
          (drop2): Dropout(p=0.0, inplace=False)
        )
        (ls2): LayerScale()
        (drop_path2): DropPath(drop_prob=0.247)
      )
      (30): Block(
        (norm1): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
        (attn): Attention(
          (qkv): Linear(in_features=3200, out_features=9600, bias=False)
          (attn_drop): Dropout(p=0.0, inplace=False)
          (proj): Linear(in_features=3200, out_features=3200, bias=True)
          (proj_drop): Dropout(p=0.0, inplace=False)
          (inner_attn): FlashAttention()
          (q_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
          (k_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
        )
        (ls1): LayerScale()
        (drop_path1): DropPath(drop_prob=0.255)
        (norm2): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
        (mlp): Mlp(
          (fc1): Linear(in_features=3200, out_features=12800, bias=True)
          (act): GELU(approximate='none')
          (drop1): Dropout(p=0.0, inplace=False)
          (fc2): Linear(in_features=12800, out_features=3200, bias=True)
          (drop2): Dropout(p=0.0, inplace=False)
        )
        (ls2): LayerScale()
        (drop_path2): DropPath(drop_prob=0.255)
      )
      (31): Block(
        (norm1): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
        (attn): Attention(
          (qkv): Linear(in_features=3200, out_features=9600, bias=False)
          (attn_drop): Dropout(p=0.0, inplace=False)
          (proj): Linear(in_features=3200, out_features=3200, bias=True)
          (proj_drop): Dropout(p=0.0, inplace=False)
          (inner_attn): FlashAttention()
          (q_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
          (k_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
        )
        (ls1): LayerScale()
        (drop_path1): DropPath(drop_prob=0.264)
        (norm2): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
        (mlp): Mlp(
          (fc1): Linear(in_features=3200, out_features=12800, bias=True)
          (act): GELU(approximate='none')
          (drop1): Dropout(p=0.0, inplace=False)
          (fc2): Linear(in_features=12800, out_features=3200, bias=True)
          (drop2): Dropout(p=0.0, inplace=False)
        )
        (ls2): LayerScale()
        (drop_path2): DropPath(drop_prob=0.264)
      )
      (32): Block(
        (norm1): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
        (attn): Attention(
          (qkv): Linear(in_features=3200, out_features=9600, bias=False)
          (attn_drop): Dropout(p=0.0, inplace=False)
          (proj): Linear(in_features=3200, out_features=3200, bias=True)
          (proj_drop): Dropout(p=0.0, inplace=False)
          (inner_attn): FlashAttention()
          (q_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
          (k_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
        )
        (ls1): LayerScale()
        (drop_path1): DropPath(drop_prob=0.272)
        (norm2): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
        (mlp): Mlp(
          (fc1): Linear(in_features=3200, out_features=12800, bias=True)
          (act): GELU(approximate='none')
          (drop1): Dropout(p=0.0, inplace=False)
          (fc2): Linear(in_features=12800, out_features=3200, bias=True)
          (drop2): Dropout(p=0.0, inplace=False)
        )
        (ls2): LayerScale()
        (drop_path2): DropPath(drop_prob=0.272)
      )
      (33): Block(
        (norm1): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
        (attn): Attention(
          (qkv): Linear(in_features=3200, out_features=9600, bias=False)
          (attn_drop): Dropout(p=0.0, inplace=False)
          (proj): Linear(in_features=3200, out_features=3200, bias=True)
          (proj_drop): Dropout(p=0.0, inplace=False)
          (inner_attn): FlashAttention()
          (q_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
          (k_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
        )
        (ls1): LayerScale()
        (drop_path1): DropPath(drop_prob=0.281)
        (norm2): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
        (mlp): Mlp(
          (fc1): Linear(in_features=3200, out_features=12800, bias=True)
          (act): GELU(approximate='none')
          (drop1): Dropout(p=0.0, inplace=False)
          (fc2): Linear(in_features=12800, out_features=3200, bias=True)
          (drop2): Dropout(p=0.0, inplace=False)
        )
        (ls2): LayerScale()
        (drop_path2): DropPath(drop_prob=0.281)
      )
      (34): Block(
        (norm1): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
        (attn): Attention(
          (qkv): Linear(in_features=3200, out_features=9600, bias=False)
          (attn_drop): Dropout(p=0.0, inplace=False)
          (proj): Linear(in_features=3200, out_features=3200, bias=True)
          (proj_drop): Dropout(p=0.0, inplace=False)
          (inner_attn): FlashAttention()
          (q_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
          (k_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
        )
        (ls1): LayerScale()
        (drop_path1): DropPath(drop_prob=0.289)
        (norm2): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
        (mlp): Mlp(
          (fc1): Linear(in_features=3200, out_features=12800, bias=True)
          (act): GELU(approximate='none')
          (drop1): Dropout(p=0.0, inplace=False)
          (fc2): Linear(in_features=12800, out_features=3200, bias=True)
          (drop2): Dropout(p=0.0, inplace=False)
        )
        (ls2): LayerScale()
        (drop_path2): DropPath(drop_prob=0.289)
      )
      (35): Block(
        (norm1): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
        (attn): Attention(
          (qkv): Linear(in_features=3200, out_features=9600, bias=False)
          (attn_drop): Dropout(p=0.0, inplace=False)
          (proj): Linear(in_features=3200, out_features=3200, bias=True)
          (proj_drop): Dropout(p=0.0, inplace=False)
          (inner_attn): FlashAttention()
          (q_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
          (k_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
        )
        (ls1): LayerScale()
        (drop_path1): DropPath(drop_prob=0.298)
        (norm2): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
        (mlp): Mlp(
          (fc1): Linear(in_features=3200, out_features=12800, bias=True)
          (act): GELU(approximate='none')
          (drop1): Dropout(p=0.0, inplace=False)
          (fc2): Linear(in_features=12800, out_features=3200, bias=True)
          (drop2): Dropout(p=0.0, inplace=False)
        )
        (ls2): LayerScale()
        (drop_path2): DropPath(drop_prob=0.298)
      )
      (36): Block(
        (norm1): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
        (attn): Attention(
          (qkv): Linear(in_features=3200, out_features=9600, bias=False)
          (attn_drop): Dropout(p=0.0, inplace=False)
          (proj): Linear(in_features=3200, out_features=3200, bias=True)
          (proj_drop): Dropout(p=0.0, inplace=False)
          (inner_attn): FlashAttention()
          (q_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
          (k_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
        )
        (ls1): LayerScale()
        (drop_path1): DropPath(drop_prob=0.306)
        (norm2): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
        (mlp): Mlp(
          (fc1): Linear(in_features=3200, out_features=12800, bias=True)
          (act): GELU(approximate='none')
          (drop1): Dropout(p=0.0, inplace=False)
          (fc2): Linear(in_features=12800, out_features=3200, bias=True)
          (drop2): Dropout(p=0.0, inplace=False)
        )
        (ls2): LayerScale()
        (drop_path2): DropPath(drop_prob=0.306)
      )
      (37): Block(
        (norm1): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
        (attn): Attention(
          (qkv): Linear(in_features=3200, out_features=9600, bias=False)
          (attn_drop): Dropout(p=0.0, inplace=False)
          (proj): Linear(in_features=3200, out_features=3200, bias=True)
          (proj_drop): Dropout(p=0.0, inplace=False)
          (inner_attn): FlashAttention()
          (q_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
          (k_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
        )
        (ls1): LayerScale()
        (drop_path1): DropPath(drop_prob=0.315)
        (norm2): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
        (mlp): Mlp(
          (fc1): Linear(in_features=3200, out_features=12800, bias=True)
          (act): GELU(approximate='none')
          (drop1): Dropout(p=0.0, inplace=False)
          (fc2): Linear(in_features=12800, out_features=3200, bias=True)
          (drop2): Dropout(p=0.0, inplace=False)
        )
        (ls2): LayerScale()
        (drop_path2): DropPath(drop_prob=0.315)
      )
      (38): Block(
        (norm1): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
        (attn): Attention(
          (qkv): Linear(in_features=3200, out_features=9600, bias=False)
          (attn_drop): Dropout(p=0.0, inplace=False)
          (proj): Linear(in_features=3200, out_features=3200, bias=True)
          (proj_drop): Dropout(p=0.0, inplace=False)
          (inner_attn): FlashAttention()
          (q_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
          (k_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
        )
        (ls1): LayerScale()
        (drop_path1): DropPath(drop_prob=0.323)
        (norm2): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
        (mlp): Mlp(
          (fc1): Linear(in_features=3200, out_features=12800, bias=True)
          (act): GELU(approximate='none')
          (drop1): Dropout(p=0.0, inplace=False)
          (fc2): Linear(in_features=12800, out_features=3200, bias=True)
          (drop2): Dropout(p=0.0, inplace=False)
        )
        (ls2): LayerScale()
        (drop_path2): DropPath(drop_prob=0.323)
      )
      (39): Block(
        (norm1): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
        (attn): Attention(
          (qkv): Linear(in_features=3200, out_features=9600, bias=False)
          (attn_drop): Dropout(p=0.0, inplace=False)
          (proj): Linear(in_features=3200, out_features=3200, bias=True)
          (proj_drop): Dropout(p=0.0, inplace=False)
          (inner_attn): FlashAttention()
          (q_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
          (k_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
        )
        (ls1): LayerScale()
        (drop_path1): DropPath(drop_prob=0.332)
        (norm2): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
        (mlp): Mlp(
          (fc1): Linear(in_features=3200, out_features=12800, bias=True)
          (act): GELU(approximate='none')
          (drop1): Dropout(p=0.0, inplace=False)
          (fc2): Linear(in_features=12800, out_features=3200, bias=True)
          (drop2): Dropout(p=0.0, inplace=False)
        )
        (ls2): LayerScale()
        (drop_path2): DropPath(drop_prob=0.332)
      )
      (40): Block(
        (norm1): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
        (attn): Attention(
          (qkv): Linear(in_features=3200, out_features=9600, bias=False)
          (attn_drop): Dropout(p=0.0, inplace=False)
          (proj): Linear(in_features=3200, out_features=3200, bias=True)
          (proj_drop): Dropout(p=0.0, inplace=False)
          (inner_attn): FlashAttention()
          (q_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
          (k_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
        )
        (ls1): LayerScale()
        (drop_path1): DropPath(drop_prob=0.340)
        (norm2): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
        (mlp): Mlp(
          (fc1): Linear(in_features=3200, out_features=12800, bias=True)
          (act): GELU(approximate='none')
          (drop1): Dropout(p=0.0, inplace=False)
          (fc2): Linear(in_features=12800, out_features=3200, bias=True)
          (drop2): Dropout(p=0.0, inplace=False)
        )
        (ls2): LayerScale()
        (drop_path2): DropPath(drop_prob=0.340)
      )
      (41): Block(
        (norm1): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
        (attn): Attention(
          (qkv): Linear(in_features=3200, out_features=9600, bias=False)
          (attn_drop): Dropout(p=0.0, inplace=False)
          (proj): Linear(in_features=3200, out_features=3200, bias=True)
          (proj_drop): Dropout(p=0.0, inplace=False)
          (inner_attn): FlashAttention()
          (q_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
          (k_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
        )
        (ls1): LayerScale()
        (drop_path1): DropPath(drop_prob=0.349)
        (norm2): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
        (mlp): Mlp(
          (fc1): Linear(in_features=3200, out_features=12800, bias=True)
          (act): GELU(approximate='none')
          (drop1): Dropout(p=0.0, inplace=False)
          (fc2): Linear(in_features=12800, out_features=3200, bias=True)
          (drop2): Dropout(p=0.0, inplace=False)
        )
        (ls2): LayerScale()
        (drop_path2): DropPath(drop_prob=0.349)
      )
      (42): Block(
        (norm1): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
        (attn): Attention(
          (qkv): Linear(in_features=3200, out_features=9600, bias=False)
          (attn_drop): Dropout(p=0.0, inplace=False)
          (proj): Linear(in_features=3200, out_features=3200, bias=True)
          (proj_drop): Dropout(p=0.0, inplace=False)
          (inner_attn): FlashAttention()
          (q_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
          (k_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
        )
        (ls1): LayerScale()
        (drop_path1): DropPath(drop_prob=0.357)
        (norm2): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
        (mlp): Mlp(
          (fc1): Linear(in_features=3200, out_features=12800, bias=True)
          (act): GELU(approximate='none')
          (drop1): Dropout(p=0.0, inplace=False)
          (fc2): Linear(in_features=12800, out_features=3200, bias=True)
          (drop2): Dropout(p=0.0, inplace=False)
        )
        (ls2): LayerScale()
        (drop_path2): DropPath(drop_prob=0.357)
      )
      (43): Block(
        (norm1): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
        (attn): Attention(
          (qkv): Linear(in_features=3200, out_features=9600, bias=False)
          (attn_drop): Dropout(p=0.0, inplace=False)
          (proj): Linear(in_features=3200, out_features=3200, bias=True)
          (proj_drop): Dropout(p=0.0, inplace=False)
          (inner_attn): FlashAttention()
          (q_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
          (k_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
        )
        (ls1): LayerScale()
        (drop_path1): DropPath(drop_prob=0.366)
        (norm2): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
        (mlp): Mlp(
          (fc1): Linear(in_features=3200, out_features=12800, bias=True)
          (act): GELU(approximate='none')
          (drop1): Dropout(p=0.0, inplace=False)
          (fc2): Linear(in_features=12800, out_features=3200, bias=True)
          (drop2): Dropout(p=0.0, inplace=False)
        )
        (ls2): LayerScale()
        (drop_path2): DropPath(drop_prob=0.366)
      )
      (44): Block(
        (norm1): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
        (attn): Attention(
          (qkv): Linear(in_features=3200, out_features=9600, bias=False)
          (attn_drop): Dropout(p=0.0, inplace=False)
          (proj): Linear(in_features=3200, out_features=3200, bias=True)
          (proj_drop): Dropout(p=0.0, inplace=False)
          (inner_attn): FlashAttention()
          (q_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
          (k_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
        )
        (ls1): LayerScale()
        (drop_path1): DropPath(drop_prob=0.374)
        (norm2): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
        (mlp): Mlp(
          (fc1): Linear(in_features=3200, out_features=12800, bias=True)
          (act): GELU(approximate='none')
          (drop1): Dropout(p=0.0, inplace=False)
          (fc2): Linear(in_features=12800, out_features=3200, bias=True)
          (drop2): Dropout(p=0.0, inplace=False)
        )
        (ls2): LayerScale()
        (drop_path2): DropPath(drop_prob=0.374)
      )
      (45): Block(
        (norm1): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
        (attn): Attention(
          (qkv): Linear(in_features=3200, out_features=9600, bias=False)
          (attn_drop): Dropout(p=0.0, inplace=False)
          (proj): Linear(in_features=3200, out_features=3200, bias=True)
          (proj_drop): Dropout(p=0.0, inplace=False)
          (inner_attn): FlashAttention()
          (q_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
          (k_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
        )
        (ls1): LayerScale()
        (drop_path1): DropPath(drop_prob=0.383)
        (norm2): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
        (mlp): Mlp(
          (fc1): Linear(in_features=3200, out_features=12800, bias=True)
          (act): GELU(approximate='none')
          (drop1): Dropout(p=0.0, inplace=False)
          (fc2): Linear(in_features=12800, out_features=3200, bias=True)
          (drop2): Dropout(p=0.0, inplace=False)
        )
        (ls2): LayerScale()
        (drop_path2): DropPath(drop_prob=0.383)
      )
      (46): Block(
        (norm1): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
        (attn): Attention(
          (qkv): Linear(in_features=3200, out_features=9600, bias=False)
          (attn_drop): Dropout(p=0.0, inplace=False)
          (proj): Linear(in_features=3200, out_features=3200, bias=True)
          (proj_drop): Dropout(p=0.0, inplace=False)
          (inner_attn): FlashAttention()
          (q_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
          (k_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
        )
        (ls1): LayerScale()
        (drop_path1): DropPath(drop_prob=0.391)
        (norm2): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
        (mlp): Mlp(
          (fc1): Linear(in_features=3200, out_features=12800, bias=True)
          (act): GELU(approximate='none')
          (drop1): Dropout(p=0.0, inplace=False)
          (fc2): Linear(in_features=12800, out_features=3200, bias=True)
          (drop2): Dropout(p=0.0, inplace=False)
        )
        (ls2): LayerScale()
        (drop_path2): DropPath(drop_prob=0.391)
      )
      (47): Block(
        (norm1): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
        (attn): Attention(
          (qkv): Linear(in_features=3200, out_features=9600, bias=False)
          (attn_drop): Dropout(p=0.0, inplace=False)
          (proj): Linear(in_features=3200, out_features=3200, bias=True)
          (proj_drop): Dropout(p=0.0, inplace=False)
          (inner_attn): FlashAttention()
          (q_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
          (k_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
        )
        (ls1): LayerScale()
        (drop_path1): DropPath(drop_prob=0.400)
        (norm2): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True)
        (mlp): Mlp(
          (fc1): Linear(in_features=3200, out_features=12800, bias=True)
          (act): GELU(approximate='none')
          (drop1): Dropout(p=0.0, inplace=False)
          (fc2): Linear(in_features=12800, out_features=3200, bias=True)
          (drop2): Dropout(p=0.0, inplace=False)
        )
        (ls2): LayerScale()
        (drop_path2): DropPath(drop_prob=0.400)
      )
    )
  )
  (decode_head): FCNHead(
    input_transform=None, ignore_index=255, align_corners=False
    (loss_decode): CrossEntropyLoss(avg_non_ignore=False)
    (conv_seg): Conv2d(3200, 150, kernel_size=(1, 1), stride=(1, 1))
    (convs): Identity()
    (norm): SyncBatchNorm(3200, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  )
  init_cfg={'type': 'Normal', 'std': 0.01, 'override': {'name': 'conv_seg'}}
)
2023-11-09 22:37:01,153 - mmseg - INFO - Loaded 20210 images
2023-11-09 22:37:01,162 - mmseg - INFO - Randomly select 1263 images
2023-11-09 22:37:02,444 - mmseg - INFO - {'num_layers': 48, 'layer_decay_rate': 0.95}
2023-11-09 22:37:02,445 - mmseg - INFO - Build LayerDecayOptimizerConstructor 0.950000 - 50
2023-11-09 22:37:02,449 - mmseg - INFO - Param groups = {
  "layer_0_decay": {
    "param_names": [
      "backbone.pos_embed",
      "backbone.cls_token",
      "backbone.patch_embed.proj.weight"
    ],
    "lr_scale": 0.0809947108175928,
    "lr": 3.2397884327037123e-06,
    "weight_decay": 0.05
  },
  "layer_0_no_decay": {
    "param_names": [
      "backbone.patch_embed.proj.bias"
    ],
    "lr_scale": 0.0809947108175928,
    "lr": 3.2397884327037123e-06,
    "weight_decay": 0.0
  },
  "layer_1_no_decay": {
    "param_names": [
      "backbone.blocks.0.norm1.weight",
      "backbone.blocks.0.attn.proj.bias",
      "backbone.blocks.0.attn.q_norm.weight",
      "backbone.blocks.0.attn.k_norm.weight",
      "backbone.blocks.0.ls1.gamma",
      "backbone.blocks.0.norm2.weight",
      "backbone.blocks.0.mlp.fc1.bias",
      "backbone.blocks.0.mlp.fc2.bias",
      "backbone.blocks.0.ls2.gamma"
    ],
    "lr_scale": 0.0852575903343082,
    "lr": 3.4103036133723282e-06,
    "weight_decay": 0.0
  },
  "layer_1_decay": {
    "param_names": [
      "backbone.blocks.0.attn.qkv.weight",
      "backbone.blocks.0.attn.proj.weight",
      "backbone.blocks.0.mlp.fc1.weight",
      "backbone.blocks.0.mlp.fc2.weight"
    ],
    "lr_scale": 0.0852575903343082,
    "lr": 3.4103036133723282e-06,
    "weight_decay": 0.05
  },
  "layer_2_no_decay": {
    "param_names": [
      "backbone.blocks.1.norm1.weight",
      "backbone.blocks.1.attn.proj.bias",
      "backbone.blocks.1.attn.q_norm.weight",
      "backbone.blocks.1.attn.k_norm.weight",
      "backbone.blocks.1.ls1.gamma",
      "backbone.blocks.1.norm2.weight",
      "backbone.blocks.1.mlp.fc1.bias",
      "backbone.blocks.1.mlp.fc2.bias",
      "backbone.blocks.1.ls2.gamma"
    ],
    "lr_scale": 0.08974483193085075,
    "lr": 3.5897932772340305e-06,
    "weight_decay": 0.0
  },
  "layer_2_decay": {
    "param_names": [
      "backbone.blocks.1.attn.qkv.weight",
      "backbone.blocks.1.attn.proj.weight",
      "backbone.blocks.1.mlp.fc1.weight",
      "backbone.blocks.1.mlp.fc2.weight"
    ],
    "lr_scale": 0.08974483193085075,
    "lr": 3.5897932772340305e-06,
    "weight_decay": 0.05
  },
  "layer_3_no_decay": {
    "param_names": [
      "backbone.blocks.2.norm1.weight",
      "backbone.blocks.2.attn.proj.bias",
      "backbone.blocks.2.attn.q_norm.weight",
      "backbone.blocks.2.attn.k_norm.weight",
      "backbone.blocks.2.ls1.gamma",
      "backbone.blocks.2.norm2.weight",
      "backbone.blocks.2.mlp.fc1.bias",
      "backbone.blocks.2.mlp.fc2.bias",
      "backbone.blocks.2.ls2.gamma"
    ],
    "lr_scale": 0.09446824413773763,
    "lr": 3.7787297655095058e-06,
    "weight_decay": 0.0
  },
  "layer_3_decay": {
    "param_names": [
      "backbone.blocks.2.attn.qkv.weight",
      "backbone.blocks.2.attn.proj.weight",
      "backbone.blocks.2.mlp.fc1.weight",
      "backbone.blocks.2.mlp.fc2.weight"
    ],
    "lr_scale": 0.09446824413773763,
    "lr": 3.7787297655095058e-06,
    "weight_decay": 0.05
  },
  "layer_4_no_decay": {
    "param_names": [
      "backbone.blocks.3.norm1.weight",
      "backbone.blocks.3.attn.proj.bias",
      "backbone.blocks.3.attn.q_norm.weight",
      "backbone.blocks.3.attn.k_norm.weight",
      "backbone.blocks.3.ls1.gamma",
      "backbone.blocks.3.norm2.weight",
      "backbone.blocks.3.mlp.fc1.bias",
      "backbone.blocks.3.mlp.fc2.bias",
      "backbone.blocks.3.ls2.gamma"
    ],
    "lr_scale": 0.09944025698709225,
    "lr": 3.97761027948369e-06,
    "weight_decay": 0.0
  },
  "layer_4_decay": {
    "param_names": [
      "backbone.blocks.3.attn.qkv.weight",
      "backbone.blocks.3.attn.proj.weight",
      "backbone.blocks.3.mlp.fc1.weight",
      "backbone.blocks.3.mlp.fc2.weight"
    ],
    "lr_scale": 0.09944025698709225,
    "lr": 3.97761027948369e-06,
    "weight_decay": 0.05
  },
  "layer_5_no_decay": {
    "param_names": [
      "backbone.blocks.4.norm1.weight",
      "backbone.blocks.4.attn.proj.bias",
      "backbone.blocks.4.attn.q_norm.weight",
      "backbone.blocks.4.attn.k_norm.weight",
      "backbone.blocks.4.ls1.gamma",
      "backbone.blocks.4.norm2.weight",
      "backbone.blocks.4.mlp.fc1.bias",
      "backbone.blocks.4.mlp.fc2.bias",
      "backbone.blocks.4.ls2.gamma"
    ],
    "lr_scale": 0.10467395472325501,
    "lr": 4.186958188930201e-06,
    "weight_decay": 0.0
  },
  "layer_5_decay": {
    "param_names": [
      "backbone.blocks.4.attn.qkv.weight",
      "backbone.blocks.4.attn.proj.weight",
      "backbone.blocks.4.mlp.fc1.weight",
      "backbone.blocks.4.mlp.fc2.weight"
    ],
    "lr_scale": 0.10467395472325501,
    "lr": 4.186958188930201e-06,
    "weight_decay": 0.05
  },
  "layer_6_no_decay": {
    "param_names": [
      "backbone.blocks.5.norm1.weight",
      "backbone.blocks.5.attn.proj.bias",
      "backbone.blocks.5.attn.q_norm.weight",
      "backbone.blocks.5.attn.k_norm.weight",
      "backbone.blocks.5.ls1.gamma",
      "backbone.blocks.5.norm2.weight",
      "backbone.blocks.5.mlp.fc1.bias",
      "backbone.blocks.5.mlp.fc2.bias",
      "backbone.blocks.5.ls2.gamma"
    ],
    "lr_scale": 0.11018311023500528,
    "lr": 4.407324409400211e-06,
    "weight_decay": 0.0
  },
  "layer_6_decay": {
    "param_names": [
      "backbone.blocks.5.attn.qkv.weight",
      "backbone.blocks.5.attn.proj.weight",
      "backbone.blocks.5.mlp.fc1.weight",
      "backbone.blocks.5.mlp.fc2.weight"
    ],
    "lr_scale": 0.11018311023500528,
    "lr": 4.407324409400211e-06,
    "weight_decay": 0.05
  },
  "layer_7_no_decay": {
    "param_names": [
      "backbone.blocks.6.norm1.weight",
      "backbone.blocks.6.attn.proj.bias",
      "backbone.blocks.6.attn.q_norm.weight",
      "backbone.blocks.6.attn.k_norm.weight",
      "backbone.blocks.6.ls1.gamma",
      "backbone.blocks.6.norm2.weight",
      "backbone.blocks.6.mlp.fc1.bias",
      "backbone.blocks.6.mlp.fc2.bias",
      "backbone.blocks.6.ls2.gamma"
    ],
    "lr_scale": 0.11598222130000556,
    "lr": 4.639288852000222e-06,
    "weight_decay": 0.0
  },
  "layer_7_decay": {
    "param_names": [
      "backbone.blocks.6.attn.qkv.weight",
      "backbone.blocks.6.attn.proj.weight",
      "backbone.blocks.6.mlp.fc1.weight",
      "backbone.blocks.6.mlp.fc2.weight"
    ],
    "lr_scale": 0.11598222130000556,
    "lr": 4.639288852000222e-06,
    "weight_decay": 0.05
  },
  "layer_8_no_decay": {
    "param_names": [
      "backbone.blocks.7.norm1.weight",
      "backbone.blocks.7.attn.proj.bias",
      "backbone.blocks.7.attn.q_norm.weight",
      "backbone.blocks.7.attn.k_norm.weight",
      "backbone.blocks.7.ls1.gamma",
      "backbone.blocks.7.norm2.weight",
      "backbone.blocks.7.mlp.fc1.bias",
      "backbone.blocks.7.mlp.fc2.bias",
      "backbone.blocks.7.ls2.gamma"
    ],
    "lr_scale": 0.12208654873684796,
    "lr": 4.883461949473919e-06,
    "weight_decay": 0.0
  },
  "layer_8_decay": {
    "param_names": [
      "backbone.blocks.7.attn.qkv.weight",
      "backbone.blocks.7.attn.proj.weight",
      "backbone.blocks.7.mlp.fc1.weight",
      "backbone.blocks.7.mlp.fc2.weight"
    ],
    "lr_scale": 0.12208654873684796,
    "lr": 4.883461949473919e-06,
    "weight_decay": 0.05
  },
  "layer_9_no_decay": {
    "param_names": [
      "backbone.blocks.8.norm1.weight",
      "backbone.blocks.8.attn.proj.bias",
      "backbone.blocks.8.attn.q_norm.weight",
      "backbone.blocks.8.attn.k_norm.weight",
      "backbone.blocks.8.ls1.gamma",
      "backbone.blocks.8.norm2.weight",
      "backbone.blocks.8.mlp.fc1.bias",
      "backbone.blocks.8.mlp.fc2.bias",
      "backbone.blocks.8.ls2.gamma"
    ],
    "lr_scale": 0.12851215656510312,
    "lr": 5.140486262604126e-06,
    "weight_decay": 0.0
  },
  "layer_9_decay": {
    "param_names": [
      "backbone.blocks.8.attn.qkv.weight",
      "backbone.blocks.8.attn.proj.weight",
      "backbone.blocks.8.mlp.fc1.weight",
      "backbone.blocks.8.mlp.fc2.weight"
    ],
    "lr_scale": 0.12851215656510312,
    "lr": 5.140486262604126e-06,
    "weight_decay": 0.05
  },
  "layer_10_no_decay": {
    "param_names": [
      "backbone.blocks.9.norm1.weight",
      "backbone.blocks.9.attn.proj.bias",
      "backbone.blocks.9.attn.q_norm.weight",
      "backbone.blocks.9.attn.k_norm.weight",
      "backbone.blocks.9.ls1.gamma",
      "backbone.blocks.9.norm2.weight",
      "backbone.blocks.9.mlp.fc1.bias",
      "backbone.blocks.9.mlp.fc2.bias",
      "backbone.blocks.9.ls2.gamma"
    ],
    "lr_scale": 0.13527595427905592,
    "lr": 5.411038171162237e-06,
    "weight_decay": 0.0
  },
  "layer_10_decay": {
    "param_names": [
      "backbone.blocks.9.attn.qkv.weight",
      "backbone.blocks.9.attn.proj.weight",
      "backbone.blocks.9.mlp.fc1.weight",
      "backbone.blocks.9.mlp.fc2.weight"
    ],
    "lr_scale": 0.13527595427905592,
    "lr": 5.411038171162237e-06,
    "weight_decay": 0.05
  },
  "layer_11_no_decay": {
    "param_names": [
      "backbone.blocks.10.norm1.weight",
      "backbone.blocks.10.attn.proj.bias",
      "backbone.blocks.10.attn.q_norm.weight",
      "backbone.blocks.10.attn.k_norm.weight",
      "backbone.blocks.10.ls1.gamma",
      "backbone.blocks.10.norm2.weight",
      "backbone.blocks.10.mlp.fc1.bias",
      "backbone.blocks.10.mlp.fc2.bias",
      "backbone.blocks.10.ls2.gamma"
    ],
    "lr_scale": 0.14239574134637467,
    "lr": 5.695829653854987e-06,
    "weight_decay": 0.0
  },
  "layer_11_decay": {
    "param_names": [
      "backbone.blocks.10.attn.qkv.weight",
      "backbone.blocks.10.attn.proj.weight",
      "backbone.blocks.10.mlp.fc1.weight",
      "backbone.blocks.10.mlp.fc2.weight"
    ],
    "lr_scale": 0.14239574134637467,
    "lr": 5.695829653854987e-06,
    "weight_decay": 0.05
  },
  "layer_12_no_decay": {
    "param_names": [
      "backbone.blocks.11.norm1.weight",
      "backbone.blocks.11.attn.proj.bias",
      "backbone.blocks.11.attn.q_norm.weight",
      "backbone.blocks.11.attn.k_norm.weight",
      "backbone.blocks.11.ls1.gamma",
      "backbone.blocks.11.norm2.weight",
      "backbone.blocks.11.mlp.fc1.bias",
      "backbone.blocks.11.mlp.fc2.bias",
      "backbone.blocks.11.ls2.gamma"
    ],
    "lr_scale": 0.14989025404881545,
    "lr": 5.995610161952619e-06,
    "weight_decay": 0.0
  },
  "layer_12_decay": {
    "param_names": [
      "backbone.blocks.11.attn.qkv.weight",
      "backbone.blocks.11.attn.proj.weight",
      "backbone.blocks.11.mlp.fc1.weight",
      "backbone.blocks.11.mlp.fc2.weight"
    ],
    "lr_scale": 0.14989025404881545,
    "lr": 5.995610161952619e-06,
    "weight_decay": 0.05
  },
  "layer_13_no_decay": {
    "param_names": [
      "backbone.blocks.12.norm1.weight",
      "backbone.blocks.12.attn.proj.bias",
      "backbone.blocks.12.attn.q_norm.weight",
      "backbone.blocks.12.attn.k_norm.weight",
      "backbone.blocks.12.ls1.gamma",
      "backbone.blocks.12.norm2.weight",
      "backbone.blocks.12.mlp.fc1.bias",
      "backbone.blocks.12.mlp.fc2.bias",
      "backbone.blocks.12.ls2.gamma"
    ],
    "lr_scale": 0.1577792147882268,
    "lr": 6.311168591529072e-06,
    "weight_decay": 0.0
  },
  "layer_13_decay": {
    "param_names": [
      "backbone.blocks.12.attn.qkv.weight",
      "backbone.blocks.12.attn.proj.weight",
      "backbone.blocks.12.mlp.fc1.weight",
      "backbone.blocks.12.mlp.fc2.weight"
    ],
    "lr_scale": 0.1577792147882268,
    "lr": 6.311168591529072e-06,
    "weight_decay": 0.05
  },
  "layer_14_no_decay": {
    "param_names": [
      "backbone.blocks.13.norm1.weight",
      "backbone.blocks.13.attn.proj.bias",
      "backbone.blocks.13.attn.q_norm.weight",
      "backbone.blocks.13.attn.k_norm.weight",
      "backbone.blocks.13.ls1.gamma",
      "backbone.blocks.13.norm2.weight",
      "backbone.blocks.13.mlp.fc1.bias",
      "backbone.blocks.13.mlp.fc2.bias",
      "backbone.blocks.13.ls2.gamma"
    ],
    "lr_scale": 0.16608338398760716,
    "lr": 6.6433353595042875e-06,
    "weight_decay": 0.0
  },
  "layer_14_decay": {
    "param_names": [
      "backbone.blocks.13.attn.qkv.weight",
      "backbone.blocks.13.attn.proj.weight",
      "backbone.blocks.13.mlp.fc1.weight",
      "backbone.blocks.13.mlp.fc2.weight"
    ],
    "lr_scale": 0.16608338398760716,
    "lr": 6.6433353595042875e-06,
    "weight_decay": 0.05
  },
  "layer_15_no_decay": {
    "param_names": [
      "backbone.blocks.14.norm1.weight",
      "backbone.blocks.14.attn.proj.bias",
      "backbone.blocks.14.attn.q_norm.weight",
      "backbone.blocks.14.attn.k_norm.weight",
      "backbone.blocks.14.ls1.gamma",
      "backbone.blocks.14.norm2.weight",
      "backbone.blocks.14.mlp.fc1.bias",
      "backbone.blocks.14.mlp.fc2.bias",
      "backbone.blocks.14.ls2.gamma"
    ],
    "lr_scale": 0.174824614723797,
    "lr": 6.9929845889518814e-06,
    "weight_decay": 0.0
  },
  "layer_15_decay": {
    "param_names": [
      "backbone.blocks.14.attn.qkv.weight",
      "backbone.blocks.14.attn.proj.weight",
      "backbone.blocks.14.mlp.fc1.weight",
      "backbone.blocks.14.mlp.fc2.weight"
    ],
    "lr_scale": 0.174824614723797,
    "lr": 6.9929845889518814e-06,
    "weight_decay": 0.05
  },
  "layer_16_no_decay": {
    "param_names": [
      "backbone.blocks.15.norm1.weight",
      "backbone.blocks.15.attn.proj.bias",
      "backbone.blocks.15.attn.q_norm.weight",
      "backbone.blocks.15.attn.k_norm.weight",
      "backbone.blocks.15.ls1.gamma",
      "backbone.blocks.15.norm2.weight",
      "backbone.blocks.15.mlp.fc1.bias",
      "backbone.blocks.15.mlp.fc2.bias",
      "backbone.blocks.15.ls2.gamma"
    ],
    "lr_scale": 0.18402591023557582,
    "lr": 7.361036409423033e-06,
    "weight_decay": 0.0
  },
  "layer_16_decay": {
    "param_names": [
      "backbone.blocks.15.attn.qkv.weight",
      "backbone.blocks.15.attn.proj.weight",
      "backbone.blocks.15.mlp.fc1.weight",
      "backbone.blocks.15.mlp.fc2.weight"
    ],
    "lr_scale": 0.18402591023557582,
    "lr": 7.361036409423033e-06,
    "weight_decay": 0.05
  },
  "layer_17_no_decay": {
    "param_names": [
      "backbone.blocks.16.norm1.weight",
      "backbone.blocks.16.attn.proj.bias",
      "backbone.blocks.16.attn.q_norm.weight",
      "backbone.blocks.16.attn.k_norm.weight",
      "backbone.blocks.16.ls1.gamma",
      "backbone.blocks.16.norm2.weight",
      "backbone.blocks.16.mlp.fc1.bias",
      "backbone.blocks.16.mlp.fc2.bias",
      "backbone.blocks.16.ls2.gamma"
    ],
    "lr_scale": 0.19371148445850087,
    "lr": 7.748459378340036e-06,
    "weight_decay": 0.0
  },
  "layer_17_decay": {
    "param_names": [
      "backbone.blocks.16.attn.qkv.weight",
      "backbone.blocks.16.attn.proj.weight",
      "backbone.blocks.16.mlp.fc1.weight",
      "backbone.blocks.16.mlp.fc2.weight"
    ],
    "lr_scale": 0.19371148445850087,
    "lr": 7.748459378340036e-06,
    "weight_decay": 0.05
  },
  "layer_18_no_decay": {
    "param_names": [
      "backbone.blocks.17.norm1.weight",
      "backbone.blocks.17.attn.proj.bias",
      "backbone.blocks.17.attn.q_norm.weight",
      "backbone.blocks.17.attn.k_norm.weight",
      "backbone.blocks.17.ls1.gamma",
      "backbone.blocks.17.norm2.weight",
      "backbone.blocks.17.mlp.fc1.bias",
      "backbone.blocks.17.mlp.fc2.bias",
      "backbone.blocks.17.ls2.gamma"
    ],
    "lr_scale": 0.2039068257457904,
    "lr": 8.156273029831616e-06,
    "weight_decay": 0.0
  },
  "layer_18_decay": {
    "param_names": [
      "backbone.blocks.17.attn.qkv.weight",
      "backbone.blocks.17.attn.proj.weight",
      "backbone.blocks.17.mlp.fc1.weight",
      "backbone.blocks.17.mlp.fc2.weight"
    ],
    "lr_scale": 0.2039068257457904,
    "lr": 8.156273029831616e-06,
    "weight_decay": 0.05
  },
  "layer_19_no_decay": {
    "param_names": [
      "backbone.blocks.18.norm1.weight",
      "backbone.blocks.18.attn.proj.bias",
      "backbone.blocks.18.attn.q_norm.weight",
      "backbone.blocks.18.attn.k_norm.weight",
      "backbone.blocks.18.ls1.gamma",
      "backbone.blocks.18.norm2.weight",
      "backbone.blocks.18.mlp.fc1.bias",
      "backbone.blocks.18.mlp.fc2.bias",
      "backbone.blocks.18.ls2.gamma"
    ],
    "lr_scale": 0.21463876394293727,
    "lr": 8.585550557717492e-06,
    "weight_decay": 0.0
  },
  "layer_19_decay": {
    "param_names": [
      "backbone.blocks.18.attn.qkv.weight",
      "backbone.blocks.18.attn.proj.weight",
      "backbone.blocks.18.mlp.fc1.weight",
      "backbone.blocks.18.mlp.fc2.weight"
    ],
    "lr_scale": 0.21463876394293727,
    "lr": 8.585550557717492e-06,
    "weight_decay": 0.05
  },
  "layer_20_no_decay": {
    "param_names": [
      "backbone.blocks.19.norm1.weight",
      "backbone.blocks.19.attn.proj.bias",
      "backbone.blocks.19.attn.q_norm.weight",
      "backbone.blocks.19.attn.k_norm.weight",
      "backbone.blocks.19.ls1.gamma",
      "backbone.blocks.19.norm2.weight",
      "backbone.blocks.19.mlp.fc1.bias",
      "backbone.blocks.19.mlp.fc2.bias",
      "backbone.blocks.19.ls2.gamma"
    ],
    "lr_scale": 0.22593554099256555,
    "lr": 9.037421639702623e-06,
    "weight_decay": 0.0
  },
  "layer_20_decay": {
    "param_names": [
      "backbone.blocks.19.attn.qkv.weight",
      "backbone.blocks.19.attn.proj.weight",
      "backbone.blocks.19.mlp.fc1.weight",
      "backbone.blocks.19.mlp.fc2.weight"
    ],
    "lr_scale": 0.22593554099256555,
    "lr": 9.037421639702623e-06,
    "weight_decay": 0.05
  },
  "layer_21_no_decay": {
    "param_names": [
      "backbone.blocks.20.norm1.weight",
      "backbone.blocks.20.attn.proj.bias",
      "backbone.blocks.20.attn.q_norm.weight",
      "backbone.blocks.20.attn.k_norm.weight",
      "backbone.blocks.20.ls1.gamma",
      "backbone.blocks.20.norm2.weight",
      "backbone.blocks.20.mlp.fc1.bias",
      "backbone.blocks.20.mlp.fc2.bias",
      "backbone.blocks.20.ls2.gamma"
    ],
    "lr_scale": 0.23782688525533216,
    "lr": 9.513075410213288e-06,
    "weight_decay": 0.0
  },
  "layer_21_decay": {
    "param_names": [
      "backbone.blocks.20.attn.qkv.weight",
      "backbone.blocks.20.attn.proj.weight",
      "backbone.blocks.20.mlp.fc1.weight",
      "backbone.blocks.20.mlp.fc2.weight"
    ],
    "lr_scale": 0.23782688525533216,
    "lr": 9.513075410213288e-06,
    "weight_decay": 0.05
  },
  "layer_22_no_decay": {
    "param_names": [
      "backbone.blocks.21.norm1.weight",
      "backbone.blocks.21.attn.proj.bias",
      "backbone.blocks.21.attn.q_norm.weight",
      "backbone.blocks.21.attn.k_norm.weight",
      "backbone.blocks.21.ls1.gamma",
      "backbone.blocks.21.norm2.weight",
      "backbone.blocks.21.mlp.fc1.bias",
      "backbone.blocks.21.mlp.fc2.bias",
      "backbone.blocks.21.ls2.gamma"
    ],
    "lr_scale": 0.2503440897424549,
    "lr": 1.0013763589698198e-05,
    "weight_decay": 0.0
  },
  "layer_22_decay": {
    "param_names": [
      "backbone.blocks.21.attn.qkv.weight",
      "backbone.blocks.21.attn.proj.weight",
      "backbone.blocks.21.mlp.fc1.weight",
      "backbone.blocks.21.mlp.fc2.weight"
    ],
    "lr_scale": 0.2503440897424549,
    "lr": 1.0013763589698198e-05,
    "weight_decay": 0.05
  },
  "layer_23_no_decay": {
    "param_names": [
      "backbone.blocks.22.norm1.weight",
      "backbone.blocks.22.attn.proj.bias",
      "backbone.blocks.22.attn.q_norm.weight",
      "backbone.blocks.22.attn.k_norm.weight",
      "backbone.blocks.22.ls1.gamma",
      "backbone.blocks.22.norm2.weight",
      "backbone.blocks.22.mlp.fc1.bias",
      "backbone.blocks.22.mlp.fc2.bias",
      "backbone.blocks.22.ls2.gamma"
    ],
    "lr_scale": 0.26352009446574204,
    "lr": 1.0540803778629682e-05,
    "weight_decay": 0.0
  },
  "layer_23_decay": {
    "param_names": [
      "backbone.blocks.22.attn.qkv.weight",
      "backbone.blocks.22.attn.proj.weight",
      "backbone.blocks.22.mlp.fc1.weight",
      "backbone.blocks.22.mlp.fc2.weight"
    ],
    "lr_scale": 0.26352009446574204,
    "lr": 1.0540803778629682e-05,
    "weight_decay": 0.05
  },
  "layer_24_no_decay": {
    "param_names": [
      "backbone.blocks.23.norm1.weight",
      "backbone.blocks.23.attn.proj.bias",
      "backbone.blocks.23.attn.q_norm.weight",
      "backbone.blocks.23.attn.k_norm.weight",
      "backbone.blocks.23.ls1.gamma",
      "backbone.blocks.23.norm2.weight",
      "backbone.blocks.23.mlp.fc1.bias",
      "backbone.blocks.23.mlp.fc2.bias",
      "backbone.blocks.23.ls2.gamma"
    ],
    "lr_scale": 0.27738957312183377,
    "lr": 1.109558292487335e-05,
    "weight_decay": 0.0
  },
  "layer_24_decay": {
    "param_names": [
      "backbone.blocks.23.attn.qkv.weight",
      "backbone.blocks.23.attn.proj.weight",
      "backbone.blocks.23.mlp.fc1.weight",
      "backbone.blocks.23.mlp.fc2.weight"
    ],
    "lr_scale": 0.27738957312183377,
    "lr": 1.109558292487335e-05,
    "weight_decay": 0.05
  },
  "layer_25_no_decay": {
    "param_names": [
      "backbone.blocks.24.norm1.weight",
      "backbone.blocks.24.attn.proj.bias",
      "backbone.blocks.24.attn.q_norm.weight",
      "backbone.blocks.24.attn.k_norm.weight",
      "backbone.blocks.24.ls1.gamma",
      "backbone.blocks.24.norm2.weight",
      "backbone.blocks.24.mlp.fc1.bias",
      "backbone.blocks.24.mlp.fc2.bias",
      "backbone.blocks.24.ls2.gamma"
    ],
    "lr_scale": 0.2919890243387724,
    "lr": 1.1679560973550896e-05,
    "weight_decay": 0.0
  },
  "layer_25_decay": {
    "param_names": [
      "backbone.blocks.24.attn.qkv.weight",
      "backbone.blocks.24.attn.proj.weight",
      "backbone.blocks.24.mlp.fc1.weight",
      "backbone.blocks.24.mlp.fc2.weight"
    ],
    "lr_scale": 0.2919890243387724,
    "lr": 1.1679560973550896e-05,
    "weight_decay": 0.05
  },
  "layer_26_no_decay": {
    "param_names": [
      "backbone.blocks.25.norm1.weight",
      "backbone.blocks.25.attn.proj.bias",
      "backbone.blocks.25.attn.q_norm.weight",
      "backbone.blocks.25.attn.k_norm.weight",
      "backbone.blocks.25.ls1.gamma",
      "backbone.blocks.25.norm2.weight",
      "backbone.blocks.25.mlp.fc1.bias",
      "backbone.blocks.25.mlp.fc2.bias",
      "backbone.blocks.25.ls2.gamma"
    ],
    "lr_scale": 0.3073568677250236,
    "lr": 1.2294274709000943e-05,
    "weight_decay": 0.0
  },
  "layer_26_decay": {
    "param_names": [
      "backbone.blocks.25.attn.qkv.weight",
      "backbone.blocks.25.attn.proj.weight",
      "backbone.blocks.25.mlp.fc1.weight",
      "backbone.blocks.25.mlp.fc2.weight"
    ],
    "lr_scale": 0.3073568677250236,
    "lr": 1.2294274709000943e-05,
    "weight_decay": 0.05
  },
  "layer_27_no_decay": {
    "param_names": [
      "backbone.blocks.26.norm1.weight",
      "backbone.blocks.26.attn.proj.bias",
      "backbone.blocks.26.attn.q_norm.weight",
      "backbone.blocks.26.attn.k_norm.weight",
      "backbone.blocks.26.ls1.gamma",
      "backbone.blocks.26.norm2.weight",
      "backbone.blocks.26.mlp.fc1.bias",
      "backbone.blocks.26.mlp.fc2.bias",
      "backbone.blocks.26.ls2.gamma"
    ],
    "lr_scale": 0.323533544973709,
    "lr": 1.2941341798948362e-05,
    "weight_decay": 0.0
  },
  "layer_27_decay": {
    "param_names": [
      "backbone.blocks.26.attn.qkv.weight",
      "backbone.blocks.26.attn.proj.weight",
      "backbone.blocks.26.mlp.fc1.weight",
      "backbone.blocks.26.mlp.fc2.weight"
    ],
    "lr_scale": 0.323533544973709,
    "lr": 1.2941341798948362e-05,
    "weight_decay": 0.05
  },
  "layer_28_no_decay": {
    "param_names": [
      "backbone.blocks.27.norm1.weight",
      "backbone.blocks.27.attn.proj.bias",
      "backbone.blocks.27.attn.q_norm.weight",
      "backbone.blocks.27.attn.k_norm.weight",
      "backbone.blocks.27.ls1.gamma",
      "backbone.blocks.27.norm2.weight",
      "backbone.blocks.27.mlp.fc1.bias",
      "backbone.blocks.27.mlp.fc2.bias",
      "backbone.blocks.27.ls2.gamma"
    ],
    "lr_scale": 0.3405616262881148,
    "lr": 1.3622465051524594e-05,
    "weight_decay": 0.0
  },
  "layer_28_decay": {
    "param_names": [
      "backbone.blocks.27.attn.qkv.weight",
      "backbone.blocks.27.attn.proj.weight",
      "backbone.blocks.27.mlp.fc1.weight",
      "backbone.blocks.27.mlp.fc2.weight"
    ],
    "lr_scale": 0.3405616262881148,
    "lr": 1.3622465051524594e-05,
    "weight_decay": 0.05
  },
  "layer_29_no_decay": {
    "param_names": [
      "backbone.blocks.28.norm1.weight",
      "backbone.blocks.28.attn.proj.bias",
      "backbone.blocks.28.attn.q_norm.weight",
      "backbone.blocks.28.attn.k_norm.weight",
      "backbone.blocks.28.ls1.gamma",
      "backbone.blocks.28.norm2.weight",
      "backbone.blocks.28.mlp.fc1.bias",
      "backbone.blocks.28.mlp.fc2.bias",
      "backbone.blocks.28.ls2.gamma"
    ],
    "lr_scale": 0.3584859224085419,
    "lr": 1.4339436896341676e-05,
    "weight_decay": 0.0
  },
  "layer_29_decay": {
    "param_names": [
      "backbone.blocks.28.attn.qkv.weight",
      "backbone.blocks.28.attn.proj.weight",
      "backbone.blocks.28.mlp.fc1.weight",
      "backbone.blocks.28.mlp.fc2.weight"
    ],
    "lr_scale": 0.3584859224085419,
    "lr": 1.4339436896341676e-05,
    "weight_decay": 0.05
  },
  "layer_30_no_decay": {
    "param_names": [
      "backbone.blocks.29.norm1.weight",
      "backbone.blocks.29.attn.proj.bias",
      "backbone.blocks.29.attn.q_norm.weight",
      "backbone.blocks.29.attn.k_norm.weight",
      "backbone.blocks.29.ls1.gamma",
      "backbone.blocks.29.norm2.weight",
      "backbone.blocks.29.mlp.fc1.bias",
      "backbone.blocks.29.mlp.fc2.bias",
      "backbone.blocks.29.ls2.gamma"
    ],
    "lr_scale": 0.37735360253530725,
    "lr": 1.509414410141229e-05,
    "weight_decay": 0.0
  },
  "layer_30_decay": {
    "param_names": [
      "backbone.blocks.29.attn.qkv.weight",
      "backbone.blocks.29.attn.proj.weight",
      "backbone.blocks.29.mlp.fc1.weight",
      "backbone.blocks.29.mlp.fc2.weight"
    ],
    "lr_scale": 0.37735360253530725,
    "lr": 1.509414410141229e-05,
    "weight_decay": 0.05
  },
  "layer_31_no_decay": {
    "param_names": [
      "backbone.blocks.30.norm1.weight",
      "backbone.blocks.30.attn.proj.bias",
      "backbone.blocks.30.attn.q_norm.weight",
      "backbone.blocks.30.attn.k_norm.weight",
      "backbone.blocks.30.ls1.gamma",
      "backbone.blocks.30.norm2.weight",
      "backbone.blocks.30.mlp.fc1.bias",
      "backbone.blocks.30.mlp.fc2.bias",
      "backbone.blocks.30.ls2.gamma"
    ],
    "lr_scale": 0.3972143184582182,
    "lr": 1.588857273832873e-05,
    "weight_decay": 0.0
  },
  "layer_31_decay": {
    "param_names": [
      "backbone.blocks.30.attn.qkv.weight",
      "backbone.blocks.30.attn.proj.weight",
      "backbone.blocks.30.mlp.fc1.weight",
      "backbone.blocks.30.mlp.fc2.weight"
    ],
    "lr_scale": 0.3972143184582182,
    "lr": 1.588857273832873e-05,
    "weight_decay": 0.05
  },
  "layer_32_no_decay": {
    "param_names": [
      "backbone.blocks.31.norm1.weight",
      "backbone.blocks.31.attn.proj.bias",
      "backbone.blocks.31.attn.q_norm.weight",
      "backbone.blocks.31.attn.k_norm.weight",
      "backbone.blocks.31.ls1.gamma",
      "backbone.blocks.31.norm2.weight",
      "backbone.blocks.31.mlp.fc1.bias",
      "backbone.blocks.31.mlp.fc2.bias",
      "backbone.blocks.31.ls2.gamma"
    ],
    "lr_scale": 0.4181203352191771,
    "lr": 1.6724813408767084e-05,
    "weight_decay": 0.0
  },
  "layer_32_decay": {
    "param_names": [
      "backbone.blocks.31.attn.qkv.weight",
      "backbone.blocks.31.attn.proj.weight",
      "backbone.blocks.31.mlp.fc1.weight",
      "backbone.blocks.31.mlp.fc2.weight"
    ],
    "lr_scale": 0.4181203352191771,
    "lr": 1.6724813408767084e-05,
    "weight_decay": 0.05
  },
  "layer_33_no_decay": {
    "param_names": [
      "backbone.blocks.32.norm1.weight",
      "backbone.blocks.32.attn.proj.bias",
      "backbone.blocks.32.attn.q_norm.weight",
      "backbone.blocks.32.attn.k_norm.weight",
      "backbone.blocks.32.ls1.gamma",
      "backbone.blocks.32.norm2.weight",
      "backbone.blocks.32.mlp.fc1.bias",
      "backbone.blocks.32.mlp.fc2.bias",
      "backbone.blocks.32.ls2.gamma"
    ],
    "lr_scale": 0.44012666865176536,
    "lr": 1.7605066746070617e-05,
    "weight_decay": 0.0
  },
  "layer_33_decay": {
    "param_names": [
      "backbone.blocks.32.attn.qkv.weight",
      "backbone.blocks.32.attn.proj.weight",
      "backbone.blocks.32.mlp.fc1.weight",
      "backbone.blocks.32.mlp.fc2.weight"
    ],
    "lr_scale": 0.44012666865176536,
    "lr": 1.7605066746070617e-05,
    "weight_decay": 0.05
  },
  "layer_34_no_decay": {
    "param_names": [
      "backbone.blocks.33.norm1.weight",
      "backbone.blocks.33.attn.proj.bias",
      "backbone.blocks.33.attn.q_norm.weight",
      "backbone.blocks.33.attn.k_norm.weight",
      "backbone.blocks.33.ls1.gamma",
      "backbone.blocks.33.norm2.weight",
      "backbone.blocks.33.mlp.fc1.bias",
      "backbone.blocks.33.mlp.fc2.bias",
      "backbone.blocks.33.ls2.gamma"
    ],
    "lr_scale": 0.46329123015975304,
    "lr": 1.8531649206390123e-05,
    "weight_decay": 0.0
  },
  "layer_34_decay": {
    "param_names": [
      "backbone.blocks.33.attn.qkv.weight",
      "backbone.blocks.33.attn.proj.weight",
      "backbone.blocks.33.mlp.fc1.weight",
      "backbone.blocks.33.mlp.fc2.weight"
    ],
    "lr_scale": 0.46329123015975304,
    "lr": 1.8531649206390123e-05,
    "weight_decay": 0.05
  },
  "layer_35_no_decay": {
    "param_names": [
      "backbone.blocks.34.norm1.weight",
      "backbone.blocks.34.attn.proj.bias",
      "backbone.blocks.34.attn.q_norm.weight",
      "backbone.blocks.34.attn.k_norm.weight",
      "backbone.blocks.34.ls1.gamma",
      "backbone.blocks.34.norm2.weight",
      "backbone.blocks.34.mlp.fc1.bias",
      "backbone.blocks.34.mlp.fc2.bias",
      "backbone.blocks.34.ls2.gamma"
    ],
    "lr_scale": 0.48767497911552954,
    "lr": 1.9506999164621184e-05,
    "weight_decay": 0.0
  },
  "layer_35_decay": {
    "param_names": [
      "backbone.blocks.34.attn.qkv.weight",
      "backbone.blocks.34.attn.proj.weight",
      "backbone.blocks.34.mlp.fc1.weight",
      "backbone.blocks.34.mlp.fc2.weight"
    ],
    "lr_scale": 0.48767497911552954,
    "lr": 1.9506999164621184e-05,
    "weight_decay": 0.05
  },
  "layer_36_no_decay": {
    "param_names": [
      "backbone.blocks.35.norm1.weight",
      "backbone.blocks.35.attn.proj.bias",
      "backbone.blocks.35.attn.q_norm.weight",
      "backbone.blocks.35.attn.k_norm.weight",
      "backbone.blocks.35.ls1.gamma",
      "backbone.blocks.35.norm2.weight",
      "backbone.blocks.35.mlp.fc1.bias",
      "backbone.blocks.35.mlp.fc2.bias",
      "backbone.blocks.35.ls2.gamma"
    ],
    "lr_scale": 0.5133420832795048,
    "lr": 2.0533683331180195e-05,
    "weight_decay": 0.0
  },
  "layer_36_decay": {
    "param_names": [
      "backbone.blocks.35.attn.qkv.weight",
      "backbone.blocks.35.attn.proj.weight",
      "backbone.blocks.35.mlp.fc1.weight",
      "backbone.blocks.35.mlp.fc2.weight"
    ],
    "lr_scale": 0.5133420832795048,
    "lr": 2.0533683331180195e-05,
    "weight_decay": 0.05
  },
  "layer_37_no_decay": {
    "param_names": [
      "backbone.blocks.36.norm1.weight",
      "backbone.blocks.36.attn.proj.bias",
      "backbone.blocks.36.attn.q_norm.weight",
      "backbone.blocks.36.attn.k_norm.weight",
      "backbone.blocks.36.ls1.gamma",
      "backbone.blocks.36.norm2.weight",
      "backbone.blocks.36.mlp.fc1.bias",
      "backbone.blocks.36.mlp.fc2.bias",
      "backbone.blocks.36.ls2.gamma"
    ],
    "lr_scale": 0.5403600876626367,
    "lr": 2.1614403506505468e-05,
    "weight_decay": 0.0
  },
  "layer_37_decay": {
    "param_names": [
      "backbone.blocks.36.attn.qkv.weight",
      "backbone.blocks.36.attn.proj.weight",
      "backbone.blocks.36.mlp.fc1.weight",
      "backbone.blocks.36.mlp.fc2.weight"
    ],
    "lr_scale": 0.5403600876626367,
    "lr": 2.1614403506505468e-05,
    "weight_decay": 0.05
  },
  "layer_38_no_decay": {
    "param_names": [
      "backbone.blocks.37.norm1.weight",
      "backbone.blocks.37.attn.proj.bias",
      "backbone.blocks.37.attn.q_norm.weight",
      "backbone.blocks.37.attn.k_norm.weight",
      "backbone.blocks.37.ls1.gamma",
      "backbone.blocks.37.norm2.weight",
      "backbone.blocks.37.mlp.fc1.bias",
      "backbone.blocks.37.mlp.fc2.bias",
      "backbone.blocks.37.ls2.gamma"
    ],
    "lr_scale": 0.5688000922764597,
    "lr": 2.275200369105839e-05,
    "weight_decay": 0.0
  },
  "layer_38_decay": {
    "param_names": [
      "backbone.blocks.37.attn.qkv.weight",
      "backbone.blocks.37.attn.proj.weight",
      "backbone.blocks.37.mlp.fc1.weight",
      "backbone.blocks.37.mlp.fc2.weight"
    ],
    "lr_scale": 0.5688000922764597,
    "lr": 2.275200369105839e-05,
    "weight_decay": 0.05
  },
  "layer_39_no_decay": {
    "param_names": [
      "backbone.blocks.38.norm1.weight",
      "backbone.blocks.38.attn.proj.bias",
      "backbone.blocks.38.attn.q_norm.weight",
      "backbone.blocks.38.attn.k_norm.weight",
      "backbone.blocks.38.ls1.gamma",
      "backbone.blocks.38.norm2.weight",
      "backbone.blocks.38.mlp.fc1.bias",
      "backbone.blocks.38.mlp.fc2.bias",
      "backbone.blocks.38.ls2.gamma"
    ],
    "lr_scale": 0.5987369392383787,
    "lr": 2.394947756953515e-05,
    "weight_decay": 0.0
  },
  "layer_39_decay": {
    "param_names": [
      "backbone.blocks.38.attn.qkv.weight",
      "backbone.blocks.38.attn.proj.weight",
      "backbone.blocks.38.mlp.fc1.weight",
      "backbone.blocks.38.mlp.fc2.weight"
    ],
    "lr_scale": 0.5987369392383787,
    "lr": 2.394947756953515e-05,
    "weight_decay": 0.05
  },
  "layer_40_no_decay": {
    "param_names": [
      "backbone.blocks.39.norm1.weight",
      "backbone.blocks.39.attn.proj.bias",
      "backbone.blocks.39.attn.q_norm.weight",
      "backbone.blocks.39.attn.k_norm.weight",
      "backbone.blocks.39.ls1.gamma",
      "backbone.blocks.39.norm2.weight",
      "backbone.blocks.39.mlp.fc1.bias",
      "backbone.blocks.39.mlp.fc2.bias",
      "backbone.blocks.39.ls2.gamma"
    ],
    "lr_scale": 0.6302494097246091,
    "lr": 2.5209976388984365e-05,
    "weight_decay": 0.0
  },
  "layer_40_decay": {
    "param_names": [
      "backbone.blocks.39.attn.qkv.weight",
      "backbone.blocks.39.attn.proj.weight",
      "backbone.blocks.39.mlp.fc1.weight",
      "backbone.blocks.39.mlp.fc2.weight"
    ],
    "lr_scale": 0.6302494097246091,
    "lr": 2.5209976388984365e-05,
    "weight_decay": 0.05
  },
  "layer_41_no_decay": {
    "param_names": [
      "backbone.blocks.40.norm1.weight",
      "backbone.blocks.40.attn.proj.bias",
      "backbone.blocks.40.attn.q_norm.weight",
      "backbone.blocks.40.attn.k_norm.weight",
      "backbone.blocks.40.ls1.gamma",
      "backbone.blocks.40.norm2.weight",
      "backbone.blocks.40.mlp.fc1.bias",
      "backbone.blocks.40.mlp.fc2.bias",
      "backbone.blocks.40.ls2.gamma"
    ],
    "lr_scale": 0.6634204312890623,
    "lr": 2.6536817251562493e-05,
    "weight_decay": 0.0
  },
  "layer_41_decay": {
    "param_names": [
      "backbone.blocks.40.attn.qkv.weight",
      "backbone.blocks.40.attn.proj.weight",
      "backbone.blocks.40.mlp.fc1.weight",
      "backbone.blocks.40.mlp.fc2.weight"
    ],
    "lr_scale": 0.6634204312890623,
    "lr": 2.6536817251562493e-05,
    "weight_decay": 0.05
  },
  "layer_42_no_decay": {
    "param_names": [
      "backbone.blocks.41.norm1.weight",
      "backbone.blocks.41.attn.proj.bias",
      "backbone.blocks.41.attn.q_norm.weight",
      "backbone.blocks.41.attn.k_norm.weight",
      "backbone.blocks.41.ls1.gamma",
      "backbone.blocks.41.norm2.weight",
      "backbone.blocks.41.mlp.fc1.bias",
      "backbone.blocks.41.mlp.fc2.bias",
      "backbone.blocks.41.ls2.gamma"
    ],
    "lr_scale": 0.6983372960937497,
    "lr": 2.793349184374999e-05,
    "weight_decay": 0.0
  },
  "layer_42_decay": {
    "param_names": [
      "backbone.blocks.41.attn.qkv.weight",
      "backbone.blocks.41.attn.proj.weight",
      "backbone.blocks.41.mlp.fc1.weight",
      "backbone.blocks.41.mlp.fc2.weight"
    ],
    "lr_scale": 0.6983372960937497,
    "lr": 2.793349184374999e-05,
    "weight_decay": 0.05
  },
  "layer_43_no_decay": {
    "param_names": [
      "backbone.blocks.42.norm1.weight",
      "backbone.blocks.42.attn.proj.bias",
      "backbone.blocks.42.attn.q_norm.weight",
      "backbone.blocks.42.attn.k_norm.weight",
      "backbone.blocks.42.ls1.gamma",
      "backbone.blocks.42.norm2.weight",
      "backbone.blocks.42.mlp.fc1.bias",
      "backbone.blocks.42.mlp.fc2.bias",
      "backbone.blocks.42.ls2.gamma"
    ],
    "lr_scale": 0.7350918906249998,
    "lr": 2.9403675624999993e-05,
    "weight_decay": 0.0
  },
  "layer_43_decay": {
    "param_names": [
      "backbone.blocks.42.attn.qkv.weight",
      "backbone.blocks.42.attn.proj.weight",
      "backbone.blocks.42.mlp.fc1.weight",
      "backbone.blocks.42.mlp.fc2.weight"
    ],
    "lr_scale": 0.7350918906249998,
    "lr": 2.9403675624999993e-05,
    "weight_decay": 0.05
  },
  "layer_44_no_decay": {
    "param_names": [
      "backbone.blocks.43.norm1.weight",
      "backbone.blocks.43.attn.proj.bias",
      "backbone.blocks.43.attn.q_norm.weight",
      "backbone.blocks.43.attn.k_norm.weight",
      "backbone.blocks.43.ls1.gamma",
      "backbone.blocks.43.norm2.weight",
      "backbone.blocks.43.mlp.fc1.bias",
      "backbone.blocks.43.mlp.fc2.bias",
      "backbone.blocks.43.ls2.gamma"
    ],
    "lr_scale": 0.7737809374999998,
    "lr": 3.0951237499999995e-05,
    "weight_decay": 0.0
  },
  "layer_44_decay": {
    "param_names": [
      "backbone.blocks.43.attn.qkv.weight",
      "backbone.blocks.43.attn.proj.weight",
      "backbone.blocks.43.mlp.fc1.weight",
      "backbone.blocks.43.mlp.fc2.weight"
    ],
    "lr_scale": 0.7737809374999998,
    "lr": 3.0951237499999995e-05,
    "weight_decay": 0.05
  },
  "layer_45_no_decay": {
    "param_names": [
      "backbone.blocks.44.norm1.weight",
      "backbone.blocks.44.attn.proj.bias",
      "backbone.blocks.44.attn.q_norm.weight",
      "backbone.blocks.44.attn.k_norm.weight",
      "backbone.blocks.44.ls1.gamma",
      "backbone.blocks.44.norm2.weight",
      "backbone.blocks.44.mlp.fc1.bias",
      "backbone.blocks.44.mlp.fc2.bias",
      "backbone.blocks.44.ls2.gamma"
    ],
    "lr_scale": 0.8145062499999999,
    "lr": 3.258025e-05,
    "weight_decay": 0.0
  },
  "layer_45_decay": {
    "param_names": [
      "backbone.blocks.44.attn.qkv.weight",
      "backbone.blocks.44.attn.proj.weight",
      "backbone.blocks.44.mlp.fc1.weight",
      "backbone.blocks.44.mlp.fc2.weight"
    ],
    "lr_scale": 0.8145062499999999,
    "lr": 3.258025e-05,
    "weight_decay": 0.05
  },
  "layer_46_no_decay": {
    "param_names": [
      "backbone.blocks.45.norm1.weight",
      "backbone.blocks.45.attn.proj.bias",
      "backbone.blocks.45.attn.q_norm.weight",
      "backbone.blocks.45.attn.k_norm.weight",
      "backbone.blocks.45.ls1.gamma",
      "backbone.blocks.45.norm2.weight",
      "backbone.blocks.45.mlp.fc1.bias",
      "backbone.blocks.45.mlp.fc2.bias",
      "backbone.blocks.45.ls2.gamma"
    ],
    "lr_scale": 0.8573749999999999,
    "lr": 3.4294999999999996e-05,
    "weight_decay": 0.0
  },
  "layer_46_decay": {
    "param_names": [
      "backbone.blocks.45.attn.qkv.weight",
      "backbone.blocks.45.attn.proj.weight",
      "backbone.blocks.45.mlp.fc1.weight",
      "backbone.blocks.45.mlp.fc2.weight"
    ],
    "lr_scale": 0.8573749999999999,
    "lr": 3.4294999999999996e-05,
    "weight_decay": 0.05
  },
  "layer_47_no_decay": {
    "param_names": [
      "backbone.blocks.46.norm1.weight",
      "backbone.blocks.46.attn.proj.bias",
      "backbone.blocks.46.attn.q_norm.weight",
      "backbone.blocks.46.attn.k_norm.weight",
      "backbone.blocks.46.ls1.gamma",
      "backbone.blocks.46.norm2.weight",
      "backbone.blocks.46.mlp.fc1.bias",
      "backbone.blocks.46.mlp.fc2.bias",
      "backbone.blocks.46.ls2.gamma"
    ],
    "lr_scale": 0.9025,
    "lr": 3.61e-05,
    "weight_decay": 0.0
  },
  "layer_47_decay": {
    "param_names": [
      "backbone.blocks.46.attn.qkv.weight",
      "backbone.blocks.46.attn.proj.weight",
      "backbone.blocks.46.mlp.fc1.weight",
      "backbone.blocks.46.mlp.fc2.weight"
    ],
    "lr_scale": 0.9025,
    "lr": 3.61e-05,
    "weight_decay": 0.05
  },
  "layer_48_no_decay": {
    "param_names": [
      "backbone.blocks.47.norm1.weight",
      "backbone.blocks.47.attn.proj.bias",
      "backbone.blocks.47.attn.q_norm.weight",
      "backbone.blocks.47.attn.k_norm.weight",
      "backbone.blocks.47.ls1.gamma",
      "backbone.blocks.47.norm2.weight",
      "backbone.blocks.47.mlp.fc1.bias",
      "backbone.blocks.47.mlp.fc2.bias",
      "backbone.blocks.47.ls2.gamma"
    ],
    "lr_scale": 0.95,
    "lr": 3.8e-05,
    "weight_decay": 0.0
  },
  "layer_48_decay": {
    "param_names": [
      "backbone.blocks.47.attn.qkv.weight",
      "backbone.blocks.47.attn.proj.weight",
      "backbone.blocks.47.mlp.fc1.weight",
      "backbone.blocks.47.mlp.fc2.weight"
    ],
    "lr_scale": 0.95,
    "lr": 3.8e-05,
    "weight_decay": 0.05
  },
  "layer_49_decay": {
    "param_names": [
      "decode_head.conv_seg.weight"
    ],
    "lr_scale": 1.0,
    "lr": 4e-05,
    "weight_decay": 0.05
  },
  "layer_49_no_decay": {
    "param_names": [
      "decode_head.conv_seg.bias",
      "decode_head.norm.weight",
      "decode_head.norm.bias"
    ],
    "lr_scale": 1.0,
    "lr": 4e-05,
    "weight_decay": 0.0
  }
}
2023-11-09 22:37:25,407 - mmseg - INFO - trainable parameters: 5906608150
2023-11-09 22:37:25,409 - mmseg - INFO - total parameters: 5906608150
2023-11-09 22:37:25,453 - mmseg - INFO - Loaded 2000 images
2023-11-09 22:37:25,453 - mmseg - INFO - Start running, host: wangwenhai@SH-IDC1-10-140-37-94, work_dir: /mnt/petrelfs/wangwenhai/workspace/ViTDetection/mmsegmentation/work_dirs/segmenter_linear_intern_vit_6b_504_5k_ade20k_bs16_lr4e-5_1of16
2023-11-09 22:37:25,454 - mmseg - INFO - Hooks will be executed in the following order:
before_run:
(VERY_HIGH   ) PolyLrUpdaterHook                  
(49          ) ToBFloat16Hook                     
(49          ) ToBFloat16Hook                     
(NORMAL      ) DeepspeedCheckpointHook            
(LOW         ) DeepspeedDistEvalHook              
(VERY_LOW    ) TextLoggerHook                     
(VERY_LOW    ) TensorboardLoggerHook              
 -------------------- 
before_train_epoch:
(VERY_HIGH   ) PolyLrUpdaterHook                  
(LOW         ) IterTimerHook                      
(LOW         ) DeepspeedDistEvalHook              
(VERY_LOW    ) TextLoggerHook                     
(VERY_LOW    ) TensorboardLoggerHook              
 -------------------- 
before_train_iter:
(VERY_HIGH   ) PolyLrUpdaterHook                  
(LOW         ) IterTimerHook                      
(LOW         ) DeepspeedDistEvalHook              
 -------------------- 
after_train_iter:
(ABOVE_NORMAL) OptimizerHook                      
(NORMAL      ) DeepspeedCheckpointHook            
(LOW         ) IterTimerHook                      
(LOW         ) DeepspeedDistEvalHook              
(VERY_LOW    ) TextLoggerHook                     
(VERY_LOW    ) TensorboardLoggerHook              
 -------------------- 
after_train_epoch:
(NORMAL      ) DeepspeedCheckpointHook            
(LOW         ) DeepspeedDistEvalHook              
(VERY_LOW    ) TextLoggerHook                     
(VERY_LOW    ) TensorboardLoggerHook              
 -------------------- 
before_val_epoch:
(LOW         ) IterTimerHook                      
(VERY_LOW    ) TextLoggerHook                     
(VERY_LOW    ) TensorboardLoggerHook              
 -------------------- 
before_val_iter:
(LOW         ) IterTimerHook                      
 -------------------- 
after_val_iter:
(LOW         ) IterTimerHook                      
 -------------------- 
after_val_epoch:
(VERY_LOW    ) TextLoggerHook                     
(VERY_LOW    ) TensorboardLoggerHook              
 -------------------- 
after_run:
(VERY_LOW    ) TextLoggerHook                     
(VERY_LOW    ) TensorboardLoggerHook              
 -------------------- 
2023-11-09 22:37:25,454 - mmseg - INFO - workflow: [('train', 1)], max: 5000 iters
2023-11-09 22:37:25,461 - mmseg - INFO - Checkpoints will be saved to /mnt/petrelfs/wangwenhai/workspace/ViTDetection/mmsegmentation/work_dirs/segmenter_linear_intern_vit_6b_504_5k_ade20k_bs16_lr4e-5_1of16 by HardDiskBackend.
2023-11-09 22:39:18,864 - mmseg - INFO - Iter [50/5000]	lr: 1.572e-06, eta: 1:46:33, time: 1.292, data_time: 0.009, memory: 38534, decode.loss_ce: 4.0440, decode.acc_seg: 4.9999, loss: 4.0440
2023-11-09 22:40:22,254 - mmseg - INFO - Iter [100/5000]	lr: 3.144e-06, eta: 1:44:30, time: 1.268, data_time: 0.051, memory: 38534, decode.loss_ce: 2.4000, decode.acc_seg: 47.4330, loss: 2.4000
2023-11-09 22:41:23,381 - mmseg - INFO - Iter [150/5000]	lr: 3.143e-06, eta: 1:41:54, time: 1.223, data_time: 0.007, memory: 38534, decode.loss_ce: 1.2759, decode.acc_seg: 66.3580, loss: 1.2759
2023-11-09 22:42:26,901 - mmseg - INFO - Iter [200/5000]	lr: 3.111e-06, eta: 1:41:02, time: 1.270, data_time: 0.053, memory: 38534, decode.loss_ce: 0.9487, decode.acc_seg: 72.2640, loss: 0.9487
2023-11-09 22:43:30,449 - mmseg - INFO - Iter [250/5000]	lr: 3.078e-06, eta: 1:40:07, time: 1.271, data_time: 0.051, memory: 38534, decode.loss_ce: 0.8691, decode.acc_seg: 74.2259, loss: 0.8691
2023-11-09 22:44:31,617 - mmseg - INFO - Iter [300/5000]	lr: 3.046e-06, eta: 1:38:31, time: 1.223, data_time: 0.007, memory: 38534, decode.loss_ce: 0.7616, decode.acc_seg: 75.9451, loss: 0.7616
2023-11-09 22:45:35,133 - mmseg - INFO - Iter [350/5000]	lr: 3.014e-06, eta: 1:37:37, time: 1.270, data_time: 0.052, memory: 38534, decode.loss_ce: 0.6935, decode.acc_seg: 78.1956, loss: 0.6935
2023-11-09 22:46:38,661 - mmseg - INFO - Iter [400/5000]	lr: 2.981e-06, eta: 1:36:40, time: 1.271, data_time: 0.051, memory: 38534, decode.loss_ce: 0.6339, decode.acc_seg: 79.1392, loss: 0.6339
2023-11-09 22:47:39,894 - mmseg - INFO - Iter [450/5000]	lr: 2.949e-06, eta: 1:35:18, time: 1.225, data_time: 0.007, memory: 38534, decode.loss_ce: 0.5782, decode.acc_seg: 80.9914, loss: 0.5782
2023-11-09 22:48:43,359 - mmseg - INFO - Iter [500/5000]	lr: 2.916e-06, eta: 1:34:21, time: 1.269, data_time: 0.051, memory: 38534, decode.loss_ce: 0.5524, decode.acc_seg: 81.9658, loss: 0.5524
2023-11-09 22:49:44,561 - mmseg - INFO - Iter [550/5000]	lr: 2.884e-06, eta: 1:33:04, time: 1.224, data_time: 0.007, memory: 38534, decode.loss_ce: 0.5073, decode.acc_seg: 82.5137, loss: 0.5073
2023-11-09 22:50:48,079 - mmseg - INFO - Iter [600/5000]	lr: 2.852e-06, eta: 1:32:07, time: 1.270, data_time: 0.051, memory: 38534, decode.loss_ce: 0.4482, decode.acc_seg: 84.7468, loss: 0.4482
2023-11-09 22:51:51,597 - mmseg - INFO - Iter [650/5000]	lr: 2.819e-06, eta: 1:31:09, time: 1.270, data_time: 0.051, memory: 38534, decode.loss_ce: 0.4537, decode.acc_seg: 84.9285, loss: 0.4537
2023-11-09 22:52:52,802 - mmseg - INFO - Iter [700/5000]	lr: 2.787e-06, eta: 1:29:56, time: 1.224, data_time: 0.007, memory: 38534, decode.loss_ce: 0.4470, decode.acc_seg: 84.7759, loss: 0.4470
2023-11-09 22:53:56,275 - mmseg - INFO - Iter [750/5000]	lr: 2.754e-06, eta: 1:28:57, time: 1.269, data_time: 0.054, memory: 38534, decode.loss_ce: 0.4326, decode.acc_seg: 84.7979, loss: 0.4326
2023-11-09 22:54:59,849 - mmseg - INFO - Iter [800/5000]	lr: 2.722e-06, eta: 1:27:59, time: 1.271, data_time: 0.054, memory: 38534, decode.loss_ce: 0.3933, decode.acc_seg: 86.0453, loss: 0.3933
2023-11-09 22:56:01,076 - mmseg - INFO - Iter [850/5000]	lr: 2.690e-06, eta: 1:26:48, time: 1.225, data_time: 0.008, memory: 38534, decode.loss_ce: 0.4120, decode.acc_seg: 85.9512, loss: 0.4120
2023-11-09 22:57:04,603 - mmseg - INFO - Iter [900/5000]	lr: 2.657e-06, eta: 1:25:49, time: 1.271, data_time: 0.050, memory: 38534, decode.loss_ce: 0.3696, decode.acc_seg: 87.1740, loss: 0.3696
2023-11-09 22:58:08,268 - mmseg - INFO - Iter [950/5000]	lr: 2.625e-06, eta: 1:24:50, time: 1.273, data_time: 0.053, memory: 38534, decode.loss_ce: 0.3792, decode.acc_seg: 86.0389, loss: 0.3792
2023-11-09 22:59:09,500 - mmseg - INFO - Saving checkpoint at 1000 iterations
2023-11-09 23:00:00,719 - mmseg - INFO - Exp name: segmenter_linear_intern_vit_6b_504_5k_ade20k_bs16_lr4e-5_1of16.py
2023-11-09 23:00:00,719 - mmseg - INFO - Iter [1000/5000]	lr: 2.592e-06, eta: 1:27:05, time: 2.249, data_time: 0.008, memory: 38534, decode.loss_ce: 0.3743, decode.acc_seg: 86.5380, loss: 0.3743
2023-11-09 23:02:32,869 - mmseg - INFO - per class results:
2023-11-09 23:02:32,874 - mmseg - INFO - 
+---------------------+-------+-------+
|        Class        |  IoU  |  Acc  |
+---------------------+-------+-------+
|         wall        | 76.01 | 86.62 |
|       building      | 81.05 | 92.62 |
|         sky         | 92.76 | 95.94 |
|        floor        | 80.33 | 91.17 |
|         tree        | 73.11 | 88.05 |
|       ceiling       | 82.57 | 91.19 |
|         road        | 81.36 | 85.79 |
|         bed         |  88.3 | 96.37 |
|      windowpane     | 59.16 | 76.04 |
|        grass        | 62.06 | 84.58 |
|       cabinet       | 60.84 | 73.57 |
|       sidewalk      | 62.36 | 83.25 |
|        person       | 78.88 | 90.77 |
|        earth        | 34.18 | 48.85 |
|         door        | 51.87 | 70.98 |
|        table        | 60.02 | 78.56 |
|       mountain      | 49.71 | 59.94 |
|        plant        | 49.72 | 61.42 |
|       curtain       | 71.48 |  83.3 |
|        chair        | 50.73 | 62.97 |
|         car         | 78.18 | 92.87 |
|        water        | 54.83 | 75.58 |
|       painting      |  71.4 | 84.72 |
|         sofa        | 62.98 | 71.65 |
|        shelf        | 30.38 | 50.55 |
|        house        | 15.49 | 17.67 |
|         sea         | 50.96 | 59.34 |
|        mirror       | 66.35 | 83.14 |
|         rug         | 62.31 | 67.39 |
|        field        | 25.91 | 42.21 |
|       armchair      | 39.42 | 77.02 |
|         seat        | 52.62 | 76.12 |
|        fence        | 30.48 | 44.06 |
|         desk        | 39.69 | 63.05 |
|         rock        | 45.29 | 69.01 |
|       wardrobe      | 35.09 | 44.37 |
|         lamp        |  59.3 | 71.33 |
|       bathtub       | 78.04 | 84.47 |
|       railing       | 37.86 | 53.32 |
|       cushion       | 58.94 | 68.96 |
|         base        | 21.16 | 40.29 |
|         box         | 23.34 | 27.04 |
|        column       | 43.94 | 60.98 |
|      signboard      | 33.16 | 52.98 |
|   chest of drawers  | 33.85 | 67.77 |
|       counter       | 30.07 | 41.85 |
|         sand        | 57.18 | 83.71 |
|         sink        | 70.04 | 77.99 |
|      skyscraper     | 41.72 | 50.03 |
|      fireplace      | 72.16 | 85.32 |
|     refrigerator    | 70.87 | 79.04 |
|      grandstand     |  7.04 |  8.16 |
|         path        | 14.55 | 19.14 |
|        stairs       | 39.77 | 51.14 |
|        runway       | 72.61 |  90.3 |
|         case        | 14.74 | 19.45 |
|      pool table     | 91.19 | 96.37 |
|        pillow       | 51.65 | 57.72 |
|     screen door     |  72.1 | 78.86 |
|       stairway      | 47.34 | 68.79 |
|        river        | 14.79 | 33.04 |
|        bridge       | 67.81 | 81.83 |
|       bookcase      | 27.76 | 40.46 |
|        blind        |  5.59 |  6.33 |
|     coffee table    | 60.63 | 82.61 |
|        toilet       | 82.16 | 92.56 |
|        flower       | 33.99 | 54.98 |
|         book        | 44.18 | 67.92 |
|         hill        |  6.24 |  7.93 |
|        bench        | 49.04 | 54.55 |
|      countertop     | 58.34 | 70.65 |
|        stove        | 71.67 | 84.96 |
|         palm        | 46.42 | 75.07 |
|    kitchen island   | 43.63 | 77.05 |
|       computer      | 65.16 | 75.52 |
|     swivel chair    | 38.01 | 70.25 |
|         boat        | 45.56 | 75.22 |
|         bar         | 39.05 | 56.72 |
|    arcade machine   | 79.41 | 83.58 |
|        hovel        | 14.51 | 20.55 |
|         bus         | 87.88 | 92.86 |
|        towel        | 70.51 | 83.39 |
|        light        | 38.42 | 48.94 |
|        truck        | 31.81 | 36.83 |
|        tower        | 15.58 |  28.1 |
|      chandelier     | 60.37 | 76.97 |
|        awning       | 25.74 | 37.09 |
|     streetlight     | 20.62 | 31.28 |
|        booth        | 26.86 | 27.39 |
| television receiver | 70.94 | 81.09 |
|       airplane      | 57.17 | 64.48 |
|      dirt track     | 15.36 | 25.38 |
|       apparel       | 38.78 | 85.79 |
|         pole        | 14.97 | 18.58 |
|         land        |  0.0  |  0.0  |
|      bannister      |  7.45 | 10.86 |
|      escalator      |  51.2 | 65.68 |
|       ottoman       |  48.0 | 70.92 |
|        bottle       | 14.53 | 16.76 |
|        buffet       | 41.15 | 64.94 |
|        poster       | 25.23 | 35.18 |
|        stage        |  8.65 | 15.65 |
|         van         |  0.0  |  0.0  |
|         ship        |  0.0  |  0.0  |
|       fountain      | 32.24 | 33.52 |
|    conveyer belt    | 85.01 | 91.41 |
|        canopy       | 53.95 | 66.42 |
|        washer       | 75.59 | 78.07 |
|      plaything      | 33.14 | 48.63 |
|    swimming pool    | 39.59 | 39.59 |
|        stool        |  31.1 | 38.33 |
|        barrel       | 20.46 | 20.62 |
|        basket       | 32.62 | 47.58 |
|      waterfall      | 49.59 | 76.54 |
|         tent        |  0.0  |  0.0  |
|         bag         | 11.58 | 13.38 |
|       minibike      | 66.75 | 75.33 |
|        cradle       | 74.83 | 97.14 |
|         oven        | 10.85 | 11.85 |
|         ball        | 36.31 | 68.06 |
|         food        |  8.34 |  8.4  |
|         step        | 10.16 | 11.31 |
|         tank        | 32.65 | 33.68 |
|      trade name     | 23.73 | 28.04 |
|      microwave      | 74.19 |  92.3 |
|         pot         | 49.22 | 57.31 |
|        animal       | 58.91 |  61.5 |
|       bicycle       | 57.48 | 77.05 |
|         lake        |  0.0  |  0.0  |
|      dishwasher     | 56.48 | 78.87 |
|        screen       | 65.05 | 84.63 |
|       blanket       |  9.18 |  9.85 |
|      sculpture      | 30.55 | 31.06 |
|         hood        | 59.74 | 65.19 |
|        sconce       | 30.91 | 41.56 |
|         vase        | 31.37 | 51.62 |
|    traffic light    | 28.59 | 36.51 |
|         tray        |  9.86 | 21.99 |
|        ashcan       | 43.18 | 55.68 |
|         fan         | 50.16 | 59.79 |
|         pier        | 28.65 | 29.15 |
|      crt screen     |  5.8  |  6.94 |
|        plate        | 50.77 | 73.55 |
|       monitor       |  3.37 |  3.55 |
|    bulletin board   | 31.81 | 38.95 |
|        shower       |  0.0  |  0.0  |
|       radiator      | 63.34 | 72.21 |
|        glass        | 15.23 | 16.27 |
|        clock        | 30.35 |  31.7 |
|         flag        | 66.79 | 70.58 |
+---------------------+-------+-------+
2023-11-09 23:02:32,874 - mmseg - INFO - Summary:
2023-11-09 23:02:32,875 - mmseg - INFO - 
+-------+-------+-------+
|  aAcc |  mIoU |  mAcc |
+-------+-------+-------+
| 81.22 | 43.97 | 55.61 |
+-------+-------+-------+
2023-11-09 23:02:32,875 - mmseg - INFO - Exp name: segmenter_linear_intern_vit_6b_504_5k_ade20k_bs16_lr4e-5_1of16.py
2023-11-09 23:02:32,876 - mmseg - INFO - Iter(val) [250]	aAcc: 0.8122, mIoU: 0.4397, mAcc: 0.5561, IoU.wall: 0.7601, IoU.building: 0.8105, IoU.sky: 0.9276, IoU.floor: 0.8033, IoU.tree: 0.7311, IoU.ceiling: 0.8257, IoU.road: 0.8136, IoU.bed : 0.8830, IoU.windowpane: 0.5916, IoU.grass: 0.6206, IoU.cabinet: 0.6084, IoU.sidewalk: 0.6236, IoU.person: 0.7888, IoU.earth: 0.3418, IoU.door: 0.5187, IoU.table: 0.6002, IoU.mountain: 0.4971, IoU.plant: 0.4972, IoU.curtain: 0.7148, IoU.chair: 0.5073, IoU.car: 0.7818, IoU.water: 0.5483, IoU.painting: 0.7140, IoU.sofa: 0.6298, IoU.shelf: 0.3038, IoU.house: 0.1549, IoU.sea: 0.5096, IoU.mirror: 0.6635, IoU.rug: 0.6231, IoU.field: 0.2591, IoU.armchair: 0.3942, IoU.seat: 0.5262, IoU.fence: 0.3048, IoU.desk: 0.3969, IoU.rock: 0.4529, IoU.wardrobe: 0.3509, IoU.lamp: 0.5930, IoU.bathtub: 0.7804, IoU.railing: 0.3786, IoU.cushion: 0.5894, IoU.base: 0.2116, IoU.box: 0.2334, IoU.column: 0.4394, IoU.signboard: 0.3316, IoU.chest of drawers: 0.3385, IoU.counter: 0.3007, IoU.sand: 0.5718, IoU.sink: 0.7004, IoU.skyscraper: 0.4172, IoU.fireplace: 0.7216, IoU.refrigerator: 0.7087, IoU.grandstand: 0.0704, IoU.path: 0.1455, IoU.stairs: 0.3977, IoU.runway: 0.7261, IoU.case: 0.1474, IoU.pool table: 0.9119, IoU.pillow: 0.5165, IoU.screen door: 0.7210, IoU.stairway: 0.4734, IoU.river: 0.1479, IoU.bridge: 0.6781, IoU.bookcase: 0.2776, IoU.blind: 0.0559, IoU.coffee table: 0.6063, IoU.toilet: 0.8216, IoU.flower: 0.3399, IoU.book: 0.4418, IoU.hill: 0.0624, IoU.bench: 0.4904, IoU.countertop: 0.5834, IoU.stove: 0.7167, IoU.palm: 0.4642, IoU.kitchen island: 0.4363, IoU.computer: 0.6516, IoU.swivel chair: 0.3801, IoU.boat: 0.4556, IoU.bar: 0.3905, IoU.arcade machine: 0.7941, IoU.hovel: 0.1451, IoU.bus: 0.8788, IoU.towel: 0.7051, IoU.light: 0.3842, IoU.truck: 0.3181, IoU.tower: 0.1558, IoU.chandelier: 0.6037, IoU.awning: 0.2574, IoU.streetlight: 0.2062, IoU.booth: 0.2686, IoU.television receiver: 0.7094, IoU.airplane: 0.5717, IoU.dirt track: 0.1536, IoU.apparel: 0.3878, IoU.pole: 0.1497, IoU.land: 0.0000, IoU.bannister: 0.0745, IoU.escalator: 0.5120, IoU.ottoman: 0.4800, IoU.bottle: 0.1453, IoU.buffet: 0.4115, IoU.poster: 0.2523, IoU.stage: 0.0865, IoU.van: 0.0000, IoU.ship: 0.0000, IoU.fountain: 0.3224, IoU.conveyer belt: 0.8501, IoU.canopy: 0.5395, IoU.washer: 0.7559, IoU.plaything: 0.3314, IoU.swimming pool: 0.3959, IoU.stool: 0.3110, IoU.barrel: 0.2046, IoU.basket: 0.3262, IoU.waterfall: 0.4959, IoU.tent: 0.0000, IoU.bag: 0.1158, IoU.minibike: 0.6675, IoU.cradle: 0.7483, IoU.oven: 0.1085, IoU.ball: 0.3631, IoU.food: 0.0834, IoU.step: 0.1016, IoU.tank: 0.3265, IoU.trade name: 0.2373, IoU.microwave: 0.7419, IoU.pot: 0.4922, IoU.animal: 0.5891, IoU.bicycle: 0.5748, IoU.lake: 0.0000, IoU.dishwasher: 0.5648, IoU.screen: 0.6505, IoU.blanket: 0.0918, IoU.sculpture: 0.3055, IoU.hood: 0.5974, IoU.sconce: 0.3091, IoU.vase: 0.3137, IoU.traffic light: 0.2859, IoU.tray: 0.0986, IoU.ashcan: 0.4318, IoU.fan: 0.5016, IoU.pier: 0.2865, IoU.crt screen: 0.0580, IoU.plate: 0.5077, IoU.monitor: 0.0337, IoU.bulletin board: 0.3181, IoU.shower: 0.0000, IoU.radiator: 0.6334, IoU.glass: 0.1523, IoU.clock: 0.3035, IoU.flag: 0.6679, Acc.wall: 0.8662, Acc.building: 0.9262, Acc.sky: 0.9594, Acc.floor: 0.9117, Acc.tree: 0.8805, Acc.ceiling: 0.9119, Acc.road: 0.8579, Acc.bed : 0.9637, Acc.windowpane: 0.7604, Acc.grass: 0.8458, Acc.cabinet: 0.7357, Acc.sidewalk: 0.8325, Acc.person: 0.9077, Acc.earth: 0.4885, Acc.door: 0.7098, Acc.table: 0.7856, Acc.mountain: 0.5994, Acc.plant: 0.6142, Acc.curtain: 0.8330, Acc.chair: 0.6297, Acc.car: 0.9287, Acc.water: 0.7558, Acc.painting: 0.8472, Acc.sofa: 0.7165, Acc.shelf: 0.5055, Acc.house: 0.1767, Acc.sea: 0.5934, Acc.mirror: 0.8314, Acc.rug: 0.6739, Acc.field: 0.4221, Acc.armchair: 0.7702, Acc.seat: 0.7612, Acc.fence: 0.4406, Acc.desk: 0.6305, Acc.rock: 0.6901, Acc.wardrobe: 0.4437, Acc.lamp: 0.7133, Acc.bathtub: 0.8447, Acc.railing: 0.5332, Acc.cushion: 0.6896, Acc.base: 0.4029, Acc.box: 0.2704, Acc.column: 0.6098, Acc.signboard: 0.5298, Acc.chest of drawers: 0.6777, Acc.counter: 0.4185, Acc.sand: 0.8371, Acc.sink: 0.7799, Acc.skyscraper: 0.5003, Acc.fireplace: 0.8532, Acc.refrigerator: 0.7904, Acc.grandstand: 0.0816, Acc.path: 0.1914, Acc.stairs: 0.5114, Acc.runway: 0.9030, Acc.case: 0.1945, Acc.pool table: 0.9637, Acc.pillow: 0.5772, Acc.screen door: 0.7886, Acc.stairway: 0.6879, Acc.river: 0.3304, Acc.bridge: 0.8183, Acc.bookcase: 0.4046, Acc.blind: 0.0633, Acc.coffee table: 0.8261, Acc.toilet: 0.9256, Acc.flower: 0.5498, Acc.book: 0.6792, Acc.hill: 0.0793, Acc.bench: 0.5455, Acc.countertop: 0.7065, Acc.stove: 0.8496, Acc.palm: 0.7507, Acc.kitchen island: 0.7705, Acc.computer: 0.7552, Acc.swivel chair: 0.7025, Acc.boat: 0.7522, Acc.bar: 0.5672, Acc.arcade machine: 0.8358, Acc.hovel: 0.2055, Acc.bus: 0.9286, Acc.towel: 0.8339, Acc.light: 0.4894, Acc.truck: 0.3683, Acc.tower: 0.2810, Acc.chandelier: 0.7697, Acc.awning: 0.3709, Acc.streetlight: 0.3128, Acc.booth: 0.2739, Acc.television receiver: 0.8109, Acc.airplane: 0.6448, Acc.dirt track: 0.2538, Acc.apparel: 0.8579, Acc.pole: 0.1858, Acc.land: 0.0000, Acc.bannister: 0.1086, Acc.escalator: 0.6568, Acc.ottoman: 0.7092, Acc.bottle: 0.1676, Acc.buffet: 0.6494, Acc.poster: 0.3518, Acc.stage: 0.1565, Acc.van: 0.0000, Acc.ship: 0.0000, Acc.fountain: 0.3352, Acc.conveyer belt: 0.9141, Acc.canopy: 0.6642, Acc.washer: 0.7807, Acc.plaything: 0.4863, Acc.swimming pool: 0.3959, Acc.stool: 0.3833, Acc.barrel: 0.2062, Acc.basket: 0.4758, Acc.waterfall: 0.7654, Acc.tent: 0.0000, Acc.bag: 0.1338, Acc.minibike: 0.7533, Acc.cradle: 0.9714, Acc.oven: 0.1185, Acc.ball: 0.6806, Acc.food: 0.0840, Acc.step: 0.1131, Acc.tank: 0.3368, Acc.trade name: 0.2804, Acc.microwave: 0.9230, Acc.pot: 0.5731, Acc.animal: 0.6150, Acc.bicycle: 0.7705, Acc.lake: 0.0000, Acc.dishwasher: 0.7887, Acc.screen: 0.8463, Acc.blanket: 0.0985, Acc.sculpture: 0.3106, Acc.hood: 0.6519, Acc.sconce: 0.4156, Acc.vase: 0.5162, Acc.traffic light: 0.3651, Acc.tray: 0.2199, Acc.ashcan: 0.5568, Acc.fan: 0.5979, Acc.pier: 0.2915, Acc.crt screen: 0.0694, Acc.plate: 0.7355, Acc.monitor: 0.0355, Acc.bulletin board: 0.3895, Acc.shower: 0.0000, Acc.radiator: 0.7221, Acc.glass: 0.1627, Acc.clock: 0.3170, Acc.flag: 0.7058
2023-11-09 23:03:36,506 - mmseg - INFO - Iter [1050/5000]	lr: 2.560e-06, eta: 1:35:26, time: 4.316, data_time: 3.097, memory: 38534, decode.loss_ce: 0.3789, decode.acc_seg: 87.0747, loss: 0.3789
2023-11-09 23:04:37,693 - mmseg - INFO - Iter [1100/5000]	lr: 2.528e-06, eta: 1:33:33, time: 1.224, data_time: 0.008, memory: 38534, decode.loss_ce: 0.3808, decode.acc_seg: 86.7820, loss: 0.3808
2023-11-09 23:05:41,158 - mmseg - INFO - Iter [1150/5000]	lr: 2.495e-06, eta: 1:31:53, time: 1.269, data_time: 0.052, memory: 38534, decode.loss_ce: 0.3509, decode.acc_seg: 87.4624, loss: 0.3509
2023-11-09 23:06:44,731 - mmseg - INFO - Iter [1200/5000]	lr: 2.463e-06, eta: 1:30:16, time: 1.271, data_time: 0.050, memory: 38534, decode.loss_ce: 0.3365, decode.acc_seg: 88.0465, loss: 0.3365
2023-11-09 23:07:45,952 - mmseg - INFO - Iter [1250/5000]	lr: 2.430e-06, eta: 1:28:34, time: 1.224, data_time: 0.008, memory: 38534, decode.loss_ce: 0.3352, decode.acc_seg: 88.0570, loss: 0.3352
2023-11-09 23:08:49,561 - mmseg - INFO - Iter [1300/5000]	lr: 2.398e-06, eta: 1:27:03, time: 1.272, data_time: 0.054, memory: 38534, decode.loss_ce: 0.3128, decode.acc_seg: 88.7493, loss: 0.3128
2023-11-09 23:09:53,145 - mmseg - INFO - Iter [1350/5000]	lr: 2.366e-06, eta: 1:25:33, time: 1.272, data_time: 0.052, memory: 38534, decode.loss_ce: 0.3241, decode.acc_seg: 88.2304, loss: 0.3241
2023-11-09 23:10:54,427 - mmseg - INFO - Iter [1400/5000]	lr: 2.333e-06, eta: 1:24:00, time: 1.226, data_time: 0.008, memory: 38534, decode.loss_ce: 0.3072, decode.acc_seg: 88.7156, loss: 0.3072
2023-11-09 23:11:57,947 - mmseg - INFO - Iter [1450/5000]	lr: 2.301e-06, eta: 1:22:34, time: 1.270, data_time: 0.051, memory: 38534, decode.loss_ce: 0.2948, decode.acc_seg: 89.1791, loss: 0.2948
2023-11-09 23:12:59,205 - mmseg - INFO - Iter [1500/5000]	lr: 2.268e-06, eta: 1:21:04, time: 1.225, data_time: 0.008, memory: 38534, decode.loss_ce: 0.2989, decode.acc_seg: 89.1917, loss: 0.2989
2023-11-09 23:14:02,895 - mmseg - INFO - Iter [1550/5000]	lr: 2.236e-06, eta: 1:19:42, time: 1.274, data_time: 0.053, memory: 38534, decode.loss_ce: 0.2721, decode.acc_seg: 89.9743, loss: 0.2721
2023-11-09 23:15:06,503 - mmseg - INFO - Iter [1600/5000]	lr: 2.204e-06, eta: 1:18:20, time: 1.272, data_time: 0.052, memory: 38534, decode.loss_ce: 0.2817, decode.acc_seg: 89.6098, loss: 0.2817
2023-11-09 23:16:07,820 - mmseg - INFO - Iter [1650/5000]	lr: 2.171e-06, eta: 1:16:55, time: 1.226, data_time: 0.008, memory: 38534, decode.loss_ce: 0.2712, decode.acc_seg: 89.8701, loss: 0.2712
2023-11-09 23:17:11,349 - mmseg - INFO - Iter [1700/5000]	lr: 2.139e-06, eta: 1:15:36, time: 1.271, data_time: 0.051, memory: 38534, decode.loss_ce: 0.2808, decode.acc_seg: 89.5798, loss: 0.2808
2023-11-09 23:18:14,837 - mmseg - INFO - Iter [1750/5000]	lr: 2.107e-06, eta: 1:14:18, time: 1.270, data_time: 0.053, memory: 38534, decode.loss_ce: 0.2725, decode.acc_seg: 89.6531, loss: 0.2725
2023-11-09 23:19:16,050 - mmseg - INFO - Iter [1800/5000]	lr: 2.074e-06, eta: 1:12:56, time: 1.224, data_time: 0.008, memory: 38534, decode.loss_ce: 0.2814, decode.acc_seg: 89.8453, loss: 0.2814
2023-11-09 23:20:19,638 - mmseg - INFO - Iter [1850/5000]	lr: 2.042e-06, eta: 1:11:39, time: 1.272, data_time: 0.054, memory: 38534, decode.loss_ce: 0.2578, decode.acc_seg: 90.5057, loss: 0.2578
2023-11-09 23:21:23,335 - mmseg - INFO - Iter [1900/5000]	lr: 2.009e-06, eta: 1:10:24, time: 1.274, data_time: 0.054, memory: 38534, decode.loss_ce: 0.2555, decode.acc_seg: 90.4556, loss: 0.2555
2023-11-09 23:22:24,580 - mmseg - INFO - Iter [1950/5000]	lr: 1.977e-06, eta: 1:09:05, time: 1.225, data_time: 0.008, memory: 38534, decode.loss_ce: 0.2619, decode.acc_seg: 90.5555, loss: 0.2619
2023-11-09 23:23:28,063 - mmseg - INFO - Saving checkpoint at 2000 iterations
2023-11-09 23:24:18,842 - mmseg - INFO - Exp name: segmenter_linear_intern_vit_6b_504_5k_ade20k_bs16_lr4e-5_1of16.py
2023-11-09 23:24:18,843 - mmseg - INFO - Iter [2000/5000]	lr: 1.945e-06, eta: 1:09:06, time: 2.285, data_time: 0.052, memory: 38534, decode.loss_ce: 0.2430, decode.acc_seg: 90.6814, loss: 0.2430
2023-11-09 23:25:12,740 - mmseg - INFO - per class results:
2023-11-09 23:25:12,745 - mmseg - INFO - 
+---------------------+-------+-------+
|        Class        |  IoU  |  Acc  |
+---------------------+-------+-------+
|         wall        | 76.91 | 87.22 |
|       building      | 81.73 | 90.47 |
|         sky         | 93.29 | 96.97 |
|        floor        | 80.89 | 87.87 |
|         tree        |  73.5 |  88.3 |
|       ceiling       | 83.22 | 93.68 |
|         road        | 82.15 | 88.46 |
|         bed         | 89.39 | 95.78 |
|      windowpane     |  61.7 | 78.79 |
|        grass        | 63.71 | 79.76 |
|       cabinet       | 60.66 | 69.41 |
|       sidewalk      | 63.93 | 81.23 |
|        person       | 78.75 | 93.59 |
|        earth        | 34.28 |  47.5 |
|         door        | 53.04 | 64.49 |
|        table        | 60.92 | 78.99 |
|       mountain      |  52.5 | 62.26 |
|        plant        | 50.55 | 61.72 |
|       curtain       | 72.88 | 87.23 |
|        chair        | 53.18 | 67.42 |
|         car         | 80.48 | 93.52 |
|        water        | 51.04 | 67.49 |
|       painting      | 72.61 |  88.3 |
|         sofa        | 69.55 | 87.04 |
|        shelf        | 34.57 | 56.73 |
|        house        | 34.85 | 49.25 |
|         sea         | 53.04 | 68.83 |
|        mirror       |  70.6 | 78.69 |
|         rug         | 66.14 | 81.18 |
|        field        | 33.09 | 64.77 |
|       armchair      | 45.11 | 68.72 |
|         seat        | 48.49 | 69.59 |
|        fence        | 26.61 | 33.76 |
|         desk        | 41.35 | 65.86 |
|         rock        | 49.64 |  74.2 |
|       wardrobe      | 33.82 | 42.33 |
|         lamp        | 58.58 | 74.33 |
|       bathtub       | 78.87 | 84.36 |
|       railing       | 38.16 | 59.21 |
|       cushion       | 59.24 | 77.11 |
|         base        | 23.52 | 33.88 |
|         box         | 25.44 | 28.95 |
|        column       | 49.54 |  66.4 |
|      signboard      | 32.21 | 49.21 |
|   chest of drawers  | 37.97 | 67.29 |
|       counter       | 32.86 |  45.2 |
|         sand        | 50.26 | 87.11 |
|         sink        |  73.8 | 81.88 |
|      skyscraper     | 50.53 | 75.58 |
|      fireplace      |  70.7 | 88.74 |
|     refrigerator    | 71.47 | 93.14 |
|      grandstand     | 10.12 | 12.22 |
|         path        | 18.64 | 29.18 |
|        stairs       | 40.03 | 53.74 |
|        runway       | 75.27 | 88.48 |
|         case        | 37.13 | 49.74 |
|      pool table     | 90.82 | 96.29 |
|        pillow       | 52.49 | 58.78 |
|     screen door     | 77.36 | 82.19 |
|       stairway      | 45.23 | 75.85 |
|        river        |  16.8 | 43.56 |
|        bridge       | 69.48 | 83.59 |
|       bookcase      | 25.99 | 41.66 |
|        blind        | 28.79 | 36.05 |
|     coffee table    | 61.46 | 83.34 |
|        toilet       | 84.72 |  92.1 |
|        flower       | 35.62 | 58.55 |
|         book        | 42.66 | 71.85 |
|         hill        |  7.87 | 12.06 |
|        bench        | 46.73 | 64.51 |
|      countertop     | 56.22 | 70.75 |
|        stove        | 73.85 | 86.25 |
|         palm        | 49.29 | 76.44 |
|    kitchen island   | 45.38 | 91.17 |
|       computer      | 65.52 | 74.72 |
|     swivel chair    | 40.52 | 63.08 |
|         boat        | 63.08 | 79.96 |
|         bar         | 40.38 | 61.65 |
|    arcade machine   | 57.39 | 61.67 |
|        hovel        | 13.17 | 21.93 |
|         bus         | 90.53 |  94.2 |
|        towel        | 71.01 |  85.2 |
|        light        | 40.88 | 52.06 |
|        truck        |  33.1 | 39.58 |
|        tower        | 10.52 | 18.81 |
|      chandelier     | 59.95 |  83.7 |
|        awning       |  29.6 | 42.34 |
|     streetlight     | 23.31 | 34.41 |
|        booth        | 19.16 | 27.55 |
| television receiver |  72.5 | 86.18 |
|       airplane      | 58.81 | 64.45 |
|      dirt track     | 21.38 |  30.3 |
|       apparel       | 43.69 | 63.89 |
|         pole        | 17.41 | 22.24 |
|         land        |  0.0  |  0.0  |
|      bannister      |  6.83 |  8.65 |
|      escalator      | 62.08 | 81.25 |
|       ottoman       | 49.33 | 65.83 |
|        bottle       | 19.99 | 27.08 |
|        buffet       |  44.5 | 66.99 |
|        poster       | 25.17 |  33.9 |
|        stage        |  9.75 | 20.47 |
|         van         |  8.18 |  9.97 |
|         ship        |  0.0  |  0.0  |
|       fountain      | 21.28 | 21.62 |
|    conveyer belt    | 84.46 | 93.08 |
|        canopy       | 42.64 | 48.55 |
|        washer       | 82.39 | 85.81 |
|      plaything      | 33.46 | 66.06 |
|    swimming pool    |  58.5 | 58.58 |
|        stool        | 31.25 | 38.34 |
|        barrel       | 21.99 | 22.35 |
|        basket       | 33.57 | 40.62 |
|      waterfall      | 49.69 | 93.81 |
|         tent        |  0.0  |  0.0  |
|         bag         | 16.87 | 21.48 |
|       minibike      | 71.07 | 87.08 |
|        cradle       | 74.92 | 98.42 |
|         oven        |  47.3 | 61.18 |
|         ball        | 36.49 | 69.78 |
|         food        | 26.24 | 26.74 |
|         step        | 15.11 | 19.35 |
|         tank        | 31.22 |  32.1 |
|      trade name     | 30.55 |  44.5 |
|      microwave      | 78.74 | 89.15 |
|         pot         | 52.56 | 60.16 |
|        animal       | 62.12 | 65.34 |
|       bicycle       |  58.5 | 77.33 |
|         lake        |  0.0  |  0.0  |
|      dishwasher     | 66.74 | 79.38 |
|        screen       | 47.57 | 58.37 |
|       blanket       |  14.8 | 17.04 |
|      sculpture      | 44.81 | 52.18 |
|         hood        | 61.26 | 66.98 |
|        sconce       |  35.8 | 49.37 |
|         vase        | 34.85 | 56.47 |
|    traffic light    | 28.46 | 39.94 |
|         tray        | 10.58 | 30.08 |
|        ashcan       | 44.53 | 57.55 |
|         fan         |  52.0 | 59.69 |
|         pier        | 30.96 | 32.83 |
|      crt screen     | 14.08 | 30.94 |
|        plate        | 53.68 | 69.21 |
|       monitor       |  2.98 |  3.35 |
|    bulletin board   | 33.54 | 45.77 |
|        shower       |  0.0  |  0.0  |
|       radiator      | 64.69 | 70.02 |
|        glass        | 16.08 | 16.93 |
|        clock        | 37.95 | 44.63 |
|         flag        | 67.71 | 72.89 |
+---------------------+-------+-------+
2023-11-09 23:25:12,746 - mmseg - INFO - Summary:
2023-11-09 23:25:12,746 - mmseg - INFO - 
+------+------+-------+
| aAcc | mIoU |  mAcc |
+------+------+-------+
| 82.0 | 46.3 | 59.06 |
+------+------+-------+
2023-11-09 23:25:12,746 - mmseg - INFO - Exp name: segmenter_linear_intern_vit_6b_504_5k_ade20k_bs16_lr4e-5_1of16.py
2023-11-09 23:25:12,747 - mmseg - INFO - Iter(val) [250]	aAcc: 0.8200, mIoU: 0.4630, mAcc: 0.5906, IoU.wall: 0.7691, IoU.building: 0.8173, IoU.sky: 0.9329, IoU.floor: 0.8089, IoU.tree: 0.7350, IoU.ceiling: 0.8322, IoU.road: 0.8215, IoU.bed : 0.8939, IoU.windowpane: 0.6170, IoU.grass: 0.6371, IoU.cabinet: 0.6066, IoU.sidewalk: 0.6393, IoU.person: 0.7875, IoU.earth: 0.3428, IoU.door: 0.5304, IoU.table: 0.6092, IoU.mountain: 0.5250, IoU.plant: 0.5055, IoU.curtain: 0.7288, IoU.chair: 0.5318, IoU.car: 0.8048, IoU.water: 0.5104, IoU.painting: 0.7261, IoU.sofa: 0.6955, IoU.shelf: 0.3457, IoU.house: 0.3485, IoU.sea: 0.5304, IoU.mirror: 0.7060, IoU.rug: 0.6614, IoU.field: 0.3309, IoU.armchair: 0.4511, IoU.seat: 0.4849, IoU.fence: 0.2661, IoU.desk: 0.4135, IoU.rock: 0.4964, IoU.wardrobe: 0.3382, IoU.lamp: 0.5858, IoU.bathtub: 0.7887, IoU.railing: 0.3816, IoU.cushion: 0.5924, IoU.base: 0.2352, IoU.box: 0.2544, IoU.column: 0.4954, IoU.signboard: 0.3221, IoU.chest of drawers: 0.3797, IoU.counter: 0.3286, IoU.sand: 0.5026, IoU.sink: 0.7380, IoU.skyscraper: 0.5053, IoU.fireplace: 0.7070, IoU.refrigerator: 0.7147, IoU.grandstand: 0.1012, IoU.path: 0.1864, IoU.stairs: 0.4003, IoU.runway: 0.7527, IoU.case: 0.3713, IoU.pool table: 0.9082, IoU.pillow: 0.5249, IoU.screen door: 0.7736, IoU.stairway: 0.4523, IoU.river: 0.1680, IoU.bridge: 0.6948, IoU.bookcase: 0.2599, IoU.blind: 0.2879, IoU.coffee table: 0.6146, IoU.toilet: 0.8472, IoU.flower: 0.3562, IoU.book: 0.4266, IoU.hill: 0.0787, IoU.bench: 0.4673, IoU.countertop: 0.5622, IoU.stove: 0.7385, IoU.palm: 0.4929, IoU.kitchen island: 0.4538, IoU.computer: 0.6552, IoU.swivel chair: 0.4052, IoU.boat: 0.6308, IoU.bar: 0.4038, IoU.arcade machine: 0.5739, IoU.hovel: 0.1317, IoU.bus: 0.9053, IoU.towel: 0.7101, IoU.light: 0.4088, IoU.truck: 0.3310, IoU.tower: 0.1052, IoU.chandelier: 0.5995, IoU.awning: 0.2960, IoU.streetlight: 0.2331, IoU.booth: 0.1916, IoU.television receiver: 0.7250, IoU.airplane: 0.5881, IoU.dirt track: 0.2138, IoU.apparel: 
0.4369, IoU.pole: 0.1741, IoU.land: 0.0000, IoU.bannister: 0.0683, IoU.escalator: 0.6208, IoU.ottoman: 0.4933, IoU.bottle: 0.1999, IoU.buffet: 0.4450, IoU.poster: 0.2517, IoU.stage: 0.0975, IoU.van: 0.0818, IoU.ship: 0.0000, IoU.fountain: 0.2128, IoU.conveyer belt: 0.8446, IoU.canopy: 0.4264, IoU.washer: 0.8239, IoU.plaything: 0.3346, IoU.swimming pool: 0.5850, IoU.stool: 0.3125, IoU.barrel: 0.2199, IoU.basket: 0.3357, IoU.waterfall: 0.4969, IoU.tent: 0.0000, IoU.bag: 0.1687, IoU.minibike: 0.7107, IoU.cradle: 0.7492, IoU.oven: 0.4730, IoU.ball: 0.3649, IoU.food: 0.2624, IoU.step: 0.1511, IoU.tank: 0.3122, IoU.trade name: 0.3055, IoU.microwave: 0.7874, IoU.pot: 0.5256, IoU.animal: 0.6212, IoU.bicycle: 0.5850, IoU.lake: 0.0000, IoU.dishwasher: 0.6674, IoU.screen: 0.4757, IoU.blanket: 0.1480, IoU.sculpture: 0.4481, IoU.hood: 0.6126, IoU.sconce: 0.3580, IoU.vase: 0.3485, IoU.traffic light: 0.2846, IoU.tray: 0.1058, IoU.ashcan: 0.4453, IoU.fan: 0.5200, IoU.pier: 0.3096, IoU.crt screen: 0.1408, IoU.plate: 0.5368, IoU.monitor: 0.0298, IoU.bulletin board: 0.3354, IoU.shower: 0.0000, IoU.radiator: 0.6469, IoU.glass: 0.1608, IoU.clock: 0.3795, IoU.flag: 0.6771, Acc.wall: 0.8722, Acc.building: 0.9047, Acc.sky: 0.9697, Acc.floor: 0.8787, Acc.tree: 0.8830, Acc.ceiling: 0.9368, Acc.road: 0.8846, Acc.bed : 0.9578, Acc.windowpane: 0.7879, Acc.grass: 0.7976, Acc.cabinet: 0.6941, Acc.sidewalk: 0.8123, Acc.person: 0.9359, Acc.earth: 0.4750, Acc.door: 0.6449, Acc.table: 0.7899, Acc.mountain: 0.6226, Acc.plant: 0.6172, Acc.curtain: 0.8723, Acc.chair: 0.6742, Acc.car: 0.9352, Acc.water: 0.6749, Acc.painting: 0.8830, Acc.sofa: 0.8704, Acc.shelf: 0.5673, Acc.house: 0.4925, Acc.sea: 0.6883, Acc.mirror: 0.7869, Acc.rug: 0.8118, Acc.field: 0.6477, Acc.armchair: 0.6872, Acc.seat: 0.6959, Acc.fence: 0.3376, Acc.desk: 0.6586, Acc.rock: 0.7420, Acc.wardrobe: 0.4233, Acc.lamp: 0.7433, Acc.bathtub: 0.8436, Acc.railing: 0.5921, Acc.cushion: 0.7711, Acc.base: 0.3388, Acc.box: 0.2895, Acc.column: 
0.6640, Acc.signboard: 0.4921, Acc.chest of drawers: 0.6729, Acc.counter: 0.4520, Acc.sand: 0.8711, Acc.sink: 0.8188, Acc.skyscraper: 0.7558, Acc.fireplace: 0.8874, Acc.refrigerator: 0.9314, Acc.grandstand: 0.1222, Acc.path: 0.2918, Acc.stairs: 0.5374, Acc.runway: 0.8848, Acc.case: 0.4974, Acc.pool table: 0.9629, Acc.pillow: 0.5878, Acc.screen door: 0.8219, Acc.stairway: 0.7585, Acc.river: 0.4356, Acc.bridge: 0.8359, Acc.bookcase: 0.4166, Acc.blind: 0.3605, Acc.coffee table: 0.8334, Acc.toilet: 0.9210, Acc.flower: 0.5855, Acc.book: 0.7185, Acc.hill: 0.1206, Acc.bench: 0.6451, Acc.countertop: 0.7075, Acc.stove: 0.8625, Acc.palm: 0.7644, Acc.kitchen island: 0.9117, Acc.computer: 0.7472, Acc.swivel chair: 0.6308, Acc.boat: 0.7996, Acc.bar: 0.6165, Acc.arcade machine: 0.6167, Acc.hovel: 0.2193, Acc.bus: 0.9420, Acc.towel: 0.8520, Acc.light: 0.5206, Acc.truck: 0.3958, Acc.tower: 0.1881, Acc.chandelier: 0.8370, Acc.awning: 0.4234, Acc.streetlight: 0.3441, Acc.booth: 0.2755, Acc.television receiver: 0.8618, Acc.airplane: 0.6445, Acc.dirt track: 0.3030, Acc.apparel: 0.6389, Acc.pole: 0.2224, Acc.land: 0.0000, Acc.bannister: 0.0865, Acc.escalator: 0.8125, Acc.ottoman: 0.6583, Acc.bottle: 0.2708, Acc.buffet: 0.6699, Acc.poster: 0.3390, Acc.stage: 0.2047, Acc.van: 0.0997, Acc.ship: 0.0000, Acc.fountain: 0.2162, Acc.conveyer belt: 0.9308, Acc.canopy: 0.4855, Acc.washer: 0.8581, Acc.plaything: 0.6606, Acc.swimming pool: 0.5858, Acc.stool: 0.3834, Acc.barrel: 0.2235, Acc.basket: 0.4062, Acc.waterfall: 0.9381, Acc.tent: 0.0000, Acc.bag: 0.2148, Acc.minibike: 0.8708, Acc.cradle: 0.9842, Acc.oven: 0.6118, Acc.ball: 0.6978, Acc.food: 0.2674, Acc.step: 0.1935, Acc.tank: 0.3210, Acc.trade name: 0.4450, Acc.microwave: 0.8915, Acc.pot: 0.6016, Acc.animal: 0.6534, Acc.bicycle: 0.7733, Acc.lake: 0.0000, Acc.dishwasher: 0.7938, Acc.screen: 0.5837, Acc.blanket: 0.1704, Acc.sculpture: 0.5218, Acc.hood: 0.6698, Acc.sconce: 0.4937, Acc.vase: 0.5647, Acc.traffic light: 0.3994, Acc.tray: 0.3008, 
Acc.ashcan: 0.5755, Acc.fan: 0.5969, Acc.pier: 0.3283, Acc.crt screen: 0.3094, Acc.plate: 0.6921, Acc.monitor: 0.0335, Acc.bulletin board: 0.4577, Acc.shower: 0.0000, Acc.radiator: 0.7002, Acc.glass: 0.1693, Acc.clock: 0.4463, Acc.flag: 0.7289
2023-11-09 23:26:14,065 - mmseg - INFO - Iter [2050/5000]	lr: 1.912e-06, eta: 1:09:04, time: 2.304, data_time: 1.086, memory: 38534, decode.loss_ce: 0.2618, decode.acc_seg: 90.1254, loss: 0.2618
2023-11-09 23:27:17,510 - mmseg - INFO - Iter [2100/5000]	lr: 1.880e-06, eta: 1:07:44, time: 1.269, data_time: 0.051, memory: 38534, decode.loss_ce: 0.2601, decode.acc_seg: 90.3283, loss: 0.2601
2023-11-09 23:28:20,961 - mmseg - INFO - Iter [2150/5000]	lr: 1.847e-06, eta: 1:06:25, time: 1.269, data_time: 0.051, memory: 38534, decode.loss_ce: 0.2471, decode.acc_seg: 90.5986, loss: 0.2471
2023-11-09 23:29:22,193 - mmseg - INFO - Iter [2200/5000]	lr: 1.815e-06, eta: 1:05:04, time: 1.225, data_time: 0.007, memory: 38534, decode.loss_ce: 0.2362, decode.acc_seg: 91.1553, loss: 0.2362
2023-11-09 23:30:25,808 - mmseg - INFO - Iter [2250/5000]	lr: 1.783e-06, eta: 1:03:47, time: 1.272, data_time: 0.051, memory: 38534, decode.loss_ce: 0.2422, decode.acc_seg: 90.7167, loss: 0.2422
2023-11-09 23:31:29,457 - mmseg - INFO - Iter [2300/5000]	lr: 1.750e-06, eta: 1:02:30, time: 1.273, data_time: 0.052, memory: 38534, decode.loss_ce: 0.2298, decode.acc_seg: 91.4812, loss: 0.2298
2023-11-09 23:32:30,666 - mmseg - INFO - Iter [2350/5000]	lr: 1.718e-06, eta: 1:01:12, time: 1.224, data_time: 0.008, memory: 38534, decode.loss_ce: 0.2345, decode.acc_seg: 91.3302, loss: 0.2345
2023-11-09 23:33:34,164 - mmseg - INFO - Iter [2400/5000]	lr: 1.685e-06, eta: 0:59:56, time: 1.270, data_time: 0.051, memory: 38534, decode.loss_ce: 0.2363, decode.acc_seg: 90.7356, loss: 0.2363
2023-11-09 23:34:37,644 - mmseg - INFO - Iter [2450/5000]	lr: 1.653e-06, eta: 0:58:41, time: 1.270, data_time: 0.053, memory: 38534, decode.loss_ce: 0.2268, decode.acc_seg: 91.3783, loss: 0.2268
2023-11-09 23:35:38,848 - mmseg - INFO - Iter [2500/5000]	lr: 1.621e-06, eta: 0:57:24, time: 1.224, data_time: 0.007, memory: 38534, decode.loss_ce: 0.2271, decode.acc_seg: 91.2350, loss: 0.2271
2023-11-09 23:36:42,354 - mmseg - INFO - Iter [2550/5000]	lr: 1.588e-06, eta: 0:56:10, time: 1.270, data_time: 0.053, memory: 38534, decode.loss_ce: 0.2304, decode.acc_seg: 91.0821, loss: 0.2304
2023-11-09 23:37:43,577 - mmseg - INFO - Iter [2600/5000]	lr: 1.556e-06, eta: 0:54:54, time: 1.224, data_time: 0.008, memory: 38534, decode.loss_ce: 0.2159, decode.acc_seg: 91.6075, loss: 0.2159
2023-11-09 23:38:47,118 - mmseg - INFO - Iter [2650/5000]	lr: 1.523e-06, eta: 0:53:41, time: 1.271, data_time: 0.053, memory: 38534, decode.loss_ce: 0.2273, decode.acc_seg: 91.4158, loss: 0.2273
2023-11-09 23:39:50,639 - mmseg - INFO - Iter [2700/5000]	lr: 1.491e-06, eta: 0:52:28, time: 1.270, data_time: 0.052, memory: 38534, decode.loss_ce: 0.2124, decode.acc_seg: 91.7559, loss: 0.2124
2023-11-09 23:40:51,911 - mmseg - INFO - Iter [2750/5000]	lr: 1.459e-06, eta: 0:51:14, time: 1.225, data_time: 0.008, memory: 38534, decode.loss_ce: 0.2102, decode.acc_seg: 91.7584, loss: 0.2102
2023-11-09 23:41:55,434 - mmseg - INFO - Iter [2800/5000]	lr: 1.426e-06, eta: 0:50:02, time: 1.270, data_time: 0.054, memory: 38534, decode.loss_ce: 0.1992, decode.acc_seg: 92.1451, loss: 0.1992
2023-11-09 23:42:59,091 - mmseg - INFO - Iter [2850/5000]	lr: 1.394e-06, eta: 0:48:50, time: 1.273, data_time: 0.056, memory: 38534, decode.loss_ce: 0.2105, decode.acc_seg: 91.7742, loss: 0.2105
2023-11-09 23:44:00,312 - mmseg - INFO - Iter [2900/5000]	lr: 1.361e-06, eta: 0:47:37, time: 1.224, data_time: 0.008, memory: 38534, decode.loss_ce: 0.1973, decode.acc_seg: 92.2418, loss: 0.1973
2023-11-09 23:45:03,839 - mmseg - INFO - Iter [2950/5000]	lr: 1.329e-06, eta: 0:46:26, time: 1.271, data_time: 0.053, memory: 38534, decode.loss_ce: 0.2211, decode.acc_seg: 91.5542, loss: 0.2211
2023-11-09 23:46:05,055 - mmseg - INFO - Saving checkpoint at 3000 iterations
2023-11-09 23:47:00,366 - mmseg - INFO - Exp name: segmenter_linear_intern_vit_6b_504_5k_ade20k_bs16_lr4e-5_1of16.py
2023-11-09 23:47:00,366 - mmseg - INFO - Iter [3000/5000]	lr: 1.297e-06, eta: 0:45:50, time: 2.331, data_time: 0.008, memory: 38534, decode.loss_ce: 0.2134, decode.acc_seg: 91.7209, loss: 0.2134
2023-11-09 23:47:54,446 - mmseg - INFO - per class results:
2023-11-09 23:47:54,451 - mmseg - INFO - 
+---------------------+-------+-------+
|        Class        |  IoU  |  Acc  |
+---------------------+-------+-------+
|         wall        | 77.77 |  88.6 |
|       building      | 81.68 | 91.84 |
|         sky         | 93.44 | 96.68 |
|        floor        | 81.56 |  90.0 |
|         tree        | 73.73 | 89.27 |
|       ceiling       | 83.61 | 91.82 |
|         road        | 82.56 | 90.12 |
|         bed         | 89.86 | 95.55 |
|      windowpane     | 61.82 | 78.83 |
|        grass        | 62.86 | 79.61 |
|       cabinet       | 61.24 | 74.94 |
|       sidewalk      | 63.65 | 78.74 |
|        person       | 79.93 | 93.71 |
|        earth        | 35.31 | 50.13 |
|         door        | 53.57 | 67.87 |
|        table        | 62.21 | 76.55 |
|       mountain      |  53.2 | 62.72 |
|        plant        | 49.51 | 59.38 |
|       curtain       | 73.07 | 87.89 |
|        chair        | 52.97 | 64.93 |
|         car         | 80.31 | 93.88 |
|        water        | 48.36 | 64.55 |
|       painting      |  75.1 | 86.32 |
|         sofa        |  68.1 | 89.98 |
|        shelf        | 31.59 | 48.13 |
|        house        | 27.26 | 36.89 |
|         sea         | 51.83 | 69.43 |
|        mirror       | 69.25 | 74.29 |
|         rug         | 65.48 |  74.5 |
|        field        | 33.01 | 62.94 |
|       armchair      | 44.93 | 66.16 |
|         seat        | 47.34 | 66.61 |
|        fence        | 30.07 | 38.84 |
|         desk        | 43.81 |  65.4 |
|         rock        | 53.83 | 75.83 |
|       wardrobe      | 34.81 | 45.52 |
|         lamp        | 61.68 | 76.59 |
|       bathtub       | 79.23 | 85.27 |
|       railing       | 37.33 | 51.95 |
|       cushion       | 58.63 | 70.32 |
|         base        | 25.08 | 39.38 |
|         box         | 28.92 | 42.31 |
|        column       | 48.96 | 63.12 |
|      signboard      | 30.96 | 41.68 |
|   chest of drawers  | 39.07 | 63.91 |
|       counter       |  29.4 | 37.98 |
|         sand        | 56.94 | 86.23 |
|         sink        | 73.81 | 81.87 |
|      skyscraper     | 46.49 | 61.11 |
|      fireplace      | 70.41 |  84.3 |
|     refrigerator    | 75.25 |  85.6 |
|      grandstand     |  8.34 | 10.72 |
|         path        | 16.01 | 24.66 |
|        stairs       |  35.4 | 47.85 |
|        runway       | 76.96 | 89.27 |
|         case        | 35.11 | 48.56 |
|      pool table     | 91.48 | 96.69 |
|        pillow       | 59.91 | 71.45 |
|     screen door     |  61.6 | 62.84 |
|       stairway      |  49.8 | 72.54 |
|        river        | 17.83 | 52.76 |
|        bridge       | 65.21 | 84.71 |
|       bookcase      | 30.79 | 53.15 |
|        blind        | 15.71 | 17.56 |
|     coffee table    | 58.71 | 86.19 |
|        toilet       |  86.0 |  90.8 |
|        flower       | 35.29 | 54.21 |
|         book        | 40.81 | 70.62 |
|         hill        |  7.46 |  8.03 |
|        bench        | 50.47 | 59.82 |
|      countertop     | 56.67 | 71.37 |
|        stove        | 71.44 |  86.4 |
|         palm        | 49.96 | 73.35 |
|    kitchen island   |  40.5 | 72.34 |
|       computer      | 65.93 | 76.36 |
|     swivel chair    | 38.84 | 62.17 |
|         boat        | 68.02 | 84.88 |
|         bar         | 32.57 | 49.53 |
|    arcade machine   | 58.67 | 64.29 |
|        hovel        | 16.84 | 21.97 |
|         bus         | 90.01 | 94.45 |
|        towel        |  71.8 | 80.66 |
|        light        | 43.06 | 54.06 |
|        truck        | 38.33 | 47.25 |
|        tower        |  9.81 | 17.37 |
|      chandelier     | 63.64 | 77.93 |
|        awning       | 26.71 | 34.26 |
|     streetlight     | 25.05 | 36.24 |
|        booth        | 15.39 | 16.85 |
| television receiver |  73.8 | 85.04 |
|       airplane      | 58.66 | 66.79 |
|      dirt track     | 12.27 | 33.07 |
|       apparel       | 41.22 | 64.89 |
|         pole        | 17.91 |  22.9 |
|         land        |  0.02 |  0.03 |
|      bannister      |  9.33 |  13.8 |
|      escalator      | 57.35 | 77.99 |
|       ottoman       | 47.16 | 61.93 |
|        bottle       | 23.23 |  31.1 |
|        buffet       | 38.94 |  53.9 |
|        poster       | 27.84 | 31.34 |
|        stage        |  9.4  | 19.53 |
|         van         |  8.67 | 10.14 |
|         ship        |  0.0  |  0.0  |
|       fountain      |  12.2 | 12.37 |
|    conveyer belt    | 77.34 | 95.08 |
|        canopy       | 41.83 | 49.29 |
|        washer       | 79.24 | 81.36 |
|      plaything      | 31.37 |  39.3 |
|    swimming pool    | 55.04 | 55.38 |
|        stool        | 37.72 | 48.18 |
|        barrel       | 26.38 |  27.1 |
|        basket       | 36.26 | 52.61 |
|      waterfall      | 46.41 | 72.67 |
|         tent        |  0.0  |  0.0  |
|         bag         | 18.86 |  20.9 |
|       minibike      | 70.91 | 84.71 |
|        cradle       | 77.38 | 96.25 |
|         oven        | 35.89 | 44.97 |
|         ball        | 37.57 | 68.99 |
|         food        | 19.96 | 20.37 |
|         step        | 11.48 | 17.34 |
|         tank        | 29.74 | 31.99 |
|      trade name     | 30.73 | 45.18 |
|      microwave      | 78.98 |  86.7 |
|         pot         | 50.02 | 56.43 |
|        animal       | 62.94 | 64.83 |
|       bicycle       | 59.55 | 79.24 |
|         lake        |  0.0  |  0.0  |
|      dishwasher     | 65.65 | 77.01 |
|        screen       | 40.61 | 43.76 |
|       blanket       | 14.87 | 17.11 |
|      sculpture      | 43.83 | 47.96 |
|         hood        | 53.91 | 63.84 |
|        sconce       | 34.28 | 45.16 |
|         vase        | 36.27 | 51.52 |
|    traffic light    | 28.85 | 42.82 |
|         tray        |  11.1 | 31.72 |
|        ashcan       | 46.83 | 57.83 |
|         fan         | 52.13 |  60.4 |
|         pier        | 29.45 | 31.33 |
|      crt screen     | 16.87 | 53.46 |
|        plate        | 55.55 | 74.13 |
|       monitor       |  3.22 |  3.99 |
|    bulletin board   | 46.87 | 72.94 |
|        shower       |  0.0  |  0.0  |
|       radiator      |  64.9 | 69.13 |
|        glass        | 19.11 | 20.56 |
|        clock        | 35.48 | 39.82 |
|         flag        | 66.22 | 70.11 |
+---------------------+-------+-------+
2023-11-09 23:47:54,452 - mmseg - INFO - Summary:
2023-11-09 23:47:54,452 - mmseg - INFO - 
+-------+-------+-------+
|  aAcc |  mIoU |  mAcc |
+-------+-------+-------+
| 82.18 | 45.91 | 57.75 |
+-------+-------+-------+
2023-11-09 23:47:54,452 - mmseg - INFO - Exp name: segmenter_linear_intern_vit_6b_504_5k_ade20k_bs16_lr4e-5_1of16.py
2023-11-09 23:47:54,453 - mmseg - INFO - Iter(val) [250]	aAcc: 0.8218, mIoU: 0.4591, mAcc: 0.5775, IoU.wall: 0.7777, IoU.building: 0.8168, IoU.sky: 0.9344, IoU.floor: 0.8156, IoU.tree: 0.7373, IoU.ceiling: 0.8361, IoU.road: 0.8256, IoU.bed : 0.8986, IoU.windowpane: 0.6182, IoU.grass: 0.6286, IoU.cabinet: 0.6124, IoU.sidewalk: 0.6365, IoU.person: 0.7993, IoU.earth: 0.3531, IoU.door: 0.5357, IoU.table: 0.6221, IoU.mountain: 0.5320, IoU.plant: 0.4951, IoU.curtain: 0.7307, IoU.chair: 0.5297, IoU.car: 0.8031, IoU.water: 0.4836, IoU.painting: 0.7510, IoU.sofa: 0.6810, IoU.shelf: 0.3159, IoU.house: 0.2726, IoU.sea: 0.5183, IoU.mirror: 0.6925, IoU.rug: 0.6548, IoU.field: 0.3301, IoU.armchair: 0.4493, IoU.seat: 0.4734, IoU.fence: 0.3007, IoU.desk: 0.4381, IoU.rock: 0.5383, IoU.wardrobe: 0.3481, IoU.lamp: 0.6168, IoU.bathtub: 0.7923, IoU.railing: 0.3733, IoU.cushion: 0.5863, IoU.base: 0.2508, IoU.box: 0.2892, IoU.column: 0.4896, IoU.signboard: 0.3096, IoU.chest of drawers: 0.3907, IoU.counter: 0.2940, IoU.sand: 0.5694, IoU.sink: 0.7381, IoU.skyscraper: 0.4649, IoU.fireplace: 0.7041, IoU.refrigerator: 0.7525, IoU.grandstand: 0.0834, IoU.path: 0.1601, IoU.stairs: 0.3540, IoU.runway: 0.7696, IoU.case: 0.3511, IoU.pool table: 0.9148, IoU.pillow: 0.5991, IoU.screen door: 0.6160, IoU.stairway: 0.4980, IoU.river: 0.1783, IoU.bridge: 0.6521, IoU.bookcase: 0.3079, IoU.blind: 0.1571, IoU.coffee table: 0.5871, IoU.toilet: 0.8600, IoU.flower: 0.3529, IoU.book: 0.4081, IoU.hill: 0.0746, IoU.bench: 0.5047, IoU.countertop: 0.5667, IoU.stove: 0.7144, IoU.palm: 0.4996, IoU.kitchen island: 0.4050, IoU.computer: 0.6593, IoU.swivel chair: 0.3884, IoU.boat: 0.6802, IoU.bar: 0.3257, IoU.arcade machine: 0.5867, IoU.hovel: 0.1684, IoU.bus: 0.9001, IoU.towel: 0.7180, IoU.light: 0.4306, IoU.truck: 0.3833, IoU.tower: 0.0981, IoU.chandelier: 0.6364, IoU.awning: 0.2671, IoU.streetlight: 0.2505, IoU.booth: 0.1539, IoU.television receiver: 0.7380, IoU.airplane: 0.5866, IoU.dirt track: 0.1227, IoU.apparel: 
0.4122, IoU.pole: 0.1791, IoU.land: 0.0002, IoU.bannister: 0.0933, IoU.escalator: 0.5735, IoU.ottoman: 0.4716, IoU.bottle: 0.2323, IoU.buffet: 0.3894, IoU.poster: 0.2784, IoU.stage: 0.0940, IoU.van: 0.0867, IoU.ship: 0.0000, IoU.fountain: 0.1220, IoU.conveyer belt: 0.7734, IoU.canopy: 0.4183, IoU.washer: 0.7924, IoU.plaything: 0.3137, IoU.swimming pool: 0.5504, IoU.stool: 0.3772, IoU.barrel: 0.2638, IoU.basket: 0.3626, IoU.waterfall: 0.4641, IoU.tent: 0.0000, IoU.bag: 0.1886, IoU.minibike: 0.7091, IoU.cradle: 0.7738, IoU.oven: 0.3589, IoU.ball: 0.3757, IoU.food: 0.1996, IoU.step: 0.1148, IoU.tank: 0.2974, IoU.trade name: 0.3073, IoU.microwave: 0.7898, IoU.pot: 0.5002, IoU.animal: 0.6294, IoU.bicycle: 0.5955, IoU.lake: 0.0000, IoU.dishwasher: 0.6565, IoU.screen: 0.4061, IoU.blanket: 0.1487, IoU.sculpture: 0.4383, IoU.hood: 0.5391, IoU.sconce: 0.3428, IoU.vase: 0.3627, IoU.traffic light: 0.2885, IoU.tray: 0.1110, IoU.ashcan: 0.4683, IoU.fan: 0.5213, IoU.pier: 0.2945, IoU.crt screen: 0.1687, IoU.plate: 0.5555, IoU.monitor: 0.0322, IoU.bulletin board: 0.4687, IoU.shower: 0.0000, IoU.radiator: 0.6490, IoU.glass: 0.1911, IoU.clock: 0.3548, IoU.flag: 0.6622, Acc.wall: 0.8860, Acc.building: 0.9184, Acc.sky: 0.9668, Acc.floor: 0.9000, Acc.tree: 0.8927, Acc.ceiling: 0.9182, Acc.road: 0.9012, Acc.bed : 0.9555, Acc.windowpane: 0.7883, Acc.grass: 0.7961, Acc.cabinet: 0.7494, Acc.sidewalk: 0.7874, Acc.person: 0.9371, Acc.earth: 0.5013, Acc.door: 0.6787, Acc.table: 0.7655, Acc.mountain: 0.6272, Acc.plant: 0.5938, Acc.curtain: 0.8789, Acc.chair: 0.6493, Acc.car: 0.9388, Acc.water: 0.6455, Acc.painting: 0.8632, Acc.sofa: 0.8998, Acc.shelf: 0.4813, Acc.house: 0.3689, Acc.sea: 0.6943, Acc.mirror: 0.7429, Acc.rug: 0.7450, Acc.field: 0.6294, Acc.armchair: 0.6616, Acc.seat: 0.6661, Acc.fence: 0.3884, Acc.desk: 0.6540, Acc.rock: 0.7583, Acc.wardrobe: 0.4552, Acc.lamp: 0.7659, Acc.bathtub: 0.8527, Acc.railing: 0.5195, Acc.cushion: 0.7032, Acc.base: 0.3938, Acc.box: 0.4231, Acc.column: 
0.6312, Acc.signboard: 0.4168, Acc.chest of drawers: 0.6391, Acc.counter: 0.3798, Acc.sand: 0.8623, Acc.sink: 0.8187, Acc.skyscraper: 0.6111, Acc.fireplace: 0.8430, Acc.refrigerator: 0.8560, Acc.grandstand: 0.1072, Acc.path: 0.2466, Acc.stairs: 0.4785, Acc.runway: 0.8927, Acc.case: 0.4856, Acc.pool table: 0.9669, Acc.pillow: 0.7145, Acc.screen door: 0.6284, Acc.stairway: 0.7254, Acc.river: 0.5276, Acc.bridge: 0.8471, Acc.bookcase: 0.5315, Acc.blind: 0.1756, Acc.coffee table: 0.8619, Acc.toilet: 0.9080, Acc.flower: 0.5421, Acc.book: 0.7062, Acc.hill: 0.0803, Acc.bench: 0.5982, Acc.countertop: 0.7137, Acc.stove: 0.8640, Acc.palm: 0.7335, Acc.kitchen island: 0.7234, Acc.computer: 0.7636, Acc.swivel chair: 0.6217, Acc.boat: 0.8488, Acc.bar: 0.4953, Acc.arcade machine: 0.6429, Acc.hovel: 0.2197, Acc.bus: 0.9445, Acc.towel: 0.8066, Acc.light: 0.5406, Acc.truck: 0.4725, Acc.tower: 0.1737, Acc.chandelier: 0.7793, Acc.awning: 0.3426, Acc.streetlight: 0.3624, Acc.booth: 0.1685, Acc.television receiver: 0.8504, Acc.airplane: 0.6679, Acc.dirt track: 0.3307, Acc.apparel: 0.6489, Acc.pole: 0.2290, Acc.land: 0.0003, Acc.bannister: 0.1380, Acc.escalator: 0.7799, Acc.ottoman: 0.6193, Acc.bottle: 0.3110, Acc.buffet: 0.5390, Acc.poster: 0.3134, Acc.stage: 0.1953, Acc.van: 0.1014, Acc.ship: 0.0000, Acc.fountain: 0.1237, Acc.conveyer belt: 0.9508, Acc.canopy: 0.4929, Acc.washer: 0.8136, Acc.plaything: 0.3930, Acc.swimming pool: 0.5538, Acc.stool: 0.4818, Acc.barrel: 0.2710, Acc.basket: 0.5261, Acc.waterfall: 0.7267, Acc.tent: 0.0000, Acc.bag: 0.2090, Acc.minibike: 0.8471, Acc.cradle: 0.9625, Acc.oven: 0.4497, Acc.ball: 0.6899, Acc.food: 0.2037, Acc.step: 0.1734, Acc.tank: 0.3199, Acc.trade name: 0.4518, Acc.microwave: 0.8670, Acc.pot: 0.5643, Acc.animal: 0.6483, Acc.bicycle: 0.7924, Acc.lake: 0.0000, Acc.dishwasher: 0.7701, Acc.screen: 0.4376, Acc.blanket: 0.1711, Acc.sculpture: 0.4796, Acc.hood: 0.6384, Acc.sconce: 0.4516, Acc.vase: 0.5152, Acc.traffic light: 0.4282, Acc.tray: 0.3172, 
Acc.ashcan: 0.5783, Acc.fan: 0.6040, Acc.pier: 0.3133, Acc.crt screen: 0.5346, Acc.plate: 0.7413, Acc.monitor: 0.0399, Acc.bulletin board: 0.7294, Acc.shower: 0.0000, Acc.radiator: 0.6913, Acc.glass: 0.2056, Acc.clock: 0.3982, Acc.flag: 0.7011
2023-11-09 23:48:57,960 - mmseg - INFO - Iter [3050/5000]	lr: 1.264e-06, eta: 0:45:13, time: 2.352, data_time: 1.135, memory: 38534, decode.loss_ce: 0.2142, decode.acc_seg: 91.8498, loss: 0.2142
2023-11-09 23:50:01,339 - mmseg - INFO - Iter [3100/5000]	lr: 1.232e-06, eta: 0:43:59, time: 1.268, data_time: 0.052, memory: 38534, decode.loss_ce: 0.1957, decode.acc_seg: 92.3266, loss: 0.1957
2023-11-09 23:51:02,494 - mmseg - INFO - Iter [3150/5000]	lr: 1.199e-06, eta: 0:42:45, time: 1.223, data_time: 0.007, memory: 38534, decode.loss_ce: 0.1951, decode.acc_seg: 92.3087, loss: 0.1951
2023-11-09 23:52:05,848 - mmseg - INFO - Iter [3200/5000]	lr: 1.167e-06, eta: 0:41:32, time: 1.267, data_time: 0.051, memory: 38534, decode.loss_ce: 0.1983, decode.acc_seg: 92.1063, loss: 0.1983
2023-11-09 23:53:09,286 - mmseg - INFO - Iter [3250/5000]	lr: 1.135e-06, eta: 0:40:20, time: 1.269, data_time: 0.052, memory: 38534, decode.loss_ce: 0.2071, decode.acc_seg: 92.0398, loss: 0.2071
2023-11-09 23:54:10,445 - mmseg - INFO - Iter [3300/5000]	lr: 1.102e-06, eta: 0:39:07, time: 1.223, data_time: 0.008, memory: 38534, decode.loss_ce: 0.2000, decode.acc_seg: 92.1843, loss: 0.2000
2023-11-09 23:55:13,892 - mmseg - INFO - Iter [3350/5000]	lr: 1.070e-06, eta: 0:37:55, time: 1.269, data_time: 0.053, memory: 38534, decode.loss_ce: 0.1936, decode.acc_seg: 92.3446, loss: 0.1936
2023-11-09 23:56:17,393 - mmseg - INFO - Iter [3400/5000]	lr: 1.037e-06, eta: 0:36:43, time: 1.270, data_time: 0.052, memory: 38534, decode.loss_ce: 0.2068, decode.acc_seg: 91.9792, loss: 0.2068
2023-11-09 23:57:18,556 - mmseg - INFO - Iter [3450/5000]	lr: 1.005e-06, eta: 0:35:31, time: 1.223, data_time: 0.008, memory: 38534, decode.loss_ce: 0.1995, decode.acc_seg: 92.1372, loss: 0.1995
2023-11-09 23:58:22,025 - mmseg - INFO - Iter [3500/5000]	lr: 9.726e-07, eta: 0:34:20, time: 1.269, data_time: 0.053, memory: 38534, decode.loss_ce: 0.1895, decode.acc_seg: 92.4622, loss: 0.1895
2023-11-09 23:59:23,222 - mmseg - INFO - Iter [3550/5000]	lr: 9.402e-07, eta: 0:33:08, time: 1.224, data_time: 0.008, memory: 38534, decode.loss_ce: 0.1790, decode.acc_seg: 93.0010, loss: 0.1790
2023-11-10 00:00:26,655 - mmseg - INFO - Iter [3600/5000]	lr: 9.078e-07, eta: 0:31:58, time: 1.269, data_time: 0.052, memory: 38534, decode.loss_ce: 0.1914, decode.acc_seg: 92.4182, loss: 0.1914
2023-11-10 00:01:30,191 - mmseg - INFO - Iter [3650/5000]	lr: 8.754e-07, eta: 0:30:47, time: 1.271, data_time: 0.052, memory: 38534, decode.loss_ce: 0.1913, decode.acc_seg: 92.4978, loss: 0.1913
2023-11-10 00:02:31,392 - mmseg - INFO - Iter [3700/5000]	lr: 8.430e-07, eta: 0:29:36, time: 1.224, data_time: 0.008, memory: 38534, decode.loss_ce: 0.1807, decode.acc_seg: 92.6843, loss: 0.1807
2023-11-10 00:03:34,834 - mmseg - INFO - Iter [3750/5000]	lr: 8.106e-07, eta: 0:28:26, time: 1.269, data_time: 0.052, memory: 38534, decode.loss_ce: 0.1805, decode.acc_seg: 92.7099, loss: 0.1805
2023-11-10 00:04:38,226 - mmseg - INFO - Iter [3800/5000]	lr: 7.782e-07, eta: 0:27:17, time: 1.268, data_time: 0.051, memory: 38534, decode.loss_ce: 0.1872, decode.acc_seg: 92.5187, loss: 0.1872
2023-11-10 00:05:39,417 - mmseg - INFO - Iter [3850/5000]	lr: 7.458e-07, eta: 0:26:06, time: 1.224, data_time: 0.008, memory: 38534, decode.loss_ce: 0.1864, decode.acc_seg: 92.6280, loss: 0.1864
2023-11-10 00:06:42,957 - mmseg - INFO - Iter [3900/5000]	lr: 7.134e-07, eta: 0:24:57, time: 1.271, data_time: 0.053, memory: 38534, decode.loss_ce: 0.1801, decode.acc_seg: 92.8809, loss: 0.1801
2023-11-10 00:07:44,133 - mmseg - INFO - Iter [3950/5000]	lr: 6.810e-07, eta: 0:23:47, time: 1.224, data_time: 0.008, memory: 38534, decode.loss_ce: 0.1828, decode.acc_seg: 92.8699, loss: 0.1828
2023-11-10 00:08:47,704 - mmseg - INFO - Saving checkpoint at 4000 iterations
2023-11-10 00:09:38,103 - mmseg - INFO - Exp name: segmenter_linear_intern_vit_6b_504_5k_ade20k_bs16_lr4e-5_1of16.py
2023-11-10 00:09:38,103 - mmseg - INFO - Iter [4000/5000]	lr: 6.486e-07, eta: 0:22:50, time: 2.279, data_time: 0.053, memory: 38534, decode.loss_ce: 0.1766, decode.acc_seg: 92.9688, loss: 0.1766
2023-11-10 00:10:33,057 - mmseg - INFO - per class results:
2023-11-10 00:10:33,062 - mmseg - INFO - 
+---------------------+-------+-------+
|        Class        |  IoU  |  Acc  |
+---------------------+-------+-------+
|         wall        | 77.45 | 88.71 |
|       building      | 81.27 | 90.93 |
|         sky         | 93.36 |  97.4 |
|        floor        | 81.75 | 91.13 |
|         tree        | 73.59 | 86.66 |
|       ceiling       |  83.5 | 93.01 |
|         road        | 83.13 | 89.93 |
|         bed         | 89.97 |  95.3 |
|      windowpane     | 62.45 | 78.57 |
|        grass        | 61.33 | 77.26 |
|       cabinet       | 61.96 | 74.23 |
|       sidewalk      | 63.49 | 81.13 |
|        person       | 80.01 | 93.67 |
|        earth        | 35.02 | 46.42 |
|         door        | 52.38 | 63.18 |
|        table        | 62.61 | 76.96 |
|       mountain      | 53.82 |  65.1 |
|        plant        | 48.25 | 57.76 |
|       curtain       |  72.5 |  88.6 |
|        chair        | 55.31 | 71.15 |
|         car         | 81.12 | 93.59 |
|        water        | 49.77 | 65.16 |
|       painting      | 75.11 | 86.38 |
|         sofa        | 68.56 | 90.27 |
|        shelf        | 33.24 | 52.01 |
|        house        | 24.96 | 31.92 |
|         sea         | 53.58 | 71.11 |
|        mirror       | 70.15 | 76.16 |
|         rug         | 61.82 | 67.37 |
|        field        | 31.16 | 66.51 |
|       armchair      | 45.73 | 59.33 |
|         seat        | 48.83 | 69.32 |
|        fence        | 30.64 |  43.0 |
|         desk        |  45.6 | 60.51 |
|         rock        | 52.36 | 76.55 |
|       wardrobe      | 36.31 | 51.05 |
|         lamp        | 62.26 | 76.05 |
|       bathtub       | 80.02 | 84.35 |
|       railing       | 37.46 | 52.75 |
|       cushion       | 59.03 |  76.3 |
|         base        | 25.44 | 40.13 |
|         box         | 27.13 | 32.07 |
|        column       | 50.19 | 63.78 |
|      signboard      |  33.4 | 48.37 |
|   chest of drawers  | 38.12 | 66.42 |
|       counter       | 33.62 | 45.94 |
|         sand        | 54.97 | 85.04 |
|         sink        | 74.52 | 80.87 |
|      skyscraper     | 46.03 | 70.08 |
|      fireplace      | 73.08 |  87.4 |
|     refrigerator    | 74.05 | 87.74 |
|      grandstand     |  8.33 |  8.98 |
|         path        | 12.85 | 20.32 |
|        stairs       | 45.08 | 65.11 |
|        runway       | 76.81 | 89.28 |
|         case        | 37.18 | 48.58 |
|      pool table     | 92.37 | 96.61 |
|        pillow       | 55.96 | 63.78 |
|     screen door     | 75.46 | 78.19 |
|       stairway      | 43.79 | 71.51 |
|        river        | 18.47 |  53.0 |
|        bridge       | 73.54 | 83.45 |
|       bookcase      | 28.68 | 49.96 |
|        blind        | 24.63 | 28.97 |
|     coffee table    | 59.41 | 85.04 |
|        toilet       | 86.52 | 91.36 |
|        flower       | 34.31 | 54.98 |
|         book        | 41.76 | 70.83 |
|         hill        |  7.77 |  9.88 |
|        bench        | 51.28 |  59.8 |
|      countertop     | 57.96 |  71.5 |
|        stove        | 71.89 | 85.02 |
|         palm        | 49.06 | 79.63 |
|    kitchen island   | 47.68 | 84.91 |
|       computer      | 67.68 | 76.44 |
|     swivel chair    | 33.55 | 46.77 |
|         boat        | 63.53 | 78.94 |
|         bar         | 34.52 | 49.75 |
|    arcade machine   | 41.42 |  43.5 |
|        hovel        | 18.22 | 22.55 |
|         bus         | 90.15 | 94.93 |
|        towel        | 73.49 | 83.15 |
|        light        | 37.88 | 44.41 |
|        truck        | 37.43 | 47.52 |
|        tower        | 10.26 | 17.93 |
|      chandelier     | 61.96 |  72.5 |
|        awning       | 28.49 | 38.33 |
|     streetlight     | 26.63 | 37.57 |
|        booth        | 20.15 | 23.04 |
| television receiver | 74.61 | 84.56 |
|       airplane      | 58.85 | 65.07 |
|      dirt track     | 12.42 | 30.47 |
|       apparel       | 46.15 | 64.82 |
|         pole        | 19.82 | 26.33 |
|         land        |  0.0  |  0.0  |
|      bannister      |  7.49 | 10.53 |
|      escalator      | 62.82 | 82.55 |
|       ottoman       | 46.66 | 65.06 |
|        bottle       |  22.1 | 30.55 |
|        buffet       | 41.21 | 55.65 |
|        poster       | 30.32 | 34.98 |
|        stage        |  8.96 | 17.82 |
|         van         | 11.13 | 13.58 |
|         ship        |  0.0  |  0.0  |
|       fountain      | 18.37 | 18.91 |
|    conveyer belt    | 79.25 | 93.23 |
|        canopy       |  39.5 | 50.69 |
|        washer       | 79.69 | 81.77 |
|      plaything      | 30.64 | 37.64 |
|    swimming pool    | 50.82 |  50.9 |
|        stool        | 35.16 | 42.88 |
|        barrel       | 28.56 | 29.86 |
|        basket       | 36.39 | 51.72 |
|      waterfall      | 45.92 | 76.42 |
|         tent        |  0.0  |  0.0  |
|         bag         | 25.25 | 29.72 |
|       minibike      |  71.0 | 85.94 |
|        cradle       | 77.78 | 95.09 |
|         oven        | 41.53 | 51.98 |
|         ball        | 37.82 |  69.8 |
|         food        | 20.34 | 20.88 |
|         step        |  13.5 | 17.46 |
|         tank        | 28.12 | 31.19 |
|      trade name     |  31.1 |  44.2 |
|      microwave      | 79.88 | 89.54 |
|         pot         | 52.42 | 61.94 |
|        animal       | 61.25 | 62.63 |
|       bicycle       | 59.61 |  79.0 |
|         lake        |  0.0  |  0.0  |
|      dishwasher     | 65.31 | 79.98 |
|        screen       | 48.16 | 56.99 |
|       blanket       | 17.35 | 20.43 |
|      sculpture      | 41.84 |  48.6 |
|         hood        | 54.88 | 64.83 |
|        sconce       |  33.6 | 41.13 |
|         vase        | 35.95 | 58.29 |
|    traffic light    | 28.92 | 48.16 |
|         tray        |  9.9  | 31.21 |
|        ashcan       |  47.1 |  61.3 |
|         fan         | 52.02 | 61.05 |
|         pier        | 32.89 | 35.35 |
|      crt screen     |  12.4 |  35.8 |
|        plate        | 53.31 | 77.19 |
|       monitor       |  1.83 |  2.24 |
|    bulletin board   | 45.56 | 68.04 |
|        shower       |  0.02 |  0.05 |
|       radiator      | 64.88 | 69.58 |
|        glass        | 19.83 | 22.03 |
|        clock        | 36.74 | 44.05 |
|         flag        | 66.59 | 73.71 |
+---------------------+-------+-------+
2023-11-10 00:10:33,063 - mmseg - INFO - Summary:
2023-11-10 00:10:33,064 - mmseg - INFO - 
+-------+-------+-------+
|  aAcc |  mIoU |  mAcc |
+-------+-------+-------+
| 82.22 | 46.35 | 58.32 |
+-------+-------+-------+
2023-11-10 00:10:33,064 - mmseg - INFO - Exp name: segmenter_linear_intern_vit_6b_504_5k_ade20k_bs16_lr4e-5_1of16.py
2023-11-10 00:10:33,065 - mmseg - INFO - Iter(val) [250]	aAcc: 0.8222, mIoU: 0.4635, mAcc: 0.5832, IoU.wall: 0.7745, IoU.building: 0.8127, IoU.sky: 0.9336, IoU.floor: 0.8175, IoU.tree: 0.7359, IoU.ceiling: 0.8350, IoU.road: 0.8313, IoU.bed : 0.8997, IoU.windowpane: 0.6245, IoU.grass: 0.6133, IoU.cabinet: 0.6196, IoU.sidewalk: 0.6349, IoU.person: 0.8001, IoU.earth: 0.3502, IoU.door: 0.5238, IoU.table: 0.6261, IoU.mountain: 0.5382, IoU.plant: 0.4825, IoU.curtain: 0.7250, IoU.chair: 0.5531, IoU.car: 0.8112, IoU.water: 0.4977, IoU.painting: 0.7511, IoU.sofa: 0.6856, IoU.shelf: 0.3324, IoU.house: 0.2496, IoU.sea: 0.5358, IoU.mirror: 0.7015, IoU.rug: 0.6182, IoU.field: 0.3116, IoU.armchair: 0.4573, IoU.seat: 0.4883, IoU.fence: 0.3064, IoU.desk: 0.4560, IoU.rock: 0.5236, IoU.wardrobe: 0.3631, IoU.lamp: 0.6226, IoU.bathtub: 0.8002, IoU.railing: 0.3746, IoU.cushion: 0.5903, IoU.base: 0.2544, IoU.box: 0.2713, IoU.column: 0.5019, IoU.signboard: 0.3340, IoU.chest of drawers: 0.3812, IoU.counter: 0.3362, IoU.sand: 0.5497, IoU.sink: 0.7452, IoU.skyscraper: 0.4603, IoU.fireplace: 0.7308, IoU.refrigerator: 0.7405, IoU.grandstand: 0.0833, IoU.path: 0.1285, IoU.stairs: 0.4508, IoU.runway: 0.7681, IoU.case: 0.3718, IoU.pool table: 0.9237, IoU.pillow: 0.5596, IoU.screen door: 0.7546, IoU.stairway: 0.4379, IoU.river: 0.1847, IoU.bridge: 0.7354, IoU.bookcase: 0.2868, IoU.blind: 0.2463, IoU.coffee table: 0.5941, IoU.toilet: 0.8652, IoU.flower: 0.3431, IoU.book: 0.4176, IoU.hill: 0.0777, IoU.bench: 0.5128, IoU.countertop: 0.5796, IoU.stove: 0.7189, IoU.palm: 0.4906, IoU.kitchen island: 0.4768, IoU.computer: 0.6768, IoU.swivel chair: 0.3355, IoU.boat: 0.6353, IoU.bar: 0.3452, IoU.arcade machine: 0.4142, IoU.hovel: 0.1822, IoU.bus: 0.9015, IoU.towel: 0.7349, IoU.light: 0.3788, IoU.truck: 0.3743, IoU.tower: 0.1026, IoU.chandelier: 0.6196, IoU.awning: 0.2849, IoU.streetlight: 0.2663, IoU.booth: 0.2015, IoU.television receiver: 0.7461, IoU.airplane: 0.5885, IoU.dirt track: 0.1242, IoU.apparel: 
0.4615, IoU.pole: 0.1982, IoU.land: 0.0000, IoU.bannister: 0.0749, IoU.escalator: 0.6282, IoU.ottoman: 0.4666, IoU.bottle: 0.2210, IoU.buffet: 0.4121, IoU.poster: 0.3032, IoU.stage: 0.0896, IoU.van: 0.1113, IoU.ship: 0.0000, IoU.fountain: 0.1837, IoU.conveyer belt: 0.7925, IoU.canopy: 0.3950, IoU.washer: 0.7969, IoU.plaything: 0.3064, IoU.swimming pool: 0.5082, IoU.stool: 0.3516, IoU.barrel: 0.2856, IoU.basket: 0.3639, IoU.waterfall: 0.4592, IoU.tent: 0.0000, IoU.bag: 0.2525, IoU.minibike: 0.7100, IoU.cradle: 0.7778, IoU.oven: 0.4153, IoU.ball: 0.3782, IoU.food: 0.2034, IoU.step: 0.1350, IoU.tank: 0.2812, IoU.trade name: 0.3110, IoU.microwave: 0.7988, IoU.pot: 0.5242, IoU.animal: 0.6125, IoU.bicycle: 0.5961, IoU.lake: 0.0000, IoU.dishwasher: 0.6531, IoU.screen: 0.4816, IoU.blanket: 0.1735, IoU.sculpture: 0.4184, IoU.hood: 0.5488, IoU.sconce: 0.3360, IoU.vase: 0.3595, IoU.traffic light: 0.2892, IoU.tray: 0.0990, IoU.ashcan: 0.4710, IoU.fan: 0.5202, IoU.pier: 0.3289, IoU.crt screen: 0.1240, IoU.plate: 0.5331, IoU.monitor: 0.0183, IoU.bulletin board: 0.4556, IoU.shower: 0.0002, IoU.radiator: 0.6488, IoU.glass: 0.1983, IoU.clock: 0.3674, IoU.flag: 0.6659, Acc.wall: 0.8871, Acc.building: 0.9093, Acc.sky: 0.9740, Acc.floor: 0.9113, Acc.tree: 0.8666, Acc.ceiling: 0.9301, Acc.road: 0.8993, Acc.bed : 0.9530, Acc.windowpane: 0.7857, Acc.grass: 0.7726, Acc.cabinet: 0.7423, Acc.sidewalk: 0.8113, Acc.person: 0.9367, Acc.earth: 0.4642, Acc.door: 0.6318, Acc.table: 0.7696, Acc.mountain: 0.6510, Acc.plant: 0.5776, Acc.curtain: 0.8860, Acc.chair: 0.7115, Acc.car: 0.9359, Acc.water: 0.6516, Acc.painting: 0.8638, Acc.sofa: 0.9027, Acc.shelf: 0.5201, Acc.house: 0.3192, Acc.sea: 0.7111, Acc.mirror: 0.7616, Acc.rug: 0.6737, Acc.field: 0.6651, Acc.armchair: 0.5933, Acc.seat: 0.6932, Acc.fence: 0.4300, Acc.desk: 0.6051, Acc.rock: 0.7655, Acc.wardrobe: 0.5105, Acc.lamp: 0.7605, Acc.bathtub: 0.8435, Acc.railing: 0.5275, Acc.cushion: 0.7630, Acc.base: 0.4013, Acc.box: 0.3207, Acc.column: 
0.6378, Acc.signboard: 0.4837, Acc.chest of drawers: 0.6642, Acc.counter: 0.4594, Acc.sand: 0.8504, Acc.sink: 0.8087, Acc.skyscraper: 0.7008, Acc.fireplace: 0.8740, Acc.refrigerator: 0.8774, Acc.grandstand: 0.0898, Acc.path: 0.2032, Acc.stairs: 0.6511, Acc.runway: 0.8928, Acc.case: 0.4858, Acc.pool table: 0.9661, Acc.pillow: 0.6378, Acc.screen door: 0.7819, Acc.stairway: 0.7151, Acc.river: 0.5300, Acc.bridge: 0.8345, Acc.bookcase: 0.4996, Acc.blind: 0.2897, Acc.coffee table: 0.8504, Acc.toilet: 0.9136, Acc.flower: 0.5498, Acc.book: 0.7083, Acc.hill: 0.0988, Acc.bench: 0.5980, Acc.countertop: 0.7150, Acc.stove: 0.8502, Acc.palm: 0.7963, Acc.kitchen island: 0.8491, Acc.computer: 0.7644, Acc.swivel chair: 0.4677, Acc.boat: 0.7894, Acc.bar: 0.4975, Acc.arcade machine: 0.4350, Acc.hovel: 0.2255, Acc.bus: 0.9493, Acc.towel: 0.8315, Acc.light: 0.4441, Acc.truck: 0.4752, Acc.tower: 0.1793, Acc.chandelier: 0.7250, Acc.awning: 0.3833, Acc.streetlight: 0.3757, Acc.booth: 0.2304, Acc.television receiver: 0.8456, Acc.airplane: 0.6507, Acc.dirt track: 0.3047, Acc.apparel: 0.6482, Acc.pole: 0.2633, Acc.land: 0.0000, Acc.bannister: 0.1053, Acc.escalator: 0.8255, Acc.ottoman: 0.6506, Acc.bottle: 0.3055, Acc.buffet: 0.5565, Acc.poster: 0.3498, Acc.stage: 0.1782, Acc.van: 0.1358, Acc.ship: 0.0000, Acc.fountain: 0.1891, Acc.conveyer belt: 0.9323, Acc.canopy: 0.5069, Acc.washer: 0.8177, Acc.plaything: 0.3764, Acc.swimming pool: 0.5090, Acc.stool: 0.4288, Acc.barrel: 0.2986, Acc.basket: 0.5172, Acc.waterfall: 0.7642, Acc.tent: 0.0000, Acc.bag: 0.2972, Acc.minibike: 0.8594, Acc.cradle: 0.9509, Acc.oven: 0.5198, Acc.ball: 0.6980, Acc.food: 0.2088, Acc.step: 0.1746, Acc.tank: 0.3119, Acc.trade name: 0.4420, Acc.microwave: 0.8954, Acc.pot: 0.6194, Acc.animal: 0.6263, Acc.bicycle: 0.7900, Acc.lake: 0.0000, Acc.dishwasher: 0.7998, Acc.screen: 0.5699, Acc.blanket: 0.2043, Acc.sculpture: 0.4860, Acc.hood: 0.6483, Acc.sconce: 0.4113, Acc.vase: 0.5829, Acc.traffic light: 0.4816, Acc.tray: 0.3121, 
Acc.ashcan: 0.6130, Acc.fan: 0.6105, Acc.pier: 0.3535, Acc.crt screen: 0.3580, Acc.plate: 0.7719, Acc.monitor: 0.0224, Acc.bulletin board: 0.6804, Acc.shower: 0.0005, Acc.radiator: 0.6958, Acc.glass: 0.2203, Acc.clock: 0.4405, Acc.flag: 0.7371
2023-11-10 00:11:36,582 - mmseg - INFO - Iter [4050/5000]	lr: 6.162e-07, eta: 0:21:54, time: 2.370, data_time: 1.152, memory: 38534, decode.loss_ce: 0.1721, decode.acc_seg: 93.1551, loss: 0.1721
2023-11-10 00:12:37,742 - mmseg - INFO - Iter [4100/5000]	lr: 5.838e-07, eta: 0:20:43, time: 1.223, data_time: 0.008, memory: 38534, decode.loss_ce: 0.1806, decode.acc_seg: 92.6499, loss: 0.1806
2023-11-10 00:13:41,186 - mmseg - INFO - Iter [4150/5000]	lr: 5.514e-07, eta: 0:19:32, time: 1.269, data_time: 0.051, memory: 38534, decode.loss_ce: 0.1813, decode.acc_seg: 92.9452, loss: 0.1813
2023-11-10 00:14:44,750 - mmseg - INFO - Iter [4200/5000]	lr: 5.190e-07, eta: 0:18:22, time: 1.271, data_time: 0.051, memory: 38534, decode.loss_ce: 0.1823, decode.acc_seg: 92.7957, loss: 0.1823
2023-11-10 00:15:45,943 - mmseg - INFO - Iter [4250/5000]	lr: 4.866e-07, eta: 0:17:12, time: 1.224, data_time: 0.008, memory: 38534, decode.loss_ce: 0.1821, decode.acc_seg: 92.7583, loss: 0.1821
2023-11-10 00:16:49,893 - mmseg - INFO - Iter [4300/5000]	lr: 4.542e-07, eta: 0:16:02, time: 1.279, data_time: 0.062, memory: 38534, decode.loss_ce: 0.1938, decode.acc_seg: 92.6474, loss: 0.1938
2023-11-10 00:17:53,345 - mmseg - INFO - Iter [4350/5000]	lr: 4.218e-07, eta: 0:14:53, time: 1.269, data_time: 0.051, memory: 38534, decode.loss_ce: 0.1887, decode.acc_seg: 92.4755, loss: 0.1887
2023-11-10 00:18:54,490 - mmseg - INFO - Iter [4400/5000]	lr: 3.894e-07, eta: 0:13:43, time: 1.223, data_time: 0.008, memory: 38534, decode.loss_ce: 0.1645, decode.acc_seg: 93.3589, loss: 0.1645
2023-11-10 00:19:58,033 - mmseg - INFO - Iter [4450/5000]	lr: 3.570e-07, eta: 0:12:34, time: 1.271, data_time: 0.052, memory: 38534, decode.loss_ce: 0.1890, decode.acc_seg: 92.7035, loss: 0.1890
2023-11-10 00:20:59,229 - mmseg - INFO - Iter [4500/5000]	lr: 3.246e-07, eta: 0:11:24, time: 1.224, data_time: 0.008, memory: 38534, decode.loss_ce: 0.1712, decode.acc_seg: 93.1262, loss: 0.1712
2023-11-10 00:22:02,754 - mmseg - INFO - Iter [4550/5000]	lr: 2.922e-07, eta: 0:10:15, time: 1.270, data_time: 0.052, memory: 38534, decode.loss_ce: 0.1757, decode.acc_seg: 92.8165, loss: 0.1757
2023-11-10 00:23:06,228 - mmseg - INFO - Iter [4600/5000]	lr: 2.598e-07, eta: 0:09:07, time: 1.269, data_time: 0.051, memory: 38534, decode.loss_ce: 0.1791, decode.acc_seg: 92.7841, loss: 0.1791
2023-11-10 00:24:07,419 - mmseg - INFO - Iter [4650/5000]	lr: 2.274e-07, eta: 0:07:58, time: 1.224, data_time: 0.008, memory: 38534, decode.loss_ce: 0.1697, decode.acc_seg: 93.1094, loss: 0.1697
2023-11-10 00:25:11,025 - mmseg - INFO - Iter [4700/5000]	lr: 1.950e-07, eta: 0:06:49, time: 1.272, data_time: 0.054, memory: 38534, decode.loss_ce: 0.1672, decode.acc_seg: 93.0462, loss: 0.1672
2023-11-10 00:26:14,564 - mmseg - INFO - Iter [4750/5000]	lr: 1.626e-07, eta: 0:05:41, time: 1.271, data_time: 0.051, memory: 38534, decode.loss_ce: 0.1750, decode.acc_seg: 92.9852, loss: 0.1750
2023-11-10 00:27:15,778 - mmseg - INFO - Iter [4800/5000]	lr: 1.302e-07, eta: 0:04:32, time: 1.224, data_time: 0.008, memory: 38534, decode.loss_ce: 0.1718, decode.acc_seg: 93.0326, loss: 0.1718
2023-11-10 00:28:19,277 - mmseg - INFO - Iter [4850/5000]	lr: 9.784e-08, eta: 0:03:24, time: 1.270, data_time: 0.053, memory: 38534, decode.loss_ce: 0.1686, decode.acc_seg: 93.2814, loss: 0.1686
2023-11-10 00:29:22,839 - mmseg - INFO - Iter [4900/5000]	lr: 6.544e-08, eta: 0:02:16, time: 1.271, data_time: 0.052, memory: 38534, decode.loss_ce: 0.1676, decode.acc_seg: 93.1667, loss: 0.1676
2023-11-10 00:30:24,077 - mmseg - INFO - Iter [4950/5000]	lr: 3.305e-08, eta: 0:01:07, time: 1.225, data_time: 0.008, memory: 38534, decode.loss_ce: 0.1681, decode.acc_seg: 93.2769, loss: 0.1681
2023-11-10 00:31:27,556 - mmseg - INFO - Saving checkpoint at 5000 iterations
2023-11-10 00:32:20,637 - mmseg - INFO - Exp name: segmenter_linear_intern_vit_6b_504_5k_ade20k_bs16_lr4e-5_1of16.py
2023-11-10 00:32:20,637 - mmseg - INFO - Iter [5000/5000]	lr: 6.480e-10, eta: 0:00:00, time: 2.331, data_time: 0.051, memory: 38534, decode.loss_ce: 0.1734, decode.acc_seg: 93.1065, loss: 0.1734
2023-11-10 00:33:14,421 - mmseg - INFO - per class results:
2023-11-10 00:33:14,426 - mmseg - INFO - 
+---------------------+-------+-------+
|        Class        |  IoU  |  Acc  |
+---------------------+-------+-------+
|         wall        | 77.58 | 88.71 |
|       building      | 81.51 | 91.58 |
|         sky         | 93.45 | 97.31 |
|        floor        | 81.67 | 91.14 |
|         tree        | 73.94 | 88.05 |
|       ceiling       | 83.82 | 92.55 |
|         road        | 83.13 | 89.93 |
|         bed         |  90.1 | 95.52 |
|      windowpane     |  62.5 | 79.58 |
|        grass        | 62.57 | 77.97 |
|       cabinet       | 62.14 | 73.88 |
|       sidewalk      | 63.66 | 81.68 |
|        person       | 80.06 |  93.8 |
|        earth        | 35.86 | 47.42 |
|         door        | 52.69 | 64.44 |
|        table        | 62.49 |  77.9 |
|       mountain      | 52.71 | 62.56 |
|        plant        | 50.51 | 62.13 |
|       curtain       | 73.25 | 87.91 |
|        chair        | 54.96 | 69.28 |
|         car         | 80.55 | 93.84 |
|        water        | 52.22 | 68.48 |
|       painting      | 74.26 | 87.39 |
|         sofa        | 69.98 |  90.1 |
|        shelf        | 33.38 | 51.75 |
|        house        | 24.96 | 32.11 |
|         sea         | 54.03 | 70.35 |
|        mirror       | 70.87 | 76.51 |
|         rug         | 61.76 | 67.89 |
|        field        |  32.0 |  65.0 |
|       armchair      | 48.03 | 64.36 |
|         seat        | 47.88 | 66.74 |
|        fence        | 30.25 | 39.46 |
|         desk        | 44.88 | 64.12 |
|         rock        | 50.89 | 77.01 |
|       wardrobe      | 37.35 |  50.7 |
|         lamp        | 62.48 | 76.67 |
|       bathtub       | 80.59 | 87.13 |
|       railing       | 37.52 | 51.19 |
|       cushion       | 59.99 | 74.96 |
|         base        | 25.27 | 38.19 |
|         box         |  27.8 | 33.58 |
|        column       |  49.8 | 63.55 |
|      signboard      | 32.74 | 48.58 |
|   chest of drawers  | 36.41 | 65.59 |
|       counter       | 29.87 | 39.58 |
|         sand        | 55.32 | 84.79 |
|         sink        | 74.41 | 81.92 |
|      skyscraper     | 48.04 | 70.07 |
|      fireplace      | 71.86 | 87.64 |
|     refrigerator    | 74.29 | 87.56 |
|      grandstand     |  9.32 | 10.32 |
|         path        | 11.53 | 16.09 |
|        stairs       | 43.39 | 57.98 |
|        runway       | 76.91 | 88.81 |
|         case        | 40.93 | 54.37 |
|      pool table     | 93.16 | 96.45 |
|        pillow       | 58.58 | 68.22 |
|     screen door     | 72.04 | 75.97 |
|       stairway      | 46.79 | 69.67 |
|        river        | 17.58 | 46.42 |
|        bridge       | 71.74 | 79.61 |
|       bookcase      | 27.98 | 46.28 |
|        blind        | 25.81 | 30.14 |
|     coffee table    | 60.82 | 82.61 |
|        toilet       | 86.49 | 91.28 |
|        flower       | 35.78 | 55.66 |
|         book        | 42.22 | 72.27 |
|         hill        |  7.58 |  9.07 |
|        bench        | 51.48 | 59.78 |
|      countertop     | 57.01 | 72.62 |
|        stove        | 71.34 | 85.91 |
|         palm        | 49.01 | 77.18 |
|    kitchen island   | 46.25 | 80.04 |
|       computer      | 69.41 | 80.16 |
|     swivel chair    |  35.1 | 50.79 |
|         boat        | 64.18 | 81.13 |
|         bar         | 33.51 | 49.98 |
|    arcade machine   | 42.62 | 44.91 |
|        hovel        |  17.8 | 20.97 |
|         bus         | 90.84 | 94.51 |
|        towel        | 73.62 |  83.5 |
|        light        | 41.07 |  49.4 |
|        truck        | 37.27 | 48.07 |
|        tower        |  9.5  | 16.49 |
|      chandelier     | 63.17 | 76.36 |
|        awning       | 27.54 | 38.26 |
|     streetlight     | 26.44 | 37.12 |
|        booth        |  19.7 | 22.87 |
| television receiver | 73.97 | 85.56 |
|       airplane      | 59.07 | 65.28 |
|      dirt track     | 11.78 | 30.86 |
|       apparel       | 46.52 | 67.53 |
|         pole        | 19.21 | 25.06 |
|         land        |  0.0  |  0.0  |
|      bannister      |  7.03 |  9.39 |
|      escalator      | 58.31 | 75.37 |
|       ottoman       | 46.85 |  63.3 |
|        bottle       | 23.31 | 30.93 |
|        buffet       | 43.45 | 59.15 |
|        poster       |  29.3 | 33.47 |
|        stage        |  8.03 | 15.24 |
|         van         |  11.9 | 14.45 |
|         ship        |  0.0  |  0.0  |
|       fountain      | 12.62 | 12.78 |
|    conveyer belt    | 79.09 |  93.3 |
|        canopy       | 39.15 | 50.69 |
|        washer       | 78.19 | 79.53 |
|      plaything      | 30.59 | 38.71 |
|    swimming pool    | 51.83 | 52.22 |
|        stool        | 35.19 | 43.41 |
|        barrel       | 29.82 |  30.6 |
|        basket       | 36.95 | 51.28 |
|      waterfall      | 44.16 | 71.02 |
|         tent        |  0.0  |  0.0  |
|         bag         | 25.81 | 32.01 |
|       minibike      | 71.97 | 84.71 |
|        cradle       | 76.89 | 97.34 |
|         oven        |  42.5 | 52.89 |
|         ball        | 37.89 | 69.97 |
|         food        | 25.48 | 26.32 |
|         step        | 10.14 | 12.88 |
|         tank        | 28.89 | 31.18 |
|      trade name     | 30.26 | 40.99 |
|      microwave      | 80.38 | 89.89 |
|         pot         | 52.96 | 61.42 |
|        animal       | 62.68 |  64.2 |
|       bicycle       | 58.51 | 75.89 |
|         lake        |  0.0  |  0.0  |
|      dishwasher     | 65.73 | 79.72 |
|        screen       | 52.18 | 64.02 |
|       blanket       | 18.31 | 21.73 |
|      sculpture      | 43.49 | 50.44 |
|         hood        | 56.08 | 65.21 |
|        sconce       | 36.73 | 48.86 |
|         vase        | 37.67 | 55.37 |
|    traffic light    | 29.61 | 46.04 |
|         tray        | 12.27 | 29.22 |
|        ashcan       | 46.92 | 61.19 |
|         fan         |  53.0 | 61.59 |
|         pier        | 33.29 | 36.08 |
|      crt screen     | 13.62 | 30.74 |
|        plate        | 53.06 | 77.65 |
|       monitor       |  1.78 |  2.12 |
|    bulletin board   | 44.76 | 68.17 |
|        shower       |  0.13 |  0.32 |
|       radiator      | 64.74 | 68.82 |
|        glass        | 19.89 | 21.89 |
|        clock        | 38.04 | 44.16 |
|         flag        | 65.61 | 71.61 |
+---------------------+-------+-------+
2023-11-10 00:33:14,426 - mmseg - INFO - Summary:
2023-11-10 00:33:14,427 - mmseg - INFO - 
+-------+-------+-------+
|  aAcc |  mIoU |  mAcc |
+-------+-------+-------+
| 82.44 | 46.54 | 58.23 |
+-------+-------+-------+
2023-11-10 00:33:14,427 - mmseg - INFO - Exp name: segmenter_linear_intern_vit_6b_504_5k_ade20k_bs16_lr4e-5_1of16.py
2023-11-10 00:33:14,428 - mmseg - INFO - Iter(val) [250]	aAcc: 0.8244, mIoU: 0.4654, mAcc: 0.5823, IoU.wall: 0.7758, IoU.building: 0.8151, IoU.sky: 0.9345, IoU.floor: 0.8167, IoU.tree: 0.7394, IoU.ceiling: 0.8382, IoU.road: 0.8313, IoU.bed : 0.9010, IoU.windowpane: 0.6250, IoU.grass: 0.6257, IoU.cabinet: 0.6214, IoU.sidewalk: 0.6366, IoU.person: 0.8006, IoU.earth: 0.3586, IoU.door: 0.5269, IoU.table: 0.6249, IoU.mountain: 0.5271, IoU.plant: 0.5051, IoU.curtain: 0.7325, IoU.chair: 0.5496, IoU.car: 0.8055, IoU.water: 0.5222, IoU.painting: 0.7426, IoU.sofa: 0.6998, IoU.shelf: 0.3338, IoU.house: 0.2496, IoU.sea: 0.5403, IoU.mirror: 0.7087, IoU.rug: 0.6176, IoU.field: 0.3200, IoU.armchair: 0.4803, IoU.seat: 0.4788, IoU.fence: 0.3025, IoU.desk: 0.4488, IoU.rock: 0.5089, IoU.wardrobe: 0.3735, IoU.lamp: 0.6248, IoU.bathtub: 0.8059, IoU.railing: 0.3752, IoU.cushion: 0.5999, IoU.base: 0.2527, IoU.box: 0.2780, IoU.column: 0.4980, IoU.signboard: 0.3274, IoU.chest of drawers: 0.3641, IoU.counter: 0.2987, IoU.sand: 0.5532, IoU.sink: 0.7441, IoU.skyscraper: 0.4804, IoU.fireplace: 0.7186, IoU.refrigerator: 0.7429, IoU.grandstand: 0.0932, IoU.path: 0.1153, IoU.stairs: 0.4339, IoU.runway: 0.7691, IoU.case: 0.4093, IoU.pool table: 0.9316, IoU.pillow: 0.5858, IoU.screen door: 0.7204, IoU.stairway: 0.4679, IoU.river: 0.1758, IoU.bridge: 0.7174, IoU.bookcase: 0.2798, IoU.blind: 0.2581, IoU.coffee table: 0.6082, IoU.toilet: 0.8649, IoU.flower: 0.3578, IoU.book: 0.4222, IoU.hill: 0.0758, IoU.bench: 0.5148, IoU.countertop: 0.5701, IoU.stove: 0.7134, IoU.palm: 0.4901, IoU.kitchen island: 0.4625, IoU.computer: 0.6941, IoU.swivel chair: 0.3510, IoU.boat: 0.6418, IoU.bar: 0.3351, IoU.arcade machine: 0.4262, IoU.hovel: 0.1780, IoU.bus: 0.9084, IoU.towel: 0.7362, IoU.light: 0.4107, IoU.truck: 0.3727, IoU.tower: 0.0950, IoU.chandelier: 0.6317, IoU.awning: 0.2754, IoU.streetlight: 0.2644, IoU.booth: 0.1970, IoU.television receiver: 0.7397, IoU.airplane: 0.5907, IoU.dirt track: 0.1178, IoU.apparel: 
0.4652, IoU.pole: 0.1921, IoU.land: 0.0000, IoU.bannister: 0.0703, IoU.escalator: 0.5831, IoU.ottoman: 0.4685, IoU.bottle: 0.2331, IoU.buffet: 0.4345, IoU.poster: 0.2930, IoU.stage: 0.0803, IoU.van: 0.1190, IoU.ship: 0.0000, IoU.fountain: 0.1262, IoU.conveyer belt: 0.7909, IoU.canopy: 0.3915, IoU.washer: 0.7819, IoU.plaything: 0.3059, IoU.swimming pool: 0.5183, IoU.stool: 0.3519, IoU.barrel: 0.2982, IoU.basket: 0.3695, IoU.waterfall: 0.4416, IoU.tent: 0.0000, IoU.bag: 0.2581, IoU.minibike: 0.7197, IoU.cradle: 0.7689, IoU.oven: 0.4250, IoU.ball: 0.3789, IoU.food: 0.2548, IoU.step: 0.1014, IoU.tank: 0.2889, IoU.trade name: 0.3026, IoU.microwave: 0.8038, IoU.pot: 0.5296, IoU.animal: 0.6268, IoU.bicycle: 0.5851, IoU.lake: 0.0000, IoU.dishwasher: 0.6573, IoU.screen: 0.5218, IoU.blanket: 0.1831, IoU.sculpture: 0.4349, IoU.hood: 0.5608, IoU.sconce: 0.3673, IoU.vase: 0.3767, IoU.traffic light: 0.2961, IoU.tray: 0.1227, IoU.ashcan: 0.4692, IoU.fan: 0.5300, IoU.pier: 0.3329, IoU.crt screen: 0.1362, IoU.plate: 0.5306, IoU.monitor: 0.0178, IoU.bulletin board: 0.4476, IoU.shower: 0.0013, IoU.radiator: 0.6474, IoU.glass: 0.1989, IoU.clock: 0.3804, IoU.flag: 0.6561, Acc.wall: 0.8871, Acc.building: 0.9158, Acc.sky: 0.9731, Acc.floor: 0.9114, Acc.tree: 0.8805, Acc.ceiling: 0.9255, Acc.road: 0.8993, Acc.bed : 0.9552, Acc.windowpane: 0.7958, Acc.grass: 0.7797, Acc.cabinet: 0.7388, Acc.sidewalk: 0.8168, Acc.person: 0.9380, Acc.earth: 0.4742, Acc.door: 0.6444, Acc.table: 0.7790, Acc.mountain: 0.6256, Acc.plant: 0.6213, Acc.curtain: 0.8791, Acc.chair: 0.6928, Acc.car: 0.9384, Acc.water: 0.6848, Acc.painting: 0.8739, Acc.sofa: 0.9010, Acc.shelf: 0.5175, Acc.house: 0.3211, Acc.sea: 0.7035, Acc.mirror: 0.7651, Acc.rug: 0.6789, Acc.field: 0.6500, Acc.armchair: 0.6436, Acc.seat: 0.6674, Acc.fence: 0.3946, Acc.desk: 0.6412, Acc.rock: 0.7701, Acc.wardrobe: 0.5070, Acc.lamp: 0.7667, Acc.bathtub: 0.8713, Acc.railing: 0.5119, Acc.cushion: 0.7496, Acc.base: 0.3819, Acc.box: 0.3358, Acc.column: 
0.6355, Acc.signboard: 0.4858, Acc.chest of drawers: 0.6559, Acc.counter: 0.3958, Acc.sand: 0.8479, Acc.sink: 0.8192, Acc.skyscraper: 0.7007, Acc.fireplace: 0.8764, Acc.refrigerator: 0.8756, Acc.grandstand: 0.1032, Acc.path: 0.1609, Acc.stairs: 0.5798, Acc.runway: 0.8881, Acc.case: 0.5437, Acc.pool table: 0.9645, Acc.pillow: 0.6822, Acc.screen door: 0.7597, Acc.stairway: 0.6967, Acc.river: 0.4642, Acc.bridge: 0.7961, Acc.bookcase: 0.4628, Acc.blind: 0.3014, Acc.coffee table: 0.8261, Acc.toilet: 0.9128, Acc.flower: 0.5566, Acc.book: 0.7227, Acc.hill: 0.0907, Acc.bench: 0.5978, Acc.countertop: 0.7262, Acc.stove: 0.8591, Acc.palm: 0.7718, Acc.kitchen island: 0.8004, Acc.computer: 0.8016, Acc.swivel chair: 0.5079, Acc.boat: 0.8113, Acc.bar: 0.4998, Acc.arcade machine: 0.4491, Acc.hovel: 0.2097, Acc.bus: 0.9451, Acc.towel: 0.8350, Acc.light: 0.4940, Acc.truck: 0.4807, Acc.tower: 0.1649, Acc.chandelier: 0.7636, Acc.awning: 0.3826, Acc.streetlight: 0.3712, Acc.booth: 0.2287, Acc.television receiver: 0.8556, Acc.airplane: 0.6528, Acc.dirt track: 0.3086, Acc.apparel: 0.6753, Acc.pole: 0.2506, Acc.land: 0.0000, Acc.bannister: 0.0939, Acc.escalator: 0.7537, Acc.ottoman: 0.6330, Acc.bottle: 0.3093, Acc.buffet: 0.5915, Acc.poster: 0.3347, Acc.stage: 0.1524, Acc.van: 0.1445, Acc.ship: 0.0000, Acc.fountain: 0.1278, Acc.conveyer belt: 0.9330, Acc.canopy: 0.5069, Acc.washer: 0.7953, Acc.plaything: 0.3871, Acc.swimming pool: 0.5222, Acc.stool: 0.4341, Acc.barrel: 0.3060, Acc.basket: 0.5128, Acc.waterfall: 0.7102, Acc.tent: 0.0000, Acc.bag: 0.3201, Acc.minibike: 0.8471, Acc.cradle: 0.9734, Acc.oven: 0.5289, Acc.ball: 0.6997, Acc.food: 0.2632, Acc.step: 0.1288, Acc.tank: 0.3118, Acc.trade name: 0.4099, Acc.microwave: 0.8989, Acc.pot: 0.6142, Acc.animal: 0.6420, Acc.bicycle: 0.7589, Acc.lake: 0.0000, Acc.dishwasher: 0.7972, Acc.screen: 0.6402, Acc.blanket: 0.2173, Acc.sculpture: 0.5044, Acc.hood: 0.6521, Acc.sconce: 0.4886, Acc.vase: 0.5537, Acc.traffic light: 0.4604, Acc.tray: 0.2922, 
Acc.ashcan: 0.6119, Acc.fan: 0.6159, Acc.pier: 0.3608, Acc.crt screen: 0.3074, Acc.plate: 0.7765, Acc.monitor: 0.0212, Acc.bulletin board: 0.6817, Acc.shower: 0.0032, Acc.radiator: 0.6882, Acc.glass: 0.2189, Acc.clock: 0.4416, Acc.flag: 0.7161