2023-11-09 22:35:08,809 - mmseg - INFO - Multi-processing start method is `None`
2023-11-09 22:35:08,822 - mmseg - INFO - OpenCV num_threads is `128
2023-11-09 22:35:08,822 - mmseg - INFO - OMP num threads is 1
2023-11-09 22:35:08,907 - mmseg - INFO - Environment info:
------------------------------------------------------------
sys.platform: linux
Python: 3.8.15 (default, Nov 4 2022, 20:59:55) [GCC 11.2.0]
CUDA available: True
GPU 0,1,2,3,4,5,6,7: NVIDIA A100-SXM4-80GB
CUDA_HOME: /mnt/petrelfs/wangwenhai/miniconda3/envs/mmdetseg
NVCC: Cuda compilation tools, release 11.7, V11.7.99
GCC: gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-44)
PyTorch: 1.13.0
PyTorch compiling details: PyTorch built with:
  - GCC 9.3
  - C++ Version: 201402
  - Intel(R) oneAPI Math Kernel Library Version 2021.4-Product Build 20210904 for Intel(R) 64 architecture applications
  - Intel(R) MKL-DNN v2.6.0 (Git Hash 52b5f107dd9cf10910aaa19cb47f3abf9b349815)
  - OpenMP 201511 (a.k.a. OpenMP 4.5)
  - LAPACK is enabled (usually provided by MKL)
  - NNPACK is enabled
  - CPU capability usage: AVX2
  - CUDA Runtime 11.7
  - NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_37,code=compute_37
  - CuDNN 8.5
  - Magma 2.6.1
  - Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=11.7, CUDNN_VERSION=8.5.0, CXX_COMPILER=/opt/rh/devtoolset-9/root/usr/bin/c++, CXX_FLAGS= -fabi-version=11 -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_KINETO -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -DEDGE_PROFILER_USE_KINETO -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Werror=non-virtual-dtor -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wunused-local-typedefs -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Werror=cast-function-type -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=1.13.0, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, USE_ROCM=OFF,

TorchVision: 0.14.0
OpenCV: 4.8.0
MMCV: 1.7.0
MMCV Compiler: GCC 7.3
MMCV CUDA Compiler: 11.7
MMSegmentation: 0.27.0+
------------------------------------------------------------

2023-11-09 22:35:08,907 - mmseg - INFO - Distributed training: True
2023-11-09 22:35:09,165 - mmseg - INFO - Config:
checkpoint = 'https://download.openmmlab.com/mmsegmentation/v0.5/pretrain/segmenter/vit_base_p16_384_20220308-96dfe169.pth'
backbone_norm_cfg = dict(type='LN', eps=1e-06, requires_grad=True)
model = dict(
    type='EncoderDecoder',
    pretrained='./pretrained/intern_vit_6b_224px.pth',
    backbone=dict(
        type='InternViT6B',
        pretrain_size=224,
        img_size=504,
        patch_size=14,
        embed_dim=3200,
        depth=48,
        num_heads=25,
        mlp_ratio=4.0,
        qkv_bias=False,
        drop_path_rate=0.4,
        init_values=0.1,
        with_cp=True,
        use_flash_attn=True,
        qk_normalization=True,
        layerscale_no_force_fp32=True,
        freeze_vit=False,
        out_indices=[47]),
    decode_head=dict(
        type='FCNHead',
        in_channels=3200,
        channels=3200,
        num_convs=0,
        dropout_ratio=0.0,
        concat_input=False,
        num_classes=150,
        with_norm=True,
        loss_decode=dict(
            type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0)),
    test_cfg=dict(mode='slide', crop_size=(504, 504), stride=(322, 322)))
dataset_type = 'ADE20KDataset'
data_root = 'data/ade/ADEChallengeData2016'
img_norm_cfg = dict(
    mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
crop_size = (504, 504)
train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='LoadAnnotations', reduce_zero_label=True),
    dict(type='Resize', img_scale=(2016, 504), ratio_range=(0.5, 2.0)),
    dict(type='RandomCrop', crop_size=(504, 504), cat_max_ratio=0.75),
    dict(type='RandomFlip', prob=0.5),
    dict(type='PhotoMetricDistortion'),
    dict(
        type='Normalize',
        mean=[123.675, 116.28, 103.53],
        std=[58.395, 57.12, 57.375],
        to_rgb=True),
    dict(type='Pad', size=(504, 504), pad_val=0, seg_pad_val=255),
    dict(type='DefaultFormatBundle'),
    dict(type='Collect', keys=['img', 'gt_semantic_seg'])
]
test_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(
        type='MultiScaleFlipAug',
        img_scale=(2016, 504),
        flip=False,
        transforms=[
            dict(type='Resize', keep_ratio=True),
            dict(type='ResizeToMultiple', size_divisor=14),
            dict(type='RandomFlip'),
            dict(
                type='Normalize',
                mean=[123.675, 116.28, 103.53],
                std=[58.395, 57.12, 57.375],
                to_rgb=True),
            dict(type='ImageToTensor', keys=['img']),
            dict(type='Collect', keys=['img'])
        ])
]
data = dict(
    samples_per_gpu=2,
    workers_per_gpu=4,
    train=dict(
        type='ADE20KDataset',
        data_root='data/ade/ADEChallengeData2016',
        img_dir='images/training',
        ann_dir='annotations/training',
        max_image_num=1263,
        pipeline=[
            dict(type='LoadImageFromFile'),
            dict(type='LoadAnnotations', reduce_zero_label=True),
            dict(type='Resize', img_scale=(2016, 504), ratio_range=(0.5, 2.0)),
            dict(type='RandomCrop', crop_size=(504, 504), cat_max_ratio=0.75),
            dict(type='RandomFlip', prob=0.5),
            dict(type='PhotoMetricDistortion'),
            dict(
                type='Normalize',
                mean=[123.675, 116.28, 103.53],
                std=[58.395, 57.12, 57.375],
                to_rgb=True),
            dict(type='Pad', size=(504, 504), pad_val=0, seg_pad_val=255),
            dict(type='DefaultFormatBundle'),
            dict(type='Collect', keys=['img', 'gt_semantic_seg'])
        ]),
    val=dict(
        type='ADE20KDataset',
        data_root='data/ade/ADEChallengeData2016',
        img_dir='images/validation',
        ann_dir='annotations/validation',
        pipeline=[
            dict(type='LoadImageFromFile'),
            dict(
                type='MultiScaleFlipAug',
                img_scale=(2016, 504),
                flip=False,
                transforms=[
                    dict(type='Resize', keep_ratio=True),
                    dict(type='ResizeToMultiple', size_divisor=14),
                    dict(type='RandomFlip'),
                    dict(
                        type='Normalize',
                        mean=[123.675, 116.28, 103.53],
                        std=[58.395, 57.12, 57.375],
                        to_rgb=True),
                    dict(type='ImageToTensor', keys=['img']),
                    dict(type='Collect', keys=['img'])
                ])
        ]),
    test=dict(
        type='ADE20KDataset',
        data_root='data/ade/ADEChallengeData2016',
        img_dir='images/validation',
        ann_dir='annotations/validation',
        pipeline=[
            dict(type='LoadImageFromFile'),
            dict(
                type='MultiScaleFlipAug',
                img_scale=(2016, 504),
                flip=False,
                transforms=[
                    dict(type='Resize', keep_ratio=True),
                    dict(type='ResizeToMultiple', size_divisor=14),
                    dict(type='RandomFlip'),
                    dict(
                        type='Normalize',
                        mean=[123.675, 116.28, 103.53],
                        std=[58.395, 57.12, 57.375],
                        to_rgb=True),
                    dict(type='ImageToTensor', keys=['img']),
                    dict(type='Collect', keys=['img'])
                ])
        ]))
log_config = dict(
    interval=50,
    hooks=[
        dict(type='TextLoggerHook', by_epoch=False),
        dict(type='TensorboardLoggerHook')
    ])
dist_params = dict(backend='nccl')
log_level = 'INFO'
load_from = None
resume_from = None
workflow = [('train', 1)]
cudnn_benchmark = True
optimizer = dict(
    type='AdamW',
    lr=4e-05,
    betas=(0.9, 0.999),
    weight_decay=0.05,
    constructor='CustomLayerDecayOptimizerConstructor',
    paramwise_cfg=dict(num_layers=48, layer_decay_rate=0.95))
optimizer_config = dict()
lr_config = dict(
    policy='poly',
    warmup='linear',
    warmup_iters=100,
    warmup_ratio=1e-06,
    power=1.0,
    min_lr=0.0,
    by_epoch=False)
runner = dict(type='IterBasedRunner', max_iters=5000)
checkpoint_config = dict(
    by_epoch=False, interval=1000, deepspeed=True, max_keep_ckpts=2)
evaluation = dict(
    interval=1000, metric='mIoU', pre_eval=True, save_best='auto')
deepspeed = True
deepspeed_config = 'zero_configs/adam_zero1_bf16.json'
pretrained = './pretrained/intern_vit_6b_224px.pth'
custom_hooks = [dict(type='ToBFloat16Hook', priority=49)]
work_dir = './work_dirs/segmenter_linear_intern_vit_6b_504_5k_ade20k_bs16_lr4e-5_1of16'
gpu_ids = range(0, 8)
auto_resume = False

2023-11-09 22:35:13,653 - mmseg - INFO - Set random seed to 15419458, deterministic: False
2023-11-09 22:36:35,693 - mmseg - INFO -
2023-11-09 22:37:00,605 - mmseg - INFO - initialize FCNHead with init_cfg {'type': 'Normal', 'std': 0.01, 'override': {'name': 'conv_seg'}}
Name of parameter - Initialization information

backbone.pos_embed - torch.Size([1, 1297, 3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.cls_token - torch.Size([1, 1, 3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.patch_embed.proj.weight - torch.Size([3200, 3, 14, 14]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.patch_embed.proj.bias - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.0.norm1.weight - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.0.attn.qkv.weight - torch.Size([9600, 3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.0.attn.proj.weight - torch.Size([3200, 3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.0.attn.proj.bias - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.0.attn.q_norm.weight - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.0.attn.k_norm.weight - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.0.ls1.gamma - torch.Size([3200]): The value is the same before and
after calling `init_weights` of EncoderDecoder backbone.blocks.0.norm2.weight - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.0.mlp.fc1.weight - torch.Size([12800, 3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.0.mlp.fc1.bias - torch.Size([12800]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.0.mlp.fc2.weight - torch.Size([3200, 12800]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.0.mlp.fc2.bias - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.0.ls2.gamma - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.1.norm1.weight - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.1.attn.qkv.weight - torch.Size([9600, 3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.1.attn.proj.weight - torch.Size([3200, 3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.1.attn.proj.bias - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.1.attn.q_norm.weight - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.1.attn.k_norm.weight - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.1.ls1.gamma - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.1.norm2.weight - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.1.mlp.fc1.weight - 
torch.Size([12800, 3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.1.mlp.fc1.bias - torch.Size([12800]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.1.mlp.fc2.weight - torch.Size([3200, 12800]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.1.mlp.fc2.bias - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.1.ls2.gamma - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.2.norm1.weight - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.2.attn.qkv.weight - torch.Size([9600, 3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.2.attn.proj.weight - torch.Size([3200, 3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.2.attn.proj.bias - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.2.attn.q_norm.weight - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.2.attn.k_norm.weight - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.2.ls1.gamma - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.2.norm2.weight - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.2.mlp.fc1.weight - torch.Size([12800, 3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.2.mlp.fc1.bias - torch.Size([12800]): The value is the same before and after calling `init_weights` 
of EncoderDecoder backbone.blocks.2.mlp.fc2.weight - torch.Size([3200, 12800]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.2.mlp.fc2.bias - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.2.ls2.gamma - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.3.norm1.weight - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.3.attn.qkv.weight - torch.Size([9600, 3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.3.attn.proj.weight - torch.Size([3200, 3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.3.attn.proj.bias - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.3.attn.q_norm.weight - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.3.attn.k_norm.weight - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.3.ls1.gamma - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.3.norm2.weight - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.3.mlp.fc1.weight - torch.Size([12800, 3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.3.mlp.fc1.bias - torch.Size([12800]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.3.mlp.fc2.weight - torch.Size([3200, 12800]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.3.mlp.fc2.bias - torch.Size([3200]): The value 
is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.3.ls2.gamma - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.4.norm1.weight - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.4.attn.qkv.weight - torch.Size([9600, 3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.4.attn.proj.weight - torch.Size([3200, 3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.4.attn.proj.bias - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.4.attn.q_norm.weight - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.4.attn.k_norm.weight - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.4.ls1.gamma - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.4.norm2.weight - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.4.mlp.fc1.weight - torch.Size([12800, 3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.4.mlp.fc1.bias - torch.Size([12800]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.4.mlp.fc2.weight - torch.Size([3200, 12800]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.4.mlp.fc2.bias - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.4.ls2.gamma - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder 
backbone.blocks.5.norm1.weight - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.5.attn.qkv.weight - torch.Size([9600, 3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.5.attn.proj.weight - torch.Size([3200, 3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.5.attn.proj.bias - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.5.attn.q_norm.weight - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.5.attn.k_norm.weight - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.5.ls1.gamma - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.5.norm2.weight - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.5.mlp.fc1.weight - torch.Size([12800, 3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.5.mlp.fc1.bias - torch.Size([12800]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.5.mlp.fc2.weight - torch.Size([3200, 12800]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.5.mlp.fc2.bias - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.5.ls2.gamma - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.6.norm1.weight - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.6.attn.qkv.weight - torch.Size([9600, 3200]): The value is the same before 
and after calling `init_weights` of EncoderDecoder backbone.blocks.6.attn.proj.weight - torch.Size([3200, 3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.6.attn.proj.bias - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.6.attn.q_norm.weight - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.6.attn.k_norm.weight - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.6.ls1.gamma - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.6.norm2.weight - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.6.mlp.fc1.weight - torch.Size([12800, 3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.6.mlp.fc1.bias - torch.Size([12800]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.6.mlp.fc2.weight - torch.Size([3200, 12800]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.6.mlp.fc2.bias - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.6.ls2.gamma - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.7.norm1.weight - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.7.attn.qkv.weight - torch.Size([9600, 3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.7.attn.proj.weight - torch.Size([3200, 3200]): The value is the same before and after calling `init_weights` of EncoderDecoder 
backbone.blocks.7.attn.proj.bias - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.7.attn.q_norm.weight - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.7.attn.k_norm.weight - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.7.ls1.gamma - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.7.norm2.weight - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.7.mlp.fc1.weight - torch.Size([12800, 3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.7.mlp.fc1.bias - torch.Size([12800]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.7.mlp.fc2.weight - torch.Size([3200, 12800]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.7.mlp.fc2.bias - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.7.ls2.gamma - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.8.norm1.weight - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.8.attn.qkv.weight - torch.Size([9600, 3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.8.attn.proj.weight - torch.Size([3200, 3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.8.attn.proj.bias - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.8.attn.q_norm.weight - torch.Size([3200]): The value is the same before 
and after calling `init_weights` of EncoderDecoder backbone.blocks.8.attn.k_norm.weight - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.8.ls1.gamma - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.8.norm2.weight - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.8.mlp.fc1.weight - torch.Size([12800, 3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.8.mlp.fc1.bias - torch.Size([12800]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.8.mlp.fc2.weight - torch.Size([3200, 12800]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.8.mlp.fc2.bias - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.8.ls2.gamma - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.9.norm1.weight - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.9.attn.qkv.weight - torch.Size([9600, 3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.9.attn.proj.weight - torch.Size([3200, 3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.9.attn.proj.bias - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.9.attn.q_norm.weight - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.9.attn.k_norm.weight - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.9.ls1.gamma - 
torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.9.norm2.weight - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.9.mlp.fc1.weight - torch.Size([12800, 3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.9.mlp.fc1.bias - torch.Size([12800]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.9.mlp.fc2.weight - torch.Size([3200, 12800]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.9.mlp.fc2.bias - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.9.ls2.gamma - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.10.norm1.weight - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.10.attn.qkv.weight - torch.Size([9600, 3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.10.attn.proj.weight - torch.Size([3200, 3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.10.attn.proj.bias - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.10.attn.q_norm.weight - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.10.attn.k_norm.weight - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.10.ls1.gamma - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.10.norm2.weight - torch.Size([3200]): The value is the same before and after calling `init_weights` 
of EncoderDecoder backbone.blocks.10.mlp.fc1.weight - torch.Size([12800, 3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.10.mlp.fc1.bias - torch.Size([12800]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.10.mlp.fc2.weight - torch.Size([3200, 12800]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.10.mlp.fc2.bias - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.10.ls2.gamma - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.11.norm1.weight - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.11.attn.qkv.weight - torch.Size([9600, 3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.11.attn.proj.weight - torch.Size([3200, 3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.11.attn.proj.bias - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.11.attn.q_norm.weight - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.11.attn.k_norm.weight - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.11.ls1.gamma - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.11.norm2.weight - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.11.mlp.fc1.weight - torch.Size([12800, 3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.11.mlp.fc1.bias - 
torch.Size([12800]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.11.mlp.fc2.weight - torch.Size([3200, 12800]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.11.mlp.fc2.bias - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.11.ls2.gamma - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.12.norm1.weight - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.12.attn.qkv.weight - torch.Size([9600, 3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.12.attn.proj.weight - torch.Size([3200, 3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.12.attn.proj.bias - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.12.attn.q_norm.weight - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.12.attn.k_norm.weight - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.12.ls1.gamma - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.12.norm2.weight - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.12.mlp.fc1.weight - torch.Size([12800, 3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.12.mlp.fc1.bias - torch.Size([12800]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.12.mlp.fc2.weight - torch.Size([3200, 12800]): The value is the same before and after calling 
`init_weights` of EncoderDecoder backbone.blocks.12.mlp.fc2.bias - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.12.ls2.gamma - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.13.norm1.weight - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.13.attn.qkv.weight - torch.Size([9600, 3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.13.attn.proj.weight - torch.Size([3200, 3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.13.attn.proj.bias - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.13.attn.q_norm.weight - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.13.attn.k_norm.weight - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.13.ls1.gamma - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.13.norm2.weight - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.13.mlp.fc1.weight - torch.Size([12800, 3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.13.mlp.fc1.bias - torch.Size([12800]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.13.mlp.fc2.weight - torch.Size([3200, 12800]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.13.mlp.fc2.bias - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.13.ls2.gamma - 
torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.14.norm1.weight - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.14.attn.qkv.weight - torch.Size([9600, 3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.14.attn.proj.weight - torch.Size([3200, 3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.14.attn.proj.bias - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.14.attn.q_norm.weight - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.14.attn.k_norm.weight - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.14.ls1.gamma - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.14.norm2.weight - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.14.mlp.fc1.weight - torch.Size([12800, 3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.14.mlp.fc1.bias - torch.Size([12800]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.14.mlp.fc2.weight - torch.Size([3200, 12800]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.14.mlp.fc2.bias - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.14.ls2.gamma - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.15.norm1.weight - torch.Size([3200]): The value is the same before and after calling 
`init_weights` of EncoderDecoder backbone.blocks.15.attn.qkv.weight - torch.Size([9600, 3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.15.attn.proj.weight - torch.Size([3200, 3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.15.attn.proj.bias - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.15.attn.q_norm.weight - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.15.attn.k_norm.weight - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.15.ls1.gamma - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.15.norm2.weight - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.15.mlp.fc1.weight - torch.Size([12800, 3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.15.mlp.fc1.bias - torch.Size([12800]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.15.mlp.fc2.weight - torch.Size([3200, 12800]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.15.mlp.fc2.bias - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.15.ls2.gamma - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.16.norm1.weight - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.16.attn.qkv.weight - torch.Size([9600, 3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.16.attn.proj.weight 
- torch.Size([3200, 3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.16.attn.proj.bias - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.16.attn.q_norm.weight - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.16.attn.k_norm.weight - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.16.ls1.gamma - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.16.norm2.weight - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.16.mlp.fc1.weight - torch.Size([12800, 3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.16.mlp.fc1.bias - torch.Size([12800]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.16.mlp.fc2.weight - torch.Size([3200, 12800]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.16.mlp.fc2.bias - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.16.ls2.gamma - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.17.norm1.weight - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.17.attn.qkv.weight - torch.Size([9600, 3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.17.attn.proj.weight - torch.Size([3200, 3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.17.attn.proj.bias - torch.Size([3200]): The value is the same before and after calling 
`init_weights` of EncoderDecoder backbone.blocks.17.attn.q_norm.weight - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.17.attn.k_norm.weight - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.17.ls1.gamma - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.17.norm2.weight - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.17.mlp.fc1.weight - torch.Size([12800, 3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.17.mlp.fc1.bias - torch.Size([12800]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.17.mlp.fc2.weight - torch.Size([3200, 12800]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.17.mlp.fc2.bias - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.17.ls2.gamma - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.18.norm1.weight - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.18.attn.qkv.weight - torch.Size([9600, 3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.18.attn.proj.weight - torch.Size([3200, 3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.18.attn.proj.bias - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.18.attn.q_norm.weight - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.18.attn.k_norm.weight 
- torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.18.ls1.gamma - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.18.norm2.weight - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.18.mlp.fc1.weight - torch.Size([12800, 3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.18.mlp.fc1.bias - torch.Size([12800]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.18.mlp.fc2.weight - torch.Size([3200, 12800]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.18.mlp.fc2.bias - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.18.ls2.gamma - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.19.norm1.weight - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.19.attn.qkv.weight - torch.Size([9600, 3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.19.attn.proj.weight - torch.Size([3200, 3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.19.attn.proj.bias - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.19.attn.q_norm.weight - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.19.attn.k_norm.weight - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.19.ls1.gamma - torch.Size([3200]): The value is the same before and after calling 
`init_weights` of EncoderDecoder backbone.blocks.19.norm2.weight - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.19.mlp.fc1.weight - torch.Size([12800, 3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.19.mlp.fc1.bias - torch.Size([12800]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.19.mlp.fc2.weight - torch.Size([3200, 12800]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.19.mlp.fc2.bias - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.19.ls2.gamma - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.20.norm1.weight - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.20.attn.qkv.weight - torch.Size([9600, 3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.20.attn.proj.weight - torch.Size([3200, 3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.20.attn.proj.bias - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.20.attn.q_norm.weight - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.20.attn.k_norm.weight - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.20.ls1.gamma - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.20.norm2.weight - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.20.mlp.fc1.weight - 
torch.Size([12800, 3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.20.mlp.fc1.bias - torch.Size([12800]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.20.mlp.fc2.weight - torch.Size([3200, 12800]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.20.mlp.fc2.bias - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.20.ls2.gamma - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.21.norm1.weight - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.21.attn.qkv.weight - torch.Size([9600, 3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.21.attn.proj.weight - torch.Size([3200, 3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.21.attn.proj.bias - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.21.attn.q_norm.weight - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.21.attn.k_norm.weight - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.21.ls1.gamma - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.21.norm2.weight - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.21.mlp.fc1.weight - torch.Size([12800, 3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.21.mlp.fc1.bias - torch.Size([12800]): The value is the same before and after calling 
`init_weights` of EncoderDecoder backbone.blocks.21.mlp.fc2.weight - torch.Size([3200, 12800]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.21.mlp.fc2.bias - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.21.ls2.gamma - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.22.norm1.weight - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.22.attn.qkv.weight - torch.Size([9600, 3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.22.attn.proj.weight - torch.Size([3200, 3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.22.attn.proj.bias - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.22.attn.q_norm.weight - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.22.attn.k_norm.weight - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.22.ls1.gamma - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.22.norm2.weight - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.22.mlp.fc1.weight - torch.Size([12800, 3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.22.mlp.fc1.bias - torch.Size([12800]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.22.mlp.fc2.weight - torch.Size([3200, 12800]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.22.mlp.fc2.bias - 
torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.22.ls2.gamma - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.23.norm1.weight - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.23.attn.qkv.weight - torch.Size([9600, 3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.23.attn.proj.weight - torch.Size([3200, 3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.23.attn.proj.bias - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.23.attn.q_norm.weight - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.23.attn.k_norm.weight - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.23.ls1.gamma - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.23.norm2.weight - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.23.mlp.fc1.weight - torch.Size([12800, 3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.23.mlp.fc1.bias - torch.Size([12800]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.23.mlp.fc2.weight - torch.Size([3200, 12800]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.23.mlp.fc2.bias - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.23.ls2.gamma - torch.Size([3200]): The value is the same before and after calling 
`init_weights` of EncoderDecoder backbone.blocks.24.norm1.weight - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.24.attn.qkv.weight - torch.Size([9600, 3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.24.attn.proj.weight - torch.Size([3200, 3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.24.attn.proj.bias - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.24.attn.q_norm.weight - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.24.attn.k_norm.weight - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.24.ls1.gamma - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.24.norm2.weight - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.24.mlp.fc1.weight - torch.Size([12800, 3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.24.mlp.fc1.bias - torch.Size([12800]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.24.mlp.fc2.weight - torch.Size([3200, 12800]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.24.mlp.fc2.bias - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.24.ls2.gamma - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.25.norm1.weight - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.25.attn.qkv.weight - 
torch.Size([9600, 3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.25.attn.proj.weight - torch.Size([3200, 3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.25.attn.proj.bias - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.25.attn.q_norm.weight - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.25.attn.k_norm.weight - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.25.ls1.gamma - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.25.norm2.weight - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.25.mlp.fc1.weight - torch.Size([12800, 3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.25.mlp.fc1.bias - torch.Size([12800]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.25.mlp.fc2.weight - torch.Size([3200, 12800]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.25.mlp.fc2.bias - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.25.ls2.gamma - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.26.norm1.weight - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.26.attn.qkv.weight - torch.Size([9600, 3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.26.attn.proj.weight - torch.Size([3200, 3200]): The value is the same before and after 
calling `init_weights` of EncoderDecoder backbone.blocks.26.attn.proj.bias - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.26.attn.q_norm.weight - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.26.attn.k_norm.weight - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.26.ls1.gamma - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.26.norm2.weight - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.26.mlp.fc1.weight - torch.Size([12800, 3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.26.mlp.fc1.bias - torch.Size([12800]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.26.mlp.fc2.weight - torch.Size([3200, 12800]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.26.mlp.fc2.bias - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.26.ls2.gamma - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.27.norm1.weight - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.27.attn.qkv.weight - torch.Size([9600, 3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.27.attn.proj.weight - torch.Size([3200, 3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.27.attn.proj.bias - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder 
backbone.blocks.27.attn.q_norm.weight - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.27.attn.k_norm.weight - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.27.ls1.gamma - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.27.norm2.weight - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.27.mlp.fc1.weight - torch.Size([12800, 3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.27.mlp.fc1.bias - torch.Size([12800]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.27.mlp.fc2.weight - torch.Size([3200, 12800]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.27.mlp.fc2.bias - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.27.ls2.gamma - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.28.norm1.weight - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.28.attn.qkv.weight - torch.Size([9600, 3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.28.attn.proj.weight - torch.Size([3200, 3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.28.attn.proj.bias - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.28.attn.q_norm.weight - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.28.attn.k_norm.weight - torch.Size([3200]): The value 
is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.28.ls1.gamma - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.28.norm2.weight - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.28.mlp.fc1.weight - torch.Size([12800, 3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.28.mlp.fc1.bias - torch.Size([12800]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.28.mlp.fc2.weight - torch.Size([3200, 12800]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.28.mlp.fc2.bias - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.28.ls2.gamma - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.29.norm1.weight - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.29.attn.qkv.weight - torch.Size([9600, 3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.29.attn.proj.weight - torch.Size([3200, 3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.29.attn.proj.bias - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.29.attn.q_norm.weight - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.29.attn.k_norm.weight - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.29.ls1.gamma - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder 
backbone.blocks.29.norm2.weight - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.29.mlp.fc1.weight - torch.Size([12800, 3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.29.mlp.fc1.bias - torch.Size([12800]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.29.mlp.fc2.weight - torch.Size([3200, 12800]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.29.mlp.fc2.bias - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.29.ls2.gamma - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.30.norm1.weight - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.30.attn.qkv.weight - torch.Size([9600, 3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.30.attn.proj.weight - torch.Size([3200, 3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.30.attn.proj.bias - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.30.attn.q_norm.weight - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.30.attn.k_norm.weight - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.30.ls1.gamma - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.30.norm2.weight - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.30.mlp.fc1.weight - torch.Size([12800, 3200]): The value is 
the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.30.mlp.fc1.bias - torch.Size([12800]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.30.mlp.fc2.weight - torch.Size([3200, 12800]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.30.mlp.fc2.bias - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.30.ls2.gamma - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.31.norm1.weight - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.31.attn.qkv.weight - torch.Size([9600, 3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.31.attn.proj.weight - torch.Size([3200, 3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.31.attn.proj.bias - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.31.attn.q_norm.weight - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.31.attn.k_norm.weight - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.31.ls1.gamma - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.31.norm2.weight - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.31.mlp.fc1.weight - torch.Size([12800, 3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.31.mlp.fc1.bias - torch.Size([12800]): The value is the same before and after calling `init_weights` of EncoderDecoder 
backbone.blocks.31.mlp.fc2.weight - torch.Size([3200, 12800]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.31.mlp.fc2.bias - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.31.ls2.gamma - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.32.norm1.weight - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.32.attn.qkv.weight - torch.Size([9600, 3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.32.attn.proj.weight - torch.Size([3200, 3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.32.attn.proj.bias - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.32.attn.q_norm.weight - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.32.attn.k_norm.weight - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.32.ls1.gamma - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.32.norm2.weight - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.32.mlp.fc1.weight - torch.Size([12800, 3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.32.mlp.fc1.bias - torch.Size([12800]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.32.mlp.fc2.weight - torch.Size([3200, 12800]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.32.mlp.fc2.bias - torch.Size([3200]): The value is 
the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.32.ls2.gamma - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.33.norm1.weight - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.33.attn.qkv.weight - torch.Size([9600, 3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.33.attn.proj.weight - torch.Size([3200, 3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.33.attn.proj.bias - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.33.attn.q_norm.weight - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.33.attn.k_norm.weight - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.33.ls1.gamma - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.33.norm2.weight - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.33.mlp.fc1.weight - torch.Size([12800, 3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.33.mlp.fc1.bias - torch.Size([12800]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.33.mlp.fc2.weight - torch.Size([3200, 12800]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.33.mlp.fc2.bias - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.33.ls2.gamma - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder 
backbone.blocks.34.norm1.weight - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.34.attn.qkv.weight - torch.Size([9600, 3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.34.attn.proj.weight - torch.Size([3200, 3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.34.attn.proj.bias - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.34.attn.q_norm.weight - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.34.attn.k_norm.weight - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.34.ls1.gamma - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.34.norm2.weight - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.34.mlp.fc1.weight - torch.Size([12800, 3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.34.mlp.fc1.bias - torch.Size([12800]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.34.mlp.fc2.weight - torch.Size([3200, 12800]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.34.mlp.fc2.bias - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.34.ls2.gamma - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.35.norm1.weight - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.35.attn.qkv.weight - torch.Size([9600, 3200]): The value is 
the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.35.attn.proj.weight - torch.Size([3200, 3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.35.attn.proj.bias - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.35.attn.q_norm.weight - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.35.attn.k_norm.weight - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.35.ls1.gamma - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.35.norm2.weight - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.35.mlp.fc1.weight - torch.Size([12800, 3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.35.mlp.fc1.bias - torch.Size([12800]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.35.mlp.fc2.weight - torch.Size([3200, 12800]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.35.mlp.fc2.bias - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.35.ls2.gamma - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.36.norm1.weight - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.36.attn.qkv.weight - torch.Size([9600, 3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.36.attn.proj.weight - torch.Size([3200, 3200]): The value is the same before and after calling `init_weights` of EncoderDecoder 
backbone.blocks.36.attn.proj.bias - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.36.attn.q_norm.weight - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.36.attn.k_norm.weight - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.36.ls1.gamma - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.36.norm2.weight - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.36.mlp.fc1.weight - torch.Size([12800, 3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.36.mlp.fc1.bias - torch.Size([12800]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.36.mlp.fc2.weight - torch.Size([3200, 12800]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.36.mlp.fc2.bias - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.36.ls2.gamma - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.37.norm1.weight - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.37.attn.qkv.weight - torch.Size([9600, 3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.37.attn.proj.weight - torch.Size([3200, 3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.37.attn.proj.bias - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.37.attn.q_norm.weight - torch.Size([3200]): The value is 
the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.37.attn.k_norm.weight - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.37.ls1.gamma - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.37.norm2.weight - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.37.mlp.fc1.weight - torch.Size([12800, 3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.37.mlp.fc1.bias - torch.Size([12800]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.37.mlp.fc2.weight - torch.Size([3200, 12800]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.37.mlp.fc2.bias - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.37.ls2.gamma - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.38.norm1.weight - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.38.attn.qkv.weight - torch.Size([9600, 3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.38.attn.proj.weight - torch.Size([3200, 3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.38.attn.proj.bias - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.38.attn.q_norm.weight - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.38.attn.k_norm.weight - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder 
backbone.blocks.38.ls1.gamma - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.38.norm2.weight - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.38.mlp.fc1.weight - torch.Size([12800, 3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.38.mlp.fc1.bias - torch.Size([12800]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.38.mlp.fc2.weight - torch.Size([3200, 12800]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.38.mlp.fc2.bias - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.38.ls2.gamma - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.39.norm1.weight - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.39.attn.qkv.weight - torch.Size([9600, 3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.39.attn.proj.weight - torch.Size([3200, 3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.39.attn.proj.bias - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.39.attn.q_norm.weight - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.39.attn.k_norm.weight - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.39.ls1.gamma - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.39.norm2.weight - torch.Size([3200]): The value is the same 
before and after calling `init_weights` of EncoderDecoder backbone.blocks.39.mlp.fc1.weight - torch.Size([12800, 3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.39.mlp.fc1.bias - torch.Size([12800]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.39.mlp.fc2.weight - torch.Size([3200, 12800]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.39.mlp.fc2.bias - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.39.ls2.gamma - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.40.norm1.weight - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.40.attn.qkv.weight - torch.Size([9600, 3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.40.attn.proj.weight - torch.Size([3200, 3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.40.attn.proj.bias - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.40.attn.q_norm.weight - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.40.attn.k_norm.weight - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.40.ls1.gamma - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.40.norm2.weight - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.40.mlp.fc1.weight - torch.Size([12800, 3200]): The value is the same before and after calling `init_weights` of EncoderDecoder 
backbone.blocks.40.mlp.fc1.bias - torch.Size([12800]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.40.mlp.fc2.weight - torch.Size([3200, 12800]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.40.mlp.fc2.bias - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.40.ls2.gamma - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.41.norm1.weight - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.41.attn.qkv.weight - torch.Size([9600, 3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.41.attn.proj.weight - torch.Size([3200, 3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.41.attn.proj.bias - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.41.attn.q_norm.weight - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.41.attn.k_norm.weight - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.41.ls1.gamma - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.41.norm2.weight - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.41.mlp.fc1.weight - torch.Size([12800, 3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.41.mlp.fc1.bias - torch.Size([12800]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.41.mlp.fc2.weight - torch.Size([3200, 12800]): The value is 
the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.41.mlp.fc2.bias - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.41.ls2.gamma - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.42.norm1.weight - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.42.attn.qkv.weight - torch.Size([9600, 3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.42.attn.proj.weight - torch.Size([3200, 3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.42.attn.proj.bias - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.42.attn.q_norm.weight - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.42.attn.k_norm.weight - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.42.ls1.gamma - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.42.norm2.weight - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.42.mlp.fc1.weight - torch.Size([12800, 3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.42.mlp.fc1.bias - torch.Size([12800]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.42.mlp.fc2.weight - torch.Size([3200, 12800]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.42.mlp.fc2.bias - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder 
backbone.blocks.42.ls2.gamma - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.43.norm1.weight - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.43.attn.qkv.weight - torch.Size([9600, 3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.43.attn.proj.weight - torch.Size([3200, 3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.43.attn.proj.bias - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.43.attn.q_norm.weight - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.43.attn.k_norm.weight - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.43.ls1.gamma - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.43.norm2.weight - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.43.mlp.fc1.weight - torch.Size([12800, 3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.43.mlp.fc1.bias - torch.Size([12800]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.43.mlp.fc2.weight - torch.Size([3200, 12800]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.43.mlp.fc2.bias - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.43.ls2.gamma - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.44.norm1.weight - torch.Size([3200]): The value is the same 
before and after calling `init_weights` of EncoderDecoder backbone.blocks.44.attn.qkv.weight - torch.Size([9600, 3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.44.attn.proj.weight - torch.Size([3200, 3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.44.attn.proj.bias - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.44.attn.q_norm.weight - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.44.attn.k_norm.weight - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.44.ls1.gamma - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.44.norm2.weight - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.44.mlp.fc1.weight - torch.Size([12800, 3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.44.mlp.fc1.bias - torch.Size([12800]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.44.mlp.fc2.weight - torch.Size([3200, 12800]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.44.mlp.fc2.bias - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.44.ls2.gamma - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.45.norm1.weight - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.45.attn.qkv.weight - torch.Size([9600, 3200]): The value is the same before and after calling `init_weights` of EncoderDecoder 
backbone.blocks.45.attn.proj.weight - torch.Size([3200, 3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.45.attn.proj.bias - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.45.attn.q_norm.weight - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.45.attn.k_norm.weight - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.45.ls1.gamma - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.45.norm2.weight - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.45.mlp.fc1.weight - torch.Size([12800, 3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.45.mlp.fc1.bias - torch.Size([12800]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.45.mlp.fc2.weight - torch.Size([3200, 12800]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.45.mlp.fc2.bias - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.45.ls2.gamma - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.46.norm1.weight - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.46.attn.qkv.weight - torch.Size([9600, 3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.46.attn.proj.weight - torch.Size([3200, 3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.46.attn.proj.bias - torch.Size([3200]): The value 
is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.46.attn.q_norm.weight - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.46.attn.k_norm.weight - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.46.ls1.gamma - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.46.norm2.weight - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.46.mlp.fc1.weight - torch.Size([12800, 3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.46.mlp.fc1.bias - torch.Size([12800]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.46.mlp.fc2.weight - torch.Size([3200, 12800]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.46.mlp.fc2.bias - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.46.ls2.gamma - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.47.norm1.weight - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.47.attn.qkv.weight - torch.Size([9600, 3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.47.attn.proj.weight - torch.Size([3200, 3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.47.attn.proj.bias - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.47.attn.q_norm.weight - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder 
backbone.blocks.47.attn.k_norm.weight - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.47.ls1.gamma - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.47.norm2.weight - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.47.mlp.fc1.weight - torch.Size([12800, 3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.47.mlp.fc1.bias - torch.Size([12800]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.47.mlp.fc2.weight - torch.Size([3200, 12800]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.47.mlp.fc2.bias - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder backbone.blocks.47.ls2.gamma - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder decode_head.conv_seg.weight - torch.Size([150, 3200, 1, 1]): NormalInit: mean=0, std=0.01, bias=0 decode_head.conv_seg.bias - torch.Size([150]): NormalInit: mean=0, std=0.01, bias=0 decode_head.norm.weight - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder decode_head.norm.bias - torch.Size([3200]): The value is the same before and after calling `init_weights` of EncoderDecoder 2023-11-09 22:37:00,614 - mmseg - INFO - EncoderDecoder( (backbone): InternViT6B( (patch_embed): PatchEmbed( (proj): Conv2d(3, 3200, kernel_size=(14, 14), stride=(14, 14)) (norm): Identity() ) (pos_drop): Identity() (blocks): ModuleList( (0): Block( (norm1): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True) (attn): Attention( (qkv): Linear(in_features=3200, out_features=9600, bias=False) (attn_drop): Dropout(p=0.0, inplace=False) (proj): 
Linear(in_features=3200, out_features=3200, bias=True) (proj_drop): Dropout(p=0.0, inplace=False) (inner_attn): FlashAttention() (q_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True) (k_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True) ) (ls1): LayerScale() (drop_path1): Identity() (norm2): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True) (mlp): Mlp( (fc1): Linear(in_features=3200, out_features=12800, bias=True) (act): GELU(approximate='none') (drop1): Dropout(p=0.0, inplace=False) (fc2): Linear(in_features=12800, out_features=3200, bias=True) (drop2): Dropout(p=0.0, inplace=False) ) (ls2): LayerScale() (drop_path2): Identity() ) (1): Block( (norm1): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True) (attn): Attention( (qkv): Linear(in_features=3200, out_features=9600, bias=False) (attn_drop): Dropout(p=0.0, inplace=False) (proj): Linear(in_features=3200, out_features=3200, bias=True) (proj_drop): Dropout(p=0.0, inplace=False) (inner_attn): FlashAttention() (q_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True) (k_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True) ) (ls1): LayerScale() (drop_path1): DropPath(drop_prob=0.009) (norm2): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True) (mlp): Mlp( (fc1): Linear(in_features=3200, out_features=12800, bias=True) (act): GELU(approximate='none') (drop1): Dropout(p=0.0, inplace=False) (fc2): Linear(in_features=12800, out_features=3200, bias=True) (drop2): Dropout(p=0.0, inplace=False) ) (ls2): LayerScale() (drop_path2): DropPath(drop_prob=0.009) ) (2): Block( (norm1): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True) (attn): Attention( (qkv): Linear(in_features=3200, out_features=9600, bias=False) (attn_drop): Dropout(p=0.0, inplace=False) (proj): Linear(in_features=3200, out_features=3200, bias=True) (proj_drop): Dropout(p=0.0, inplace=False) 
(inner_attn): FlashAttention() (q_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True) (k_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True) ) (ls1): LayerScale() (drop_path1): DropPath(drop_prob=0.017) (norm2): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True) (mlp): Mlp( (fc1): Linear(in_features=3200, out_features=12800, bias=True) (act): GELU(approximate='none') (drop1): Dropout(p=0.0, inplace=False) (fc2): Linear(in_features=12800, out_features=3200, bias=True) (drop2): Dropout(p=0.0, inplace=False) ) (ls2): LayerScale() (drop_path2): DropPath(drop_prob=0.017) ) (3): Block( (norm1): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True) (attn): Attention( (qkv): Linear(in_features=3200, out_features=9600, bias=False) (attn_drop): Dropout(p=0.0, inplace=False) (proj): Linear(in_features=3200, out_features=3200, bias=True) (proj_drop): Dropout(p=0.0, inplace=False) (inner_attn): FlashAttention() (q_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True) (k_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True) ) (ls1): LayerScale() (drop_path1): DropPath(drop_prob=0.026) (norm2): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True) (mlp): Mlp( (fc1): Linear(in_features=3200, out_features=12800, bias=True) (act): GELU(approximate='none') (drop1): Dropout(p=0.0, inplace=False) (fc2): Linear(in_features=12800, out_features=3200, bias=True) (drop2): Dropout(p=0.0, inplace=False) ) (ls2): LayerScale() (drop_path2): DropPath(drop_prob=0.026) ) (4): Block( (norm1): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True) (attn): Attention( (qkv): Linear(in_features=3200, out_features=9600, bias=False) (attn_drop): Dropout(p=0.0, inplace=False) (proj): Linear(in_features=3200, out_features=3200, bias=True) (proj_drop): Dropout(p=0.0, inplace=False) (inner_attn): FlashAttention() (q_norm): FusedRMSNorm(torch.Size([3200]), 
eps=1e-06, elementwise_affine=True) (k_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True) ) (ls1): LayerScale() (drop_path1): DropPath(drop_prob=0.034) (norm2): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True) (mlp): Mlp( (fc1): Linear(in_features=3200, out_features=12800, bias=True) (act): GELU(approximate='none') (drop1): Dropout(p=0.0, inplace=False) (fc2): Linear(in_features=12800, out_features=3200, bias=True) (drop2): Dropout(p=0.0, inplace=False) ) (ls2): LayerScale() (drop_path2): DropPath(drop_prob=0.034) ) (5): Block( (norm1): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True) (attn): Attention( (qkv): Linear(in_features=3200, out_features=9600, bias=False) (attn_drop): Dropout(p=0.0, inplace=False) (proj): Linear(in_features=3200, out_features=3200, bias=True) (proj_drop): Dropout(p=0.0, inplace=False) (inner_attn): FlashAttention() (q_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True) (k_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True) ) (ls1): LayerScale() (drop_path1): DropPath(drop_prob=0.043) (norm2): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True) (mlp): Mlp( (fc1): Linear(in_features=3200, out_features=12800, bias=True) (act): GELU(approximate='none') (drop1): Dropout(p=0.0, inplace=False) (fc2): Linear(in_features=12800, out_features=3200, bias=True) (drop2): Dropout(p=0.0, inplace=False) ) (ls2): LayerScale() (drop_path2): DropPath(drop_prob=0.043) ) (6): Block( (norm1): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True) (attn): Attention( (qkv): Linear(in_features=3200, out_features=9600, bias=False) (attn_drop): Dropout(p=0.0, inplace=False) (proj): Linear(in_features=3200, out_features=3200, bias=True) (proj_drop): Dropout(p=0.0, inplace=False) (inner_attn): FlashAttention() (q_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True) (k_norm): 
FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True) ) (ls1): LayerScale() (drop_path1): DropPath(drop_prob=0.051) (norm2): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True) (mlp): Mlp( (fc1): Linear(in_features=3200, out_features=12800, bias=True) (act): GELU(approximate='none') (drop1): Dropout(p=0.0, inplace=False) (fc2): Linear(in_features=12800, out_features=3200, bias=True) (drop2): Dropout(p=0.0, inplace=False) ) (ls2): LayerScale() (drop_path2): DropPath(drop_prob=0.051) ) (7): Block( (norm1): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True) (attn): Attention( (qkv): Linear(in_features=3200, out_features=9600, bias=False) (attn_drop): Dropout(p=0.0, inplace=False) (proj): Linear(in_features=3200, out_features=3200, bias=True) (proj_drop): Dropout(p=0.0, inplace=False) (inner_attn): FlashAttention() (q_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True) (k_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True) ) (ls1): LayerScale() (drop_path1): DropPath(drop_prob=0.060) (norm2): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True) (mlp): Mlp( (fc1): Linear(in_features=3200, out_features=12800, bias=True) (act): GELU(approximate='none') (drop1): Dropout(p=0.0, inplace=False) (fc2): Linear(in_features=12800, out_features=3200, bias=True) (drop2): Dropout(p=0.0, inplace=False) ) (ls2): LayerScale() (drop_path2): DropPath(drop_prob=0.060) ) (8): Block( (norm1): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True) (attn): Attention( (qkv): Linear(in_features=3200, out_features=9600, bias=False) (attn_drop): Dropout(p=0.0, inplace=False) (proj): Linear(in_features=3200, out_features=3200, bias=True) (proj_drop): Dropout(p=0.0, inplace=False) (inner_attn): FlashAttention() (q_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True) (k_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True) ) 
(ls1): LayerScale() (drop_path1): DropPath(drop_prob=0.068) (norm2): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True) (mlp): Mlp( (fc1): Linear(in_features=3200, out_features=12800, bias=True) (act): GELU(approximate='none') (drop1): Dropout(p=0.0, inplace=False) (fc2): Linear(in_features=12800, out_features=3200, bias=True) (drop2): Dropout(p=0.0, inplace=False) ) (ls2): LayerScale() (drop_path2): DropPath(drop_prob=0.068) ) (9): Block( (norm1): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True) (attn): Attention( (qkv): Linear(in_features=3200, out_features=9600, bias=False) (attn_drop): Dropout(p=0.0, inplace=False) (proj): Linear(in_features=3200, out_features=3200, bias=True) (proj_drop): Dropout(p=0.0, inplace=False) (inner_attn): FlashAttention() (q_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True) (k_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True) ) (ls1): LayerScale() (drop_path1): DropPath(drop_prob=0.077) (norm2): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True) (mlp): Mlp( (fc1): Linear(in_features=3200, out_features=12800, bias=True) (act): GELU(approximate='none') (drop1): Dropout(p=0.0, inplace=False) (fc2): Linear(in_features=12800, out_features=3200, bias=True) (drop2): Dropout(p=0.0, inplace=False) ) (ls2): LayerScale() (drop_path2): DropPath(drop_prob=0.077) ) (10): Block( (norm1): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True) (attn): Attention( (qkv): Linear(in_features=3200, out_features=9600, bias=False) (attn_drop): Dropout(p=0.0, inplace=False) (proj): Linear(in_features=3200, out_features=3200, bias=True) (proj_drop): Dropout(p=0.0, inplace=False) (inner_attn): FlashAttention() (q_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True) (k_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True) ) (ls1): LayerScale() (drop_path1): DropPath(drop_prob=0.085) (norm2): 
FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True) (mlp): Mlp( (fc1): Linear(in_features=3200, out_features=12800, bias=True) (act): GELU(approximate='none') (drop1): Dropout(p=0.0, inplace=False) (fc2): Linear(in_features=12800, out_features=3200, bias=True) (drop2): Dropout(p=0.0, inplace=False) ) (ls2): LayerScale() (drop_path2): DropPath(drop_prob=0.085) ) (11): Block( (norm1): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True) (attn): Attention( (qkv): Linear(in_features=3200, out_features=9600, bias=False) (attn_drop): Dropout(p=0.0, inplace=False) (proj): Linear(in_features=3200, out_features=3200, bias=True) (proj_drop): Dropout(p=0.0, inplace=False) (inner_attn): FlashAttention() (q_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True) (k_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True) ) (ls1): LayerScale() (drop_path1): DropPath(drop_prob=0.094) (norm2): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True) (mlp): Mlp( (fc1): Linear(in_features=3200, out_features=12800, bias=True) (act): GELU(approximate='none') (drop1): Dropout(p=0.0, inplace=False) (fc2): Linear(in_features=12800, out_features=3200, bias=True) (drop2): Dropout(p=0.0, inplace=False) ) (ls2): LayerScale() (drop_path2): DropPath(drop_prob=0.094) ) (12): Block( (norm1): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True) (attn): Attention( (qkv): Linear(in_features=3200, out_features=9600, bias=False) (attn_drop): Dropout(p=0.0, inplace=False) (proj): Linear(in_features=3200, out_features=3200, bias=True) (proj_drop): Dropout(p=0.0, inplace=False) (inner_attn): FlashAttention() (q_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True) (k_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True) ) (ls1): LayerScale() (drop_path1): DropPath(drop_prob=0.102) (norm2): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True) 
(mlp): Mlp( (fc1): Linear(in_features=3200, out_features=12800, bias=True) (act): GELU(approximate='none') (drop1): Dropout(p=0.0, inplace=False) (fc2): Linear(in_features=12800, out_features=3200, bias=True) (drop2): Dropout(p=0.0, inplace=False) ) (ls2): LayerScale() (drop_path2): DropPath(drop_prob=0.102) ) (13): Block( (norm1): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True) (attn): Attention( (qkv): Linear(in_features=3200, out_features=9600, bias=False) (attn_drop): Dropout(p=0.0, inplace=False) (proj): Linear(in_features=3200, out_features=3200, bias=True) (proj_drop): Dropout(p=0.0, inplace=False) (inner_attn): FlashAttention() (q_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True) (k_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True) ) (ls1): LayerScale() (drop_path1): DropPath(drop_prob=0.111) (norm2): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True) (mlp): Mlp( (fc1): Linear(in_features=3200, out_features=12800, bias=True) (act): GELU(approximate='none') (drop1): Dropout(p=0.0, inplace=False) (fc2): Linear(in_features=12800, out_features=3200, bias=True) (drop2): Dropout(p=0.0, inplace=False) ) (ls2): LayerScale() (drop_path2): DropPath(drop_prob=0.111) ) (14): Block( (norm1): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True) (attn): Attention( (qkv): Linear(in_features=3200, out_features=9600, bias=False) (attn_drop): Dropout(p=0.0, inplace=False) (proj): Linear(in_features=3200, out_features=3200, bias=True) (proj_drop): Dropout(p=0.0, inplace=False) (inner_attn): FlashAttention() (q_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True) (k_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True) ) (ls1): LayerScale() (drop_path1): DropPath(drop_prob=0.119) (norm2): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True) (mlp): Mlp( (fc1): Linear(in_features=3200, out_features=12800, 
bias=True) (act): GELU(approximate='none') (drop1): Dropout(p=0.0, inplace=False) (fc2): Linear(in_features=12800, out_features=3200, bias=True) (drop2): Dropout(p=0.0, inplace=False) ) (ls2): LayerScale() (drop_path2): DropPath(drop_prob=0.119) ) (15): Block( (norm1): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True) (attn): Attention( (qkv): Linear(in_features=3200, out_features=9600, bias=False) (attn_drop): Dropout(p=0.0, inplace=False) (proj): Linear(in_features=3200, out_features=3200, bias=True) (proj_drop): Dropout(p=0.0, inplace=False) (inner_attn): FlashAttention() (q_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True) (k_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True) ) (ls1): LayerScale() (drop_path1): DropPath(drop_prob=0.128) (norm2): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True) (mlp): Mlp( (fc1): Linear(in_features=3200, out_features=12800, bias=True) (act): GELU(approximate='none') (drop1): Dropout(p=0.0, inplace=False) (fc2): Linear(in_features=12800, out_features=3200, bias=True) (drop2): Dropout(p=0.0, inplace=False) ) (ls2): LayerScale() (drop_path2): DropPath(drop_prob=0.128) ) (16): Block( (norm1): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True) (attn): Attention( (qkv): Linear(in_features=3200, out_features=9600, bias=False) (attn_drop): Dropout(p=0.0, inplace=False) (proj): Linear(in_features=3200, out_features=3200, bias=True) (proj_drop): Dropout(p=0.0, inplace=False) (inner_attn): FlashAttention() (q_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True) (k_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True) ) (ls1): LayerScale() (drop_path1): DropPath(drop_prob=0.136) (norm2): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True) (mlp): Mlp( (fc1): Linear(in_features=3200, out_features=12800, bias=True) (act): GELU(approximate='none') (drop1): Dropout(p=0.0, 
inplace=False) (fc2): Linear(in_features=12800, out_features=3200, bias=True) (drop2): Dropout(p=0.0, inplace=False) ) (ls2): LayerScale() (drop_path2): DropPath(drop_prob=0.136) ) (17): Block( (norm1): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True) (attn): Attention( (qkv): Linear(in_features=3200, out_features=9600, bias=False) (attn_drop): Dropout(p=0.0, inplace=False) (proj): Linear(in_features=3200, out_features=3200, bias=True) (proj_drop): Dropout(p=0.0, inplace=False) (inner_attn): FlashAttention() (q_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True) (k_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True) ) (ls1): LayerScale() (drop_path1): DropPath(drop_prob=0.145) (norm2): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True) (mlp): Mlp( (fc1): Linear(in_features=3200, out_features=12800, bias=True) (act): GELU(approximate='none') (drop1): Dropout(p=0.0, inplace=False) (fc2): Linear(in_features=12800, out_features=3200, bias=True) (drop2): Dropout(p=0.0, inplace=False) ) (ls2): LayerScale() (drop_path2): DropPath(drop_prob=0.145) ) (18): Block( (norm1): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True) (attn): Attention( (qkv): Linear(in_features=3200, out_features=9600, bias=False) (attn_drop): Dropout(p=0.0, inplace=False) (proj): Linear(in_features=3200, out_features=3200, bias=True) (proj_drop): Dropout(p=0.0, inplace=False) (inner_attn): FlashAttention() (q_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True) (k_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True) ) (ls1): LayerScale() (drop_path1): DropPath(drop_prob=0.153) (norm2): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True) (mlp): Mlp( (fc1): Linear(in_features=3200, out_features=12800, bias=True) (act): GELU(approximate='none') (drop1): Dropout(p=0.0, inplace=False) (fc2): Linear(in_features=12800, out_features=3200, 
bias=True) (drop2): Dropout(p=0.0, inplace=False) ) (ls2): LayerScale() (drop_path2): DropPath(drop_prob=0.153) ) (19): Block( (norm1): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True) (attn): Attention( (qkv): Linear(in_features=3200, out_features=9600, bias=False) (attn_drop): Dropout(p=0.0, inplace=False) (proj): Linear(in_features=3200, out_features=3200, bias=True) (proj_drop): Dropout(p=0.0, inplace=False) (inner_attn): FlashAttention() (q_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True) (k_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True) ) (ls1): LayerScale() (drop_path1): DropPath(drop_prob=0.162) (norm2): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True) (mlp): Mlp( (fc1): Linear(in_features=3200, out_features=12800, bias=True) (act): GELU(approximate='none') (drop1): Dropout(p=0.0, inplace=False) (fc2): Linear(in_features=12800, out_features=3200, bias=True) (drop2): Dropout(p=0.0, inplace=False) ) (ls2): LayerScale() (drop_path2): DropPath(drop_prob=0.162) ) (20): Block( (norm1): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True) (attn): Attention( (qkv): Linear(in_features=3200, out_features=9600, bias=False) (attn_drop): Dropout(p=0.0, inplace=False) (proj): Linear(in_features=3200, out_features=3200, bias=True) (proj_drop): Dropout(p=0.0, inplace=False) (inner_attn): FlashAttention() (q_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True) (k_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True) ) (ls1): LayerScale() (drop_path1): DropPath(drop_prob=0.170) (norm2): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True) (mlp): Mlp( (fc1): Linear(in_features=3200, out_features=12800, bias=True) (act): GELU(approximate='none') (drop1): Dropout(p=0.0, inplace=False) (fc2): Linear(in_features=12800, out_features=3200, bias=True) (drop2): Dropout(p=0.0, inplace=False) ) (ls2): LayerScale() 
(drop_path2): DropPath(drop_prob=0.170) ) (21): Block( (norm1): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True) (attn): Attention( (qkv): Linear(in_features=3200, out_features=9600, bias=False) (attn_drop): Dropout(p=0.0, inplace=False) (proj): Linear(in_features=3200, out_features=3200, bias=True) (proj_drop): Dropout(p=0.0, inplace=False) (inner_attn): FlashAttention() (q_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True) (k_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True) ) (ls1): LayerScale() (drop_path1): DropPath(drop_prob=0.179) (norm2): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True) (mlp): Mlp( (fc1): Linear(in_features=3200, out_features=12800, bias=True) (act): GELU(approximate='none') (drop1): Dropout(p=0.0, inplace=False) (fc2): Linear(in_features=12800, out_features=3200, bias=True) (drop2): Dropout(p=0.0, inplace=False) ) (ls2): LayerScale() (drop_path2): DropPath(drop_prob=0.179) ) (22): Block( (norm1): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True) (attn): Attention( (qkv): Linear(in_features=3200, out_features=9600, bias=False) (attn_drop): Dropout(p=0.0, inplace=False) (proj): Linear(in_features=3200, out_features=3200, bias=True) (proj_drop): Dropout(p=0.0, inplace=False) (inner_attn): FlashAttention() (q_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True) (k_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True) ) (ls1): LayerScale() (drop_path1): DropPath(drop_prob=0.187) (norm2): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True) (mlp): Mlp( (fc1): Linear(in_features=3200, out_features=12800, bias=True) (act): GELU(approximate='none') (drop1): Dropout(p=0.0, inplace=False) (fc2): Linear(in_features=12800, out_features=3200, bias=True) (drop2): Dropout(p=0.0, inplace=False) ) (ls2): LayerScale() (drop_path2): DropPath(drop_prob=0.187) ) (23): Block( (norm1): 
FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True) (attn): Attention( (qkv): Linear(in_features=3200, out_features=9600, bias=False) (attn_drop): Dropout(p=0.0, inplace=False) (proj): Linear(in_features=3200, out_features=3200, bias=True) (proj_drop): Dropout(p=0.0, inplace=False) (inner_attn): FlashAttention() (q_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True) (k_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True) ) (ls1): LayerScale() (drop_path1): DropPath(drop_prob=0.196) (norm2): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True) (mlp): Mlp( (fc1): Linear(in_features=3200, out_features=12800, bias=True) (act): GELU(approximate='none') (drop1): Dropout(p=0.0, inplace=False) (fc2): Linear(in_features=12800, out_features=3200, bias=True) (drop2): Dropout(p=0.0, inplace=False) ) (ls2): LayerScale() (drop_path2): DropPath(drop_prob=0.196) ) (24): Block( (norm1): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True) (attn): Attention( (qkv): Linear(in_features=3200, out_features=9600, bias=False) (attn_drop): Dropout(p=0.0, inplace=False) (proj): Linear(in_features=3200, out_features=3200, bias=True) (proj_drop): Dropout(p=0.0, inplace=False) (inner_attn): FlashAttention() (q_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True) (k_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True) ) (ls1): LayerScale() (drop_path1): DropPath(drop_prob=0.204) (norm2): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True) (mlp): Mlp( (fc1): Linear(in_features=3200, out_features=12800, bias=True) (act): GELU(approximate='none') (drop1): Dropout(p=0.0, inplace=False) (fc2): Linear(in_features=12800, out_features=3200, bias=True) (drop2): Dropout(p=0.0, inplace=False) ) (ls2): LayerScale() (drop_path2): DropPath(drop_prob=0.204) ) (25): Block( (norm1): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True) 
(attn): Attention( (qkv): Linear(in_features=3200, out_features=9600, bias=False) (attn_drop): Dropout(p=0.0, inplace=False) (proj): Linear(in_features=3200, out_features=3200, bias=True) (proj_drop): Dropout(p=0.0, inplace=False) (inner_attn): FlashAttention() (q_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True) (k_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True) ) (ls1): LayerScale() (drop_path1): DropPath(drop_prob=0.213) (norm2): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True) (mlp): Mlp( (fc1): Linear(in_features=3200, out_features=12800, bias=True) (act): GELU(approximate='none') (drop1): Dropout(p=0.0, inplace=False) (fc2): Linear(in_features=12800, out_features=3200, bias=True) (drop2): Dropout(p=0.0, inplace=False) ) (ls2): LayerScale() (drop_path2): DropPath(drop_prob=0.213) ) (26): Block( (norm1): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True) (attn): Attention( (qkv): Linear(in_features=3200, out_features=9600, bias=False) (attn_drop): Dropout(p=0.0, inplace=False) (proj): Linear(in_features=3200, out_features=3200, bias=True) (proj_drop): Dropout(p=0.0, inplace=False) (inner_attn): FlashAttention() (q_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True) (k_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True) ) (ls1): LayerScale() (drop_path1): DropPath(drop_prob=0.221) (norm2): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True) (mlp): Mlp( (fc1): Linear(in_features=3200, out_features=12800, bias=True) (act): GELU(approximate='none') (drop1): Dropout(p=0.0, inplace=False) (fc2): Linear(in_features=12800, out_features=3200, bias=True) (drop2): Dropout(p=0.0, inplace=False) ) (ls2): LayerScale() (drop_path2): DropPath(drop_prob=0.221) ) (27): Block( (norm1): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True) (attn): Attention( (qkv): Linear(in_features=3200, out_features=9600, 
bias=False) (attn_drop): Dropout(p=0.0, inplace=False) (proj): Linear(in_features=3200, out_features=3200, bias=True) (proj_drop): Dropout(p=0.0, inplace=False) (inner_attn): FlashAttention() (q_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True) (k_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True) ) (ls1): LayerScale() (drop_path1): DropPath(drop_prob=0.230) (norm2): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True) (mlp): Mlp( (fc1): Linear(in_features=3200, out_features=12800, bias=True) (act): GELU(approximate='none') (drop1): Dropout(p=0.0, inplace=False) (fc2): Linear(in_features=12800, out_features=3200, bias=True) (drop2): Dropout(p=0.0, inplace=False) ) (ls2): LayerScale() (drop_path2): DropPath(drop_prob=0.230) ) (28): Block( (norm1): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True) (attn): Attention( (qkv): Linear(in_features=3200, out_features=9600, bias=False) (attn_drop): Dropout(p=0.0, inplace=False) (proj): Linear(in_features=3200, out_features=3200, bias=True) (proj_drop): Dropout(p=0.0, inplace=False) (inner_attn): FlashAttention() (q_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True) (k_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True) ) (ls1): LayerScale() (drop_path1): DropPath(drop_prob=0.238) (norm2): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True) (mlp): Mlp( (fc1): Linear(in_features=3200, out_features=12800, bias=True) (act): GELU(approximate='none') (drop1): Dropout(p=0.0, inplace=False) (fc2): Linear(in_features=12800, out_features=3200, bias=True) (drop2): Dropout(p=0.0, inplace=False) ) (ls2): LayerScale() (drop_path2): DropPath(drop_prob=0.238) ) (29): Block( (norm1): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True) (attn): Attention( (qkv): Linear(in_features=3200, out_features=9600, bias=False) (attn_drop): Dropout(p=0.0, inplace=False) (proj): 
Linear(in_features=3200, out_features=3200, bias=True) (proj_drop): Dropout(p=0.0, inplace=False) (inner_attn): FlashAttention() (q_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True) (k_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True) ) (ls1): LayerScale() (drop_path1): DropPath(drop_prob=0.247) (norm2): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True) (mlp): Mlp( (fc1): Linear(in_features=3200, out_features=12800, bias=True) (act): GELU(approximate='none') (drop1): Dropout(p=0.0, inplace=False) (fc2): Linear(in_features=12800, out_features=3200, bias=True) (drop2): Dropout(p=0.0, inplace=False) ) (ls2): LayerScale() (drop_path2): DropPath(drop_prob=0.247) ) (30): Block( (norm1): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True) (attn): Attention( (qkv): Linear(in_features=3200, out_features=9600, bias=False) (attn_drop): Dropout(p=0.0, inplace=False) (proj): Linear(in_features=3200, out_features=3200, bias=True) (proj_drop): Dropout(p=0.0, inplace=False) (inner_attn): FlashAttention() (q_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True) (k_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True) ) (ls1): LayerScale() (drop_path1): DropPath(drop_prob=0.255) (norm2): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True) (mlp): Mlp( (fc1): Linear(in_features=3200, out_features=12800, bias=True) (act): GELU(approximate='none') (drop1): Dropout(p=0.0, inplace=False) (fc2): Linear(in_features=12800, out_features=3200, bias=True) (drop2): Dropout(p=0.0, inplace=False) ) (ls2): LayerScale() (drop_path2): DropPath(drop_prob=0.255) ) (31): Block( (norm1): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True) (attn): Attention( (qkv): Linear(in_features=3200, out_features=9600, bias=False) (attn_drop): Dropout(p=0.0, inplace=False) (proj): Linear(in_features=3200, out_features=3200, bias=True) (proj_drop): 
Dropout(p=0.0, inplace=False) (inner_attn): FlashAttention() (q_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True) (k_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True) ) (ls1): LayerScale() (drop_path1): DropPath(drop_prob=0.264) (norm2): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True) (mlp): Mlp( (fc1): Linear(in_features=3200, out_features=12800, bias=True) (act): GELU(approximate='none') (drop1): Dropout(p=0.0, inplace=False) (fc2): Linear(in_features=12800, out_features=3200, bias=True) (drop2): Dropout(p=0.0, inplace=False) ) (ls2): LayerScale() (drop_path2): DropPath(drop_prob=0.264) ) (32): Block( (norm1): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True) (attn): Attention( (qkv): Linear(in_features=3200, out_features=9600, bias=False) (attn_drop): Dropout(p=0.0, inplace=False) (proj): Linear(in_features=3200, out_features=3200, bias=True) (proj_drop): Dropout(p=0.0, inplace=False) (inner_attn): FlashAttention() (q_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True) (k_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True) ) (ls1): LayerScale() (drop_path1): DropPath(drop_prob=0.272) (norm2): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True) (mlp): Mlp( (fc1): Linear(in_features=3200, out_features=12800, bias=True) (act): GELU(approximate='none') (drop1): Dropout(p=0.0, inplace=False) (fc2): Linear(in_features=12800, out_features=3200, bias=True) (drop2): Dropout(p=0.0, inplace=False) ) (ls2): LayerScale() (drop_path2): DropPath(drop_prob=0.272) ) (33): Block( (norm1): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True) (attn): Attention( (qkv): Linear(in_features=3200, out_features=9600, bias=False) (attn_drop): Dropout(p=0.0, inplace=False) (proj): Linear(in_features=3200, out_features=3200, bias=True) (proj_drop): Dropout(p=0.0, inplace=False) (inner_attn): FlashAttention() (q_norm): 
FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True) (k_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True) ) (ls1): LayerScale() (drop_path1): DropPath(drop_prob=0.281) (norm2): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True) (mlp): Mlp( (fc1): Linear(in_features=3200, out_features=12800, bias=True) (act): GELU(approximate='none') (drop1): Dropout(p=0.0, inplace=False) (fc2): Linear(in_features=12800, out_features=3200, bias=True) (drop2): Dropout(p=0.0, inplace=False) ) (ls2): LayerScale() (drop_path2): DropPath(drop_prob=0.281) ) (34): Block( (norm1): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True) (attn): Attention( (qkv): Linear(in_features=3200, out_features=9600, bias=False) (attn_drop): Dropout(p=0.0, inplace=False) (proj): Linear(in_features=3200, out_features=3200, bias=True) (proj_drop): Dropout(p=0.0, inplace=False) (inner_attn): FlashAttention() (q_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True) (k_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True) ) (ls1): LayerScale() (drop_path1): DropPath(drop_prob=0.289) (norm2): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True) (mlp): Mlp( (fc1): Linear(in_features=3200, out_features=12800, bias=True) (act): GELU(approximate='none') (drop1): Dropout(p=0.0, inplace=False) (fc2): Linear(in_features=12800, out_features=3200, bias=True) (drop2): Dropout(p=0.0, inplace=False) ) (ls2): LayerScale() (drop_path2): DropPath(drop_prob=0.289) ) (35): Block( (norm1): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True) (attn): Attention( (qkv): Linear(in_features=3200, out_features=9600, bias=False) (attn_drop): Dropout(p=0.0, inplace=False) (proj): Linear(in_features=3200, out_features=3200, bias=True) (proj_drop): Dropout(p=0.0, inplace=False) (inner_attn): FlashAttention() (q_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True) 
(k_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True) ) (ls1): LayerScale() (drop_path1): DropPath(drop_prob=0.298) (norm2): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True) (mlp): Mlp( (fc1): Linear(in_features=3200, out_features=12800, bias=True) (act): GELU(approximate='none') (drop1): Dropout(p=0.0, inplace=False) (fc2): Linear(in_features=12800, out_features=3200, bias=True) (drop2): Dropout(p=0.0, inplace=False) ) (ls2): LayerScale() (drop_path2): DropPath(drop_prob=0.298) ) (36): Block( (norm1): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True) (attn): Attention( (qkv): Linear(in_features=3200, out_features=9600, bias=False) (attn_drop): Dropout(p=0.0, inplace=False) (proj): Linear(in_features=3200, out_features=3200, bias=True) (proj_drop): Dropout(p=0.0, inplace=False) (inner_attn): FlashAttention() (q_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True) (k_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True) ) (ls1): LayerScale() (drop_path1): DropPath(drop_prob=0.306) (norm2): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True) (mlp): Mlp( (fc1): Linear(in_features=3200, out_features=12800, bias=True) (act): GELU(approximate='none') (drop1): Dropout(p=0.0, inplace=False) (fc2): Linear(in_features=12800, out_features=3200, bias=True) (drop2): Dropout(p=0.0, inplace=False) ) (ls2): LayerScale() (drop_path2): DropPath(drop_prob=0.306) ) (37): Block( (norm1): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True) (attn): Attention( (qkv): Linear(in_features=3200, out_features=9600, bias=False) (attn_drop): Dropout(p=0.0, inplace=False) (proj): Linear(in_features=3200, out_features=3200, bias=True) (proj_drop): Dropout(p=0.0, inplace=False) (inner_attn): FlashAttention() (q_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True) (k_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, 
elementwise_affine=True) ) (ls1): LayerScale() (drop_path1): DropPath(drop_prob=0.315) (norm2): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True) (mlp): Mlp( (fc1): Linear(in_features=3200, out_features=12800, bias=True) (act): GELU(approximate='none') (drop1): Dropout(p=0.0, inplace=False) (fc2): Linear(in_features=12800, out_features=3200, bias=True) (drop2): Dropout(p=0.0, inplace=False) ) (ls2): LayerScale() (drop_path2): DropPath(drop_prob=0.315) ) (38): Block( (norm1): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True) (attn): Attention( (qkv): Linear(in_features=3200, out_features=9600, bias=False) (attn_drop): Dropout(p=0.0, inplace=False) (proj): Linear(in_features=3200, out_features=3200, bias=True) (proj_drop): Dropout(p=0.0, inplace=False) (inner_attn): FlashAttention() (q_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True) (k_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True) ) (ls1): LayerScale() (drop_path1): DropPath(drop_prob=0.323) (norm2): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True) (mlp): Mlp( (fc1): Linear(in_features=3200, out_features=12800, bias=True) (act): GELU(approximate='none') (drop1): Dropout(p=0.0, inplace=False) (fc2): Linear(in_features=12800, out_features=3200, bias=True) (drop2): Dropout(p=0.0, inplace=False) ) (ls2): LayerScale() (drop_path2): DropPath(drop_prob=0.323) ) (39): Block( (norm1): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True) (attn): Attention( (qkv): Linear(in_features=3200, out_features=9600, bias=False) (attn_drop): Dropout(p=0.0, inplace=False) (proj): Linear(in_features=3200, out_features=3200, bias=True) (proj_drop): Dropout(p=0.0, inplace=False) (inner_attn): FlashAttention() (q_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True) (k_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True) ) (ls1): LayerScale() (drop_path1): 
DropPath(drop_prob=0.332) (norm2): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True) (mlp): Mlp( (fc1): Linear(in_features=3200, out_features=12800, bias=True) (act): GELU(approximate='none') (drop1): Dropout(p=0.0, inplace=False) (fc2): Linear(in_features=12800, out_features=3200, bias=True) (drop2): Dropout(p=0.0, inplace=False) ) (ls2): LayerScale() (drop_path2): DropPath(drop_prob=0.332) ) (40): Block( (norm1): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True) (attn): Attention( (qkv): Linear(in_features=3200, out_features=9600, bias=False) (attn_drop): Dropout(p=0.0, inplace=False) (proj): Linear(in_features=3200, out_features=3200, bias=True) (proj_drop): Dropout(p=0.0, inplace=False) (inner_attn): FlashAttention() (q_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True) (k_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True) ) (ls1): LayerScale() (drop_path1): DropPath(drop_prob=0.340) (norm2): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True) (mlp): Mlp( (fc1): Linear(in_features=3200, out_features=12800, bias=True) (act): GELU(approximate='none') (drop1): Dropout(p=0.0, inplace=False) (fc2): Linear(in_features=12800, out_features=3200, bias=True) (drop2): Dropout(p=0.0, inplace=False) ) (ls2): LayerScale() (drop_path2): DropPath(drop_prob=0.340) ) (41): Block( (norm1): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True) (attn): Attention( (qkv): Linear(in_features=3200, out_features=9600, bias=False) (attn_drop): Dropout(p=0.0, inplace=False) (proj): Linear(in_features=3200, out_features=3200, bias=True) (proj_drop): Dropout(p=0.0, inplace=False) (inner_attn): FlashAttention() (q_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True) (k_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True) ) (ls1): LayerScale() (drop_path1): DropPath(drop_prob=0.349) (norm2): FusedRMSNorm(torch.Size([3200]), 
eps=1e-06, elementwise_affine=True) (mlp): Mlp( (fc1): Linear(in_features=3200, out_features=12800, bias=True) (act): GELU(approximate='none') (drop1): Dropout(p=0.0, inplace=False) (fc2): Linear(in_features=12800, out_features=3200, bias=True) (drop2): Dropout(p=0.0, inplace=False) ) (ls2): LayerScale() (drop_path2): DropPath(drop_prob=0.349) ) (42): Block( (norm1): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True) (attn): Attention( (qkv): Linear(in_features=3200, out_features=9600, bias=False) (attn_drop): Dropout(p=0.0, inplace=False) (proj): Linear(in_features=3200, out_features=3200, bias=True) (proj_drop): Dropout(p=0.0, inplace=False) (inner_attn): FlashAttention() (q_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True) (k_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True) ) (ls1): LayerScale() (drop_path1): DropPath(drop_prob=0.357) (norm2): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True) (mlp): Mlp( (fc1): Linear(in_features=3200, out_features=12800, bias=True) (act): GELU(approximate='none') (drop1): Dropout(p=0.0, inplace=False) (fc2): Linear(in_features=12800, out_features=3200, bias=True) (drop2): Dropout(p=0.0, inplace=False) ) (ls2): LayerScale() (drop_path2): DropPath(drop_prob=0.357) ) (43): Block( (norm1): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True) (attn): Attention( (qkv): Linear(in_features=3200, out_features=9600, bias=False) (attn_drop): Dropout(p=0.0, inplace=False) (proj): Linear(in_features=3200, out_features=3200, bias=True) (proj_drop): Dropout(p=0.0, inplace=False) (inner_attn): FlashAttention() (q_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True) (k_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True) ) (ls1): LayerScale() (drop_path1): DropPath(drop_prob=0.366) (norm2): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True) (mlp): Mlp( (fc1): 
Linear(in_features=3200, out_features=12800, bias=True) (act): GELU(approximate='none') (drop1): Dropout(p=0.0, inplace=False) (fc2): Linear(in_features=12800, out_features=3200, bias=True) (drop2): Dropout(p=0.0, inplace=False) ) (ls2): LayerScale() (drop_path2): DropPath(drop_prob=0.366) ) (44): Block( (norm1): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True) (attn): Attention( (qkv): Linear(in_features=3200, out_features=9600, bias=False) (attn_drop): Dropout(p=0.0, inplace=False) (proj): Linear(in_features=3200, out_features=3200, bias=True) (proj_drop): Dropout(p=0.0, inplace=False) (inner_attn): FlashAttention() (q_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True) (k_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True) ) (ls1): LayerScale() (drop_path1): DropPath(drop_prob=0.374) (norm2): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True) (mlp): Mlp( (fc1): Linear(in_features=3200, out_features=12800, bias=True) (act): GELU(approximate='none') (drop1): Dropout(p=0.0, inplace=False) (fc2): Linear(in_features=12800, out_features=3200, bias=True) (drop2): Dropout(p=0.0, inplace=False) ) (ls2): LayerScale() (drop_path2): DropPath(drop_prob=0.374) ) (45): Block( (norm1): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True) (attn): Attention( (qkv): Linear(in_features=3200, out_features=9600, bias=False) (attn_drop): Dropout(p=0.0, inplace=False) (proj): Linear(in_features=3200, out_features=3200, bias=True) (proj_drop): Dropout(p=0.0, inplace=False) (inner_attn): FlashAttention() (q_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True) (k_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True) ) (ls1): LayerScale() (drop_path1): DropPath(drop_prob=0.383) (norm2): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True) (mlp): Mlp( (fc1): Linear(in_features=3200, out_features=12800, bias=True) (act): 
GELU(approximate='none') (drop1): Dropout(p=0.0, inplace=False) (fc2): Linear(in_features=12800, out_features=3200, bias=True) (drop2): Dropout(p=0.0, inplace=False) ) (ls2): LayerScale() (drop_path2): DropPath(drop_prob=0.383) ) (46): Block( (norm1): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True) (attn): Attention( (qkv): Linear(in_features=3200, out_features=9600, bias=False) (attn_drop): Dropout(p=0.0, inplace=False) (proj): Linear(in_features=3200, out_features=3200, bias=True) (proj_drop): Dropout(p=0.0, inplace=False) (inner_attn): FlashAttention() (q_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True) (k_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True) ) (ls1): LayerScale() (drop_path1): DropPath(drop_prob=0.391) (norm2): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True) (mlp): Mlp( (fc1): Linear(in_features=3200, out_features=12800, bias=True) (act): GELU(approximate='none') (drop1): Dropout(p=0.0, inplace=False) (fc2): Linear(in_features=12800, out_features=3200, bias=True) (drop2): Dropout(p=0.0, inplace=False) ) (ls2): LayerScale() (drop_path2): DropPath(drop_prob=0.391) ) (47): Block( (norm1): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True) (attn): Attention( (qkv): Linear(in_features=3200, out_features=9600, bias=False) (attn_drop): Dropout(p=0.0, inplace=False) (proj): Linear(in_features=3200, out_features=3200, bias=True) (proj_drop): Dropout(p=0.0, inplace=False) (inner_attn): FlashAttention() (q_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True) (k_norm): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True) ) (ls1): LayerScale() (drop_path1): DropPath(drop_prob=0.400) (norm2): FusedRMSNorm(torch.Size([3200]), eps=1e-06, elementwise_affine=True) (mlp): Mlp( (fc1): Linear(in_features=3200, out_features=12800, bias=True) (act): GELU(approximate='none') (drop1): Dropout(p=0.0, inplace=False) (fc2): 
Linear(in_features=12800, out_features=3200, bias=True) (drop2): Dropout(p=0.0, inplace=False) ) (ls2): LayerScale() (drop_path2): DropPath(drop_prob=0.400) ) ) ) (decode_head): FCNHead( input_transform=None, ignore_index=255, align_corners=False (loss_decode): CrossEntropyLoss(avg_non_ignore=False) (conv_seg): Conv2d(3200, 150, kernel_size=(1, 1), stride=(1, 1)) (convs): Identity() (norm): SyncBatchNorm(3200, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) ) init_cfg={'type': 'Normal', 'std': 0.01, 'override': {'name': 'conv_seg'}} )
2023-11-09 22:37:01,153 - mmseg - INFO - Loaded 20210 images
2023-11-09 22:37:01,162 - mmseg - INFO - Randomly select 1263 images
2023-11-09 22:37:02,444 - mmseg - INFO - {'num_layers': 48, 'layer_decay_rate': 0.95}
2023-11-09 22:37:02,445 - mmseg - INFO - Build LayerDecayOptimizerConstructor 0.950000 - 50
2023-11-09 22:37:02,449 - mmseg - INFO - Param groups = { "layer_0_decay": { "param_names": [ "backbone.pos_embed", "backbone.cls_token", "backbone.patch_embed.proj.weight" ], "lr_scale": 0.0809947108175928, "lr": 3.2397884327037123e-06, "weight_decay": 0.05 }, "layer_0_no_decay": { "param_names": [ "backbone.patch_embed.proj.bias" ], "lr_scale": 0.0809947108175928, "lr": 3.2397884327037123e-06, "weight_decay": 0.0 }, "layer_1_no_decay": { "param_names": [ "backbone.blocks.0.norm1.weight", "backbone.blocks.0.attn.proj.bias", "backbone.blocks.0.attn.q_norm.weight", "backbone.blocks.0.attn.k_norm.weight", "backbone.blocks.0.ls1.gamma", "backbone.blocks.0.norm2.weight", "backbone.blocks.0.mlp.fc1.bias", "backbone.blocks.0.mlp.fc2.bias", "backbone.blocks.0.ls2.gamma" ], "lr_scale": 0.0852575903343082, "lr": 3.4103036133723282e-06, "weight_decay": 0.0 }, "layer_1_decay": { "param_names": [ "backbone.blocks.0.attn.qkv.weight", "backbone.blocks.0.attn.proj.weight", "backbone.blocks.0.mlp.fc1.weight", "backbone.blocks.0.mlp.fc2.weight" ], "lr_scale": 0.0852575903343082, "lr": 3.4103036133723282e-06, "weight_decay": 0.05 }, 
"layer_2_no_decay": { "param_names": [ "backbone.blocks.1.norm1.weight", "backbone.blocks.1.attn.proj.bias", "backbone.blocks.1.attn.q_norm.weight", "backbone.blocks.1.attn.k_norm.weight", "backbone.blocks.1.ls1.gamma", "backbone.blocks.1.norm2.weight", "backbone.blocks.1.mlp.fc1.bias", "backbone.blocks.1.mlp.fc2.bias", "backbone.blocks.1.ls2.gamma" ], "lr_scale": 0.08974483193085075, "lr": 3.5897932772340305e-06, "weight_decay": 0.0 }, "layer_2_decay": { "param_names": [ "backbone.blocks.1.attn.qkv.weight", "backbone.blocks.1.attn.proj.weight", "backbone.blocks.1.mlp.fc1.weight", "backbone.blocks.1.mlp.fc2.weight" ], "lr_scale": 0.08974483193085075, "lr": 3.5897932772340305e-06, "weight_decay": 0.05 }, "layer_3_no_decay": { "param_names": [ "backbone.blocks.2.norm1.weight", "backbone.blocks.2.attn.proj.bias", "backbone.blocks.2.attn.q_norm.weight", "backbone.blocks.2.attn.k_norm.weight", "backbone.blocks.2.ls1.gamma", "backbone.blocks.2.norm2.weight", "backbone.blocks.2.mlp.fc1.bias", "backbone.blocks.2.mlp.fc2.bias", "backbone.blocks.2.ls2.gamma" ], "lr_scale": 0.09446824413773763, "lr": 3.7787297655095058e-06, "weight_decay": 0.0 }, "layer_3_decay": { "param_names": [ "backbone.blocks.2.attn.qkv.weight", "backbone.blocks.2.attn.proj.weight", "backbone.blocks.2.mlp.fc1.weight", "backbone.blocks.2.mlp.fc2.weight" ], "lr_scale": 0.09446824413773763, "lr": 3.7787297655095058e-06, "weight_decay": 0.05 }, "layer_4_no_decay": { "param_names": [ "backbone.blocks.3.norm1.weight", "backbone.blocks.3.attn.proj.bias", "backbone.blocks.3.attn.q_norm.weight", "backbone.blocks.3.attn.k_norm.weight", "backbone.blocks.3.ls1.gamma", "backbone.blocks.3.norm2.weight", "backbone.blocks.3.mlp.fc1.bias", "backbone.blocks.3.mlp.fc2.bias", "backbone.blocks.3.ls2.gamma" ], "lr_scale": 0.09944025698709225, "lr": 3.97761027948369e-06, "weight_decay": 0.0 }, "layer_4_decay": { "param_names": [ "backbone.blocks.3.attn.qkv.weight", "backbone.blocks.3.attn.proj.weight", 
"backbone.blocks.3.mlp.fc1.weight", "backbone.blocks.3.mlp.fc2.weight" ], "lr_scale": 0.09944025698709225, "lr": 3.97761027948369e-06, "weight_decay": 0.05 }, "layer_5_no_decay": { "param_names": [ "backbone.blocks.4.norm1.weight", "backbone.blocks.4.attn.proj.bias", "backbone.blocks.4.attn.q_norm.weight", "backbone.blocks.4.attn.k_norm.weight", "backbone.blocks.4.ls1.gamma", "backbone.blocks.4.norm2.weight", "backbone.blocks.4.mlp.fc1.bias", "backbone.blocks.4.mlp.fc2.bias", "backbone.blocks.4.ls2.gamma" ], "lr_scale": 0.10467395472325501, "lr": 4.186958188930201e-06, "weight_decay": 0.0 }, "layer_5_decay": { "param_names": [ "backbone.blocks.4.attn.qkv.weight", "backbone.blocks.4.attn.proj.weight", "backbone.blocks.4.mlp.fc1.weight", "backbone.blocks.4.mlp.fc2.weight" ], "lr_scale": 0.10467395472325501, "lr": 4.186958188930201e-06, "weight_decay": 0.05 }, "layer_6_no_decay": { "param_names": [ "backbone.blocks.5.norm1.weight", "backbone.blocks.5.attn.proj.bias", "backbone.blocks.5.attn.q_norm.weight", "backbone.blocks.5.attn.k_norm.weight", "backbone.blocks.5.ls1.gamma", "backbone.blocks.5.norm2.weight", "backbone.blocks.5.mlp.fc1.bias", "backbone.blocks.5.mlp.fc2.bias", "backbone.blocks.5.ls2.gamma" ], "lr_scale": 0.11018311023500528, "lr": 4.407324409400211e-06, "weight_decay": 0.0 }, "layer_6_decay": { "param_names": [ "backbone.blocks.5.attn.qkv.weight", "backbone.blocks.5.attn.proj.weight", "backbone.blocks.5.mlp.fc1.weight", "backbone.blocks.5.mlp.fc2.weight" ], "lr_scale": 0.11018311023500528, "lr": 4.407324409400211e-06, "weight_decay": 0.05 }, "layer_7_no_decay": { "param_names": [ "backbone.blocks.6.norm1.weight", "backbone.blocks.6.attn.proj.bias", "backbone.blocks.6.attn.q_norm.weight", "backbone.blocks.6.attn.k_norm.weight", "backbone.blocks.6.ls1.gamma", "backbone.blocks.6.norm2.weight", "backbone.blocks.6.mlp.fc1.bias", "backbone.blocks.6.mlp.fc2.bias", "backbone.blocks.6.ls2.gamma" ], "lr_scale": 0.11598222130000556, "lr": 4.639288852000222e-06, 
"weight_decay": 0.0 }, "layer_7_decay": { "param_names": [ "backbone.blocks.6.attn.qkv.weight", "backbone.blocks.6.attn.proj.weight", "backbone.blocks.6.mlp.fc1.weight", "backbone.blocks.6.mlp.fc2.weight" ], "lr_scale": 0.11598222130000556, "lr": 4.639288852000222e-06, "weight_decay": 0.05 }, "layer_8_no_decay": { "param_names": [ "backbone.blocks.7.norm1.weight", "backbone.blocks.7.attn.proj.bias", "backbone.blocks.7.attn.q_norm.weight", "backbone.blocks.7.attn.k_norm.weight", "backbone.blocks.7.ls1.gamma", "backbone.blocks.7.norm2.weight", "backbone.blocks.7.mlp.fc1.bias", "backbone.blocks.7.mlp.fc2.bias", "backbone.blocks.7.ls2.gamma" ], "lr_scale": 0.12208654873684796, "lr": 4.883461949473919e-06, "weight_decay": 0.0 }, "layer_8_decay": { "param_names": [ "backbone.blocks.7.attn.qkv.weight", "backbone.blocks.7.attn.proj.weight", "backbone.blocks.7.mlp.fc1.weight", "backbone.blocks.7.mlp.fc2.weight" ], "lr_scale": 0.12208654873684796, "lr": 4.883461949473919e-06, "weight_decay": 0.05 }, "layer_9_no_decay": { "param_names": [ "backbone.blocks.8.norm1.weight", "backbone.blocks.8.attn.proj.bias", "backbone.blocks.8.attn.q_norm.weight", "backbone.blocks.8.attn.k_norm.weight", "backbone.blocks.8.ls1.gamma", "backbone.blocks.8.norm2.weight", "backbone.blocks.8.mlp.fc1.bias", "backbone.blocks.8.mlp.fc2.bias", "backbone.blocks.8.ls2.gamma" ], "lr_scale": 0.12851215656510312, "lr": 5.140486262604126e-06, "weight_decay": 0.0 }, "layer_9_decay": { "param_names": [ "backbone.blocks.8.attn.qkv.weight", "backbone.blocks.8.attn.proj.weight", "backbone.blocks.8.mlp.fc1.weight", "backbone.blocks.8.mlp.fc2.weight" ], "lr_scale": 0.12851215656510312, "lr": 5.140486262604126e-06, "weight_decay": 0.05 }, "layer_10_no_decay": { "param_names": [ "backbone.blocks.9.norm1.weight", "backbone.blocks.9.attn.proj.bias", "backbone.blocks.9.attn.q_norm.weight", "backbone.blocks.9.attn.k_norm.weight", "backbone.blocks.9.ls1.gamma", "backbone.blocks.9.norm2.weight", 
"backbone.blocks.9.mlp.fc1.bias", "backbone.blocks.9.mlp.fc2.bias", "backbone.blocks.9.ls2.gamma" ], "lr_scale": 0.13527595427905592, "lr": 5.411038171162237e-06, "weight_decay": 0.0 }, "layer_10_decay": { "param_names": [ "backbone.blocks.9.attn.qkv.weight", "backbone.blocks.9.attn.proj.weight", "backbone.blocks.9.mlp.fc1.weight", "backbone.blocks.9.mlp.fc2.weight" ], "lr_scale": 0.13527595427905592, "lr": 5.411038171162237e-06, "weight_decay": 0.05 }, "layer_11_no_decay": { "param_names": [ "backbone.blocks.10.norm1.weight", "backbone.blocks.10.attn.proj.bias", "backbone.blocks.10.attn.q_norm.weight", "backbone.blocks.10.attn.k_norm.weight", "backbone.blocks.10.ls1.gamma", "backbone.blocks.10.norm2.weight", "backbone.blocks.10.mlp.fc1.bias", "backbone.blocks.10.mlp.fc2.bias", "backbone.blocks.10.ls2.gamma" ], "lr_scale": 0.14239574134637467, "lr": 5.695829653854987e-06, "weight_decay": 0.0 }, "layer_11_decay": { "param_names": [ "backbone.blocks.10.attn.qkv.weight", "backbone.blocks.10.attn.proj.weight", "backbone.blocks.10.mlp.fc1.weight", "backbone.blocks.10.mlp.fc2.weight" ], "lr_scale": 0.14239574134637467, "lr": 5.695829653854987e-06, "weight_decay": 0.05 }, "layer_12_no_decay": { "param_names": [ "backbone.blocks.11.norm1.weight", "backbone.blocks.11.attn.proj.bias", "backbone.blocks.11.attn.q_norm.weight", "backbone.blocks.11.attn.k_norm.weight", "backbone.blocks.11.ls1.gamma", "backbone.blocks.11.norm2.weight", "backbone.blocks.11.mlp.fc1.bias", "backbone.blocks.11.mlp.fc2.bias", "backbone.blocks.11.ls2.gamma" ], "lr_scale": 0.14989025404881545, "lr": 5.995610161952619e-06, "weight_decay": 0.0 }, "layer_12_decay": { "param_names": [ "backbone.blocks.11.attn.qkv.weight", "backbone.blocks.11.attn.proj.weight", "backbone.blocks.11.mlp.fc1.weight", "backbone.blocks.11.mlp.fc2.weight" ], "lr_scale": 0.14989025404881545, "lr": 5.995610161952619e-06, "weight_decay": 0.05 }, "layer_13_no_decay": { "param_names": [ "backbone.blocks.12.norm1.weight", 
"backbone.blocks.12.attn.proj.bias", "backbone.blocks.12.attn.q_norm.weight", "backbone.blocks.12.attn.k_norm.weight", "backbone.blocks.12.ls1.gamma", "backbone.blocks.12.norm2.weight", "backbone.blocks.12.mlp.fc1.bias", "backbone.blocks.12.mlp.fc2.bias", "backbone.blocks.12.ls2.gamma" ], "lr_scale": 0.1577792147882268, "lr": 6.311168591529072e-06, "weight_decay": 0.0 }, "layer_13_decay": { "param_names": [ "backbone.blocks.12.attn.qkv.weight", "backbone.blocks.12.attn.proj.weight", "backbone.blocks.12.mlp.fc1.weight", "backbone.blocks.12.mlp.fc2.weight" ], "lr_scale": 0.1577792147882268, "lr": 6.311168591529072e-06, "weight_decay": 0.05 }, "layer_14_no_decay": { "param_names": [ "backbone.blocks.13.norm1.weight", "backbone.blocks.13.attn.proj.bias", "backbone.blocks.13.attn.q_norm.weight", "backbone.blocks.13.attn.k_norm.weight", "backbone.blocks.13.ls1.gamma", "backbone.blocks.13.norm2.weight", "backbone.blocks.13.mlp.fc1.bias", "backbone.blocks.13.mlp.fc2.bias", "backbone.blocks.13.ls2.gamma" ], "lr_scale": 0.16608338398760716, "lr": 6.6433353595042875e-06, "weight_decay": 0.0 }, "layer_14_decay": { "param_names": [ "backbone.blocks.13.attn.qkv.weight", "backbone.blocks.13.attn.proj.weight", "backbone.blocks.13.mlp.fc1.weight", "backbone.blocks.13.mlp.fc2.weight" ], "lr_scale": 0.16608338398760716, "lr": 6.6433353595042875e-06, "weight_decay": 0.05 }, "layer_15_no_decay": { "param_names": [ "backbone.blocks.14.norm1.weight", "backbone.blocks.14.attn.proj.bias", "backbone.blocks.14.attn.q_norm.weight", "backbone.blocks.14.attn.k_norm.weight", "backbone.blocks.14.ls1.gamma", "backbone.blocks.14.norm2.weight", "backbone.blocks.14.mlp.fc1.bias", "backbone.blocks.14.mlp.fc2.bias", "backbone.blocks.14.ls2.gamma" ], "lr_scale": 0.174824614723797, "lr": 6.9929845889518814e-06, "weight_decay": 0.0 }, "layer_15_decay": { "param_names": [ "backbone.blocks.14.attn.qkv.weight", "backbone.blocks.14.attn.proj.weight", "backbone.blocks.14.mlp.fc1.weight", 
"backbone.blocks.14.mlp.fc2.weight" ], "lr_scale": 0.174824614723797, "lr": 6.9929845889518814e-06, "weight_decay": 0.05 }, "layer_16_no_decay": { "param_names": [ "backbone.blocks.15.norm1.weight", "backbone.blocks.15.attn.proj.bias", "backbone.blocks.15.attn.q_norm.weight", "backbone.blocks.15.attn.k_norm.weight", "backbone.blocks.15.ls1.gamma", "backbone.blocks.15.norm2.weight", "backbone.blocks.15.mlp.fc1.bias", "backbone.blocks.15.mlp.fc2.bias", "backbone.blocks.15.ls2.gamma" ], "lr_scale": 0.18402591023557582, "lr": 7.361036409423033e-06, "weight_decay": 0.0 }, "layer_16_decay": { "param_names": [ "backbone.blocks.15.attn.qkv.weight", "backbone.blocks.15.attn.proj.weight", "backbone.blocks.15.mlp.fc1.weight", "backbone.blocks.15.mlp.fc2.weight" ], "lr_scale": 0.18402591023557582, "lr": 7.361036409423033e-06, "weight_decay": 0.05 }, "layer_17_no_decay": { "param_names": [ "backbone.blocks.16.norm1.weight", "backbone.blocks.16.attn.proj.bias", "backbone.blocks.16.attn.q_norm.weight", "backbone.blocks.16.attn.k_norm.weight", "backbone.blocks.16.ls1.gamma", "backbone.blocks.16.norm2.weight", "backbone.blocks.16.mlp.fc1.bias", "backbone.blocks.16.mlp.fc2.bias", "backbone.blocks.16.ls2.gamma" ], "lr_scale": 0.19371148445850087, "lr": 7.748459378340036e-06, "weight_decay": 0.0 }, "layer_17_decay": { "param_names": [ "backbone.blocks.16.attn.qkv.weight", "backbone.blocks.16.attn.proj.weight", "backbone.blocks.16.mlp.fc1.weight", "backbone.blocks.16.mlp.fc2.weight" ], "lr_scale": 0.19371148445850087, "lr": 7.748459378340036e-06, "weight_decay": 0.05 }, "layer_18_no_decay": { "param_names": [ "backbone.blocks.17.norm1.weight", "backbone.blocks.17.attn.proj.bias", "backbone.blocks.17.attn.q_norm.weight", "backbone.blocks.17.attn.k_norm.weight", "backbone.blocks.17.ls1.gamma", "backbone.blocks.17.norm2.weight", "backbone.blocks.17.mlp.fc1.bias", "backbone.blocks.17.mlp.fc2.bias", "backbone.blocks.17.ls2.gamma" ], "lr_scale": 0.2039068257457904, "lr": 
8.156273029831616e-06, "weight_decay": 0.0 }, "layer_18_decay": { "param_names": [ "backbone.blocks.17.attn.qkv.weight", "backbone.blocks.17.attn.proj.weight", "backbone.blocks.17.mlp.fc1.weight", "backbone.blocks.17.mlp.fc2.weight" ], "lr_scale": 0.2039068257457904, "lr": 8.156273029831616e-06, "weight_decay": 0.05 }, "layer_19_no_decay": { "param_names": [ "backbone.blocks.18.norm1.weight", "backbone.blocks.18.attn.proj.bias", "backbone.blocks.18.attn.q_norm.weight", "backbone.blocks.18.attn.k_norm.weight", "backbone.blocks.18.ls1.gamma", "backbone.blocks.18.norm2.weight", "backbone.blocks.18.mlp.fc1.bias", "backbone.blocks.18.mlp.fc2.bias", "backbone.blocks.18.ls2.gamma" ], "lr_scale": 0.21463876394293727, "lr": 8.585550557717492e-06, "weight_decay": 0.0 }, "layer_19_decay": { "param_names": [ "backbone.blocks.18.attn.qkv.weight", "backbone.blocks.18.attn.proj.weight", "backbone.blocks.18.mlp.fc1.weight", "backbone.blocks.18.mlp.fc2.weight" ], "lr_scale": 0.21463876394293727, "lr": 8.585550557717492e-06, "weight_decay": 0.05 }, "layer_20_no_decay": { "param_names": [ "backbone.blocks.19.norm1.weight", "backbone.blocks.19.attn.proj.bias", "backbone.blocks.19.attn.q_norm.weight", "backbone.blocks.19.attn.k_norm.weight", "backbone.blocks.19.ls1.gamma", "backbone.blocks.19.norm2.weight", "backbone.blocks.19.mlp.fc1.bias", "backbone.blocks.19.mlp.fc2.bias", "backbone.blocks.19.ls2.gamma" ], "lr_scale": 0.22593554099256555, "lr": 9.037421639702623e-06, "weight_decay": 0.0 }, "layer_20_decay": { "param_names": [ "backbone.blocks.19.attn.qkv.weight", "backbone.blocks.19.attn.proj.weight", "backbone.blocks.19.mlp.fc1.weight", "backbone.blocks.19.mlp.fc2.weight" ], "lr_scale": 0.22593554099256555, "lr": 9.037421639702623e-06, "weight_decay": 0.05 }, "layer_21_no_decay": { "param_names": [ "backbone.blocks.20.norm1.weight", "backbone.blocks.20.attn.proj.bias", "backbone.blocks.20.attn.q_norm.weight", "backbone.blocks.20.attn.k_norm.weight", "backbone.blocks.20.ls1.gamma", 
"backbone.blocks.20.norm2.weight", "backbone.blocks.20.mlp.fc1.bias", "backbone.blocks.20.mlp.fc2.bias", "backbone.blocks.20.ls2.gamma" ], "lr_scale": 0.23782688525533216, "lr": 9.513075410213288e-06, "weight_decay": 0.0 }, "layer_21_decay": { "param_names": [ "backbone.blocks.20.attn.qkv.weight", "backbone.blocks.20.attn.proj.weight", "backbone.blocks.20.mlp.fc1.weight", "backbone.blocks.20.mlp.fc2.weight" ], "lr_scale": 0.23782688525533216, "lr": 9.513075410213288e-06, "weight_decay": 0.05 }, "layer_22_no_decay": { "param_names": [ "backbone.blocks.21.norm1.weight", "backbone.blocks.21.attn.proj.bias", "backbone.blocks.21.attn.q_norm.weight", "backbone.blocks.21.attn.k_norm.weight", "backbone.blocks.21.ls1.gamma", "backbone.blocks.21.norm2.weight", "backbone.blocks.21.mlp.fc1.bias", "backbone.blocks.21.mlp.fc2.bias", "backbone.blocks.21.ls2.gamma" ], "lr_scale": 0.2503440897424549, "lr": 1.0013763589698198e-05, "weight_decay": 0.0 }, "layer_22_decay": { "param_names": [ "backbone.blocks.21.attn.qkv.weight", "backbone.blocks.21.attn.proj.weight", "backbone.blocks.21.mlp.fc1.weight", "backbone.blocks.21.mlp.fc2.weight" ], "lr_scale": 0.2503440897424549, "lr": 1.0013763589698198e-05, "weight_decay": 0.05 }, "layer_23_no_decay": { "param_names": [ "backbone.blocks.22.norm1.weight", "backbone.blocks.22.attn.proj.bias", "backbone.blocks.22.attn.q_norm.weight", "backbone.blocks.22.attn.k_norm.weight", "backbone.blocks.22.ls1.gamma", "backbone.blocks.22.norm2.weight", "backbone.blocks.22.mlp.fc1.bias", "backbone.blocks.22.mlp.fc2.bias", "backbone.blocks.22.ls2.gamma" ], "lr_scale": 0.26352009446574204, "lr": 1.0540803778629682e-05, "weight_decay": 0.0 }, "layer_23_decay": { "param_names": [ "backbone.blocks.22.attn.qkv.weight", "backbone.blocks.22.attn.proj.weight", "backbone.blocks.22.mlp.fc1.weight", "backbone.blocks.22.mlp.fc2.weight" ], "lr_scale": 0.26352009446574204, "lr": 1.0540803778629682e-05, "weight_decay": 0.05 }, "layer_24_no_decay": { "param_names": [ 
"backbone.blocks.23.norm1.weight", "backbone.blocks.23.attn.proj.bias", "backbone.blocks.23.attn.q_norm.weight", "backbone.blocks.23.attn.k_norm.weight", "backbone.blocks.23.ls1.gamma", "backbone.blocks.23.norm2.weight", "backbone.blocks.23.mlp.fc1.bias", "backbone.blocks.23.mlp.fc2.bias", "backbone.blocks.23.ls2.gamma" ], "lr_scale": 0.27738957312183377, "lr": 1.109558292487335e-05, "weight_decay": 0.0 }, "layer_24_decay": { "param_names": [ "backbone.blocks.23.attn.qkv.weight", "backbone.blocks.23.attn.proj.weight", "backbone.blocks.23.mlp.fc1.weight", "backbone.blocks.23.mlp.fc2.weight" ], "lr_scale": 0.27738957312183377, "lr": 1.109558292487335e-05, "weight_decay": 0.05 }, "layer_25_no_decay": { "param_names": [ "backbone.blocks.24.norm1.weight", "backbone.blocks.24.attn.proj.bias", "backbone.blocks.24.attn.q_norm.weight", "backbone.blocks.24.attn.k_norm.weight", "backbone.blocks.24.ls1.gamma", "backbone.blocks.24.norm2.weight", "backbone.blocks.24.mlp.fc1.bias", "backbone.blocks.24.mlp.fc2.bias", "backbone.blocks.24.ls2.gamma" ], "lr_scale": 0.2919890243387724, "lr": 1.1679560973550896e-05, "weight_decay": 0.0 }, "layer_25_decay": { "param_names": [ "backbone.blocks.24.attn.qkv.weight", "backbone.blocks.24.attn.proj.weight", "backbone.blocks.24.mlp.fc1.weight", "backbone.blocks.24.mlp.fc2.weight" ], "lr_scale": 0.2919890243387724, "lr": 1.1679560973550896e-05, "weight_decay": 0.05 }, "layer_26_no_decay": { "param_names": [ "backbone.blocks.25.norm1.weight", "backbone.blocks.25.attn.proj.bias", "backbone.blocks.25.attn.q_norm.weight", "backbone.blocks.25.attn.k_norm.weight", "backbone.blocks.25.ls1.gamma", "backbone.blocks.25.norm2.weight", "backbone.blocks.25.mlp.fc1.bias", "backbone.blocks.25.mlp.fc2.bias", "backbone.blocks.25.ls2.gamma" ], "lr_scale": 0.3073568677250236, "lr": 1.2294274709000943e-05, "weight_decay": 0.0 }, "layer_26_decay": { "param_names": [ "backbone.blocks.25.attn.qkv.weight", "backbone.blocks.25.attn.proj.weight", 
"backbone.blocks.25.mlp.fc1.weight", "backbone.blocks.25.mlp.fc2.weight" ], "lr_scale": 0.3073568677250236, "lr": 1.2294274709000943e-05, "weight_decay": 0.05 }, "layer_27_no_decay": { "param_names": [ "backbone.blocks.26.norm1.weight", "backbone.blocks.26.attn.proj.bias", "backbone.blocks.26.attn.q_norm.weight", "backbone.blocks.26.attn.k_norm.weight", "backbone.blocks.26.ls1.gamma", "backbone.blocks.26.norm2.weight", "backbone.blocks.26.mlp.fc1.bias", "backbone.blocks.26.mlp.fc2.bias", "backbone.blocks.26.ls2.gamma" ], "lr_scale": 0.323533544973709, "lr": 1.2941341798948362e-05, "weight_decay": 0.0 }, "layer_27_decay": { "param_names": [ "backbone.blocks.26.attn.qkv.weight", "backbone.blocks.26.attn.proj.weight", "backbone.blocks.26.mlp.fc1.weight", "backbone.blocks.26.mlp.fc2.weight" ], "lr_scale": 0.323533544973709, "lr": 1.2941341798948362e-05, "weight_decay": 0.05 }, "layer_28_no_decay": { "param_names": [ "backbone.blocks.27.norm1.weight", "backbone.blocks.27.attn.proj.bias", "backbone.blocks.27.attn.q_norm.weight", "backbone.blocks.27.attn.k_norm.weight", "backbone.blocks.27.ls1.gamma", "backbone.blocks.27.norm2.weight", "backbone.blocks.27.mlp.fc1.bias", "backbone.blocks.27.mlp.fc2.bias", "backbone.blocks.27.ls2.gamma" ], "lr_scale": 0.3405616262881148, "lr": 1.3622465051524594e-05, "weight_decay": 0.0 }, "layer_28_decay": { "param_names": [ "backbone.blocks.27.attn.qkv.weight", "backbone.blocks.27.attn.proj.weight", "backbone.blocks.27.mlp.fc1.weight", "backbone.blocks.27.mlp.fc2.weight" ], "lr_scale": 0.3405616262881148, "lr": 1.3622465051524594e-05, "weight_decay": 0.05 }, "layer_29_no_decay": { "param_names": [ "backbone.blocks.28.norm1.weight", "backbone.blocks.28.attn.proj.bias", "backbone.blocks.28.attn.q_norm.weight", "backbone.blocks.28.attn.k_norm.weight", "backbone.blocks.28.ls1.gamma", "backbone.blocks.28.norm2.weight", "backbone.blocks.28.mlp.fc1.bias", "backbone.blocks.28.mlp.fc2.bias", "backbone.blocks.28.ls2.gamma" ], "lr_scale": 
0.3584859224085419, "lr": 1.4339436896341676e-05, "weight_decay": 0.0 }, "layer_29_decay": { "param_names": [ "backbone.blocks.28.attn.qkv.weight", "backbone.blocks.28.attn.proj.weight", "backbone.blocks.28.mlp.fc1.weight", "backbone.blocks.28.mlp.fc2.weight" ], "lr_scale": 0.3584859224085419, "lr": 1.4339436896341676e-05, "weight_decay": 0.05 }, "layer_30_no_decay": { "param_names": [ "backbone.blocks.29.norm1.weight", "backbone.blocks.29.attn.proj.bias", "backbone.blocks.29.attn.q_norm.weight", "backbone.blocks.29.attn.k_norm.weight", "backbone.blocks.29.ls1.gamma", "backbone.blocks.29.norm2.weight", "backbone.blocks.29.mlp.fc1.bias", "backbone.blocks.29.mlp.fc2.bias", "backbone.blocks.29.ls2.gamma" ], "lr_scale": 0.37735360253530725, "lr": 1.509414410141229e-05, "weight_decay": 0.0 }, "layer_30_decay": { "param_names": [ "backbone.blocks.29.attn.qkv.weight", "backbone.blocks.29.attn.proj.weight", "backbone.blocks.29.mlp.fc1.weight", "backbone.blocks.29.mlp.fc2.weight" ], "lr_scale": 0.37735360253530725, "lr": 1.509414410141229e-05, "weight_decay": 0.05 }, "layer_31_no_decay": { "param_names": [ "backbone.blocks.30.norm1.weight", "backbone.blocks.30.attn.proj.bias", "backbone.blocks.30.attn.q_norm.weight", "backbone.blocks.30.attn.k_norm.weight", "backbone.blocks.30.ls1.gamma", "backbone.blocks.30.norm2.weight", "backbone.blocks.30.mlp.fc1.bias", "backbone.blocks.30.mlp.fc2.bias", "backbone.blocks.30.ls2.gamma" ], "lr_scale": 0.3972143184582182, "lr": 1.588857273832873e-05, "weight_decay": 0.0 }, "layer_31_decay": { "param_names": [ "backbone.blocks.30.attn.qkv.weight", "backbone.blocks.30.attn.proj.weight", "backbone.blocks.30.mlp.fc1.weight", "backbone.blocks.30.mlp.fc2.weight" ], "lr_scale": 0.3972143184582182, "lr": 1.588857273832873e-05, "weight_decay": 0.05 }, "layer_32_no_decay": { "param_names": [ "backbone.blocks.31.norm1.weight", "backbone.blocks.31.attn.proj.bias", "backbone.blocks.31.attn.q_norm.weight", "backbone.blocks.31.attn.k_norm.weight", 
"backbone.blocks.31.ls1.gamma", "backbone.blocks.31.norm2.weight", "backbone.blocks.31.mlp.fc1.bias", "backbone.blocks.31.mlp.fc2.bias", "backbone.blocks.31.ls2.gamma" ], "lr_scale": 0.4181203352191771, "lr": 1.6724813408767084e-05, "weight_decay": 0.0 }, "layer_32_decay": { "param_names": [ "backbone.blocks.31.attn.qkv.weight", "backbone.blocks.31.attn.proj.weight", "backbone.blocks.31.mlp.fc1.weight", "backbone.blocks.31.mlp.fc2.weight" ], "lr_scale": 0.4181203352191771, "lr": 1.6724813408767084e-05, "weight_decay": 0.05 }, "layer_33_no_decay": { "param_names": [ "backbone.blocks.32.norm1.weight", "backbone.blocks.32.attn.proj.bias", "backbone.blocks.32.attn.q_norm.weight", "backbone.blocks.32.attn.k_norm.weight", "backbone.blocks.32.ls1.gamma", "backbone.blocks.32.norm2.weight", "backbone.blocks.32.mlp.fc1.bias", "backbone.blocks.32.mlp.fc2.bias", "backbone.blocks.32.ls2.gamma" ], "lr_scale": 0.44012666865176536, "lr": 1.7605066746070617e-05, "weight_decay": 0.0 }, "layer_33_decay": { "param_names": [ "backbone.blocks.32.attn.qkv.weight", "backbone.blocks.32.attn.proj.weight", "backbone.blocks.32.mlp.fc1.weight", "backbone.blocks.32.mlp.fc2.weight" ], "lr_scale": 0.44012666865176536, "lr": 1.7605066746070617e-05, "weight_decay": 0.05 }, "layer_34_no_decay": { "param_names": [ "backbone.blocks.33.norm1.weight", "backbone.blocks.33.attn.proj.bias", "backbone.blocks.33.attn.q_norm.weight", "backbone.blocks.33.attn.k_norm.weight", "backbone.blocks.33.ls1.gamma", "backbone.blocks.33.norm2.weight", "backbone.blocks.33.mlp.fc1.bias", "backbone.blocks.33.mlp.fc2.bias", "backbone.blocks.33.ls2.gamma" ], "lr_scale": 0.46329123015975304, "lr": 1.8531649206390123e-05, "weight_decay": 0.0 }, "layer_34_decay": { "param_names": [ "backbone.blocks.33.attn.qkv.weight", "backbone.blocks.33.attn.proj.weight", "backbone.blocks.33.mlp.fc1.weight", "backbone.blocks.33.mlp.fc2.weight" ], "lr_scale": 0.46329123015975304, "lr": 1.8531649206390123e-05, "weight_decay": 0.05 }, 
"layer_35_no_decay": { "param_names": [ "backbone.blocks.34.norm1.weight", "backbone.blocks.34.attn.proj.bias", "backbone.blocks.34.attn.q_norm.weight", "backbone.blocks.34.attn.k_norm.weight", "backbone.blocks.34.ls1.gamma", "backbone.blocks.34.norm2.weight", "backbone.blocks.34.mlp.fc1.bias", "backbone.blocks.34.mlp.fc2.bias", "backbone.blocks.34.ls2.gamma" ], "lr_scale": 0.48767497911552954, "lr": 1.9506999164621184e-05, "weight_decay": 0.0 }, "layer_35_decay": { "param_names": [ "backbone.blocks.34.attn.qkv.weight", "backbone.blocks.34.attn.proj.weight", "backbone.blocks.34.mlp.fc1.weight", "backbone.blocks.34.mlp.fc2.weight" ], "lr_scale": 0.48767497911552954, "lr": 1.9506999164621184e-05, "weight_decay": 0.05 }, "layer_36_no_decay": { "param_names": [ "backbone.blocks.35.norm1.weight", "backbone.blocks.35.attn.proj.bias", "backbone.blocks.35.attn.q_norm.weight", "backbone.blocks.35.attn.k_norm.weight", "backbone.blocks.35.ls1.gamma", "backbone.blocks.35.norm2.weight", "backbone.blocks.35.mlp.fc1.bias", "backbone.blocks.35.mlp.fc2.bias", "backbone.blocks.35.ls2.gamma" ], "lr_scale": 0.5133420832795048, "lr": 2.0533683331180195e-05, "weight_decay": 0.0 }, "layer_36_decay": { "param_names": [ "backbone.blocks.35.attn.qkv.weight", "backbone.blocks.35.attn.proj.weight", "backbone.blocks.35.mlp.fc1.weight", "backbone.blocks.35.mlp.fc2.weight" ], "lr_scale": 0.5133420832795048, "lr": 2.0533683331180195e-05, "weight_decay": 0.05 }, "layer_37_no_decay": { "param_names": [ "backbone.blocks.36.norm1.weight", "backbone.blocks.36.attn.proj.bias", "backbone.blocks.36.attn.q_norm.weight", "backbone.blocks.36.attn.k_norm.weight", "backbone.blocks.36.ls1.gamma", "backbone.blocks.36.norm2.weight", "backbone.blocks.36.mlp.fc1.bias", "backbone.blocks.36.mlp.fc2.bias", "backbone.blocks.36.ls2.gamma" ], "lr_scale": 0.5403600876626367, "lr": 2.1614403506505468e-05, "weight_decay": 0.0 }, "layer_37_decay": { "param_names": [ "backbone.blocks.36.attn.qkv.weight", 
"backbone.blocks.36.attn.proj.weight", "backbone.blocks.36.mlp.fc1.weight", "backbone.blocks.36.mlp.fc2.weight" ], "lr_scale": 0.5403600876626367, "lr": 2.1614403506505468e-05, "weight_decay": 0.05 }, "layer_38_no_decay": { "param_names": [ "backbone.blocks.37.norm1.weight", "backbone.blocks.37.attn.proj.bias", "backbone.blocks.37.attn.q_norm.weight", "backbone.blocks.37.attn.k_norm.weight", "backbone.blocks.37.ls1.gamma", "backbone.blocks.37.norm2.weight", "backbone.blocks.37.mlp.fc1.bias", "backbone.blocks.37.mlp.fc2.bias", "backbone.blocks.37.ls2.gamma" ], "lr_scale": 0.5688000922764597, "lr": 2.275200369105839e-05, "weight_decay": 0.0 }, "layer_38_decay": { "param_names": [ "backbone.blocks.37.attn.qkv.weight", "backbone.blocks.37.attn.proj.weight", "backbone.blocks.37.mlp.fc1.weight", "backbone.blocks.37.mlp.fc2.weight" ], "lr_scale": 0.5688000922764597, "lr": 2.275200369105839e-05, "weight_decay": 0.05 }, "layer_39_no_decay": { "param_names": [ "backbone.blocks.38.norm1.weight", "backbone.blocks.38.attn.proj.bias", "backbone.blocks.38.attn.q_norm.weight", "backbone.blocks.38.attn.k_norm.weight", "backbone.blocks.38.ls1.gamma", "backbone.blocks.38.norm2.weight", "backbone.blocks.38.mlp.fc1.bias", "backbone.blocks.38.mlp.fc2.bias", "backbone.blocks.38.ls2.gamma" ], "lr_scale": 0.5987369392383787, "lr": 2.394947756953515e-05, "weight_decay": 0.0 }, "layer_39_decay": { "param_names": [ "backbone.blocks.38.attn.qkv.weight", "backbone.blocks.38.attn.proj.weight", "backbone.blocks.38.mlp.fc1.weight", "backbone.blocks.38.mlp.fc2.weight" ], "lr_scale": 0.5987369392383787, "lr": 2.394947756953515e-05, "weight_decay": 0.05 }, "layer_40_no_decay": { "param_names": [ "backbone.blocks.39.norm1.weight", "backbone.blocks.39.attn.proj.bias", "backbone.blocks.39.attn.q_norm.weight", "backbone.blocks.39.attn.k_norm.weight", "backbone.blocks.39.ls1.gamma", "backbone.blocks.39.norm2.weight", "backbone.blocks.39.mlp.fc1.bias", "backbone.blocks.39.mlp.fc2.bias", 
"backbone.blocks.39.ls2.gamma" ], "lr_scale": 0.6302494097246091, "lr": 2.5209976388984365e-05, "weight_decay": 0.0 }, "layer_40_decay": { "param_names": [ "backbone.blocks.39.attn.qkv.weight", "backbone.blocks.39.attn.proj.weight", "backbone.blocks.39.mlp.fc1.weight", "backbone.blocks.39.mlp.fc2.weight" ], "lr_scale": 0.6302494097246091, "lr": 2.5209976388984365e-05, "weight_decay": 0.05 }, "layer_41_no_decay": { "param_names": [ "backbone.blocks.40.norm1.weight", "backbone.blocks.40.attn.proj.bias", "backbone.blocks.40.attn.q_norm.weight", "backbone.blocks.40.attn.k_norm.weight", "backbone.blocks.40.ls1.gamma", "backbone.blocks.40.norm2.weight", "backbone.blocks.40.mlp.fc1.bias", "backbone.blocks.40.mlp.fc2.bias", "backbone.blocks.40.ls2.gamma" ], "lr_scale": 0.6634204312890623, "lr": 2.6536817251562493e-05, "weight_decay": 0.0 }, "layer_41_decay": { "param_names": [ "backbone.blocks.40.attn.qkv.weight", "backbone.blocks.40.attn.proj.weight", "backbone.blocks.40.mlp.fc1.weight", "backbone.blocks.40.mlp.fc2.weight" ], "lr_scale": 0.6634204312890623, "lr": 2.6536817251562493e-05, "weight_decay": 0.05 }, "layer_42_no_decay": { "param_names": [ "backbone.blocks.41.norm1.weight", "backbone.blocks.41.attn.proj.bias", "backbone.blocks.41.attn.q_norm.weight", "backbone.blocks.41.attn.k_norm.weight", "backbone.blocks.41.ls1.gamma", "backbone.blocks.41.norm2.weight", "backbone.blocks.41.mlp.fc1.bias", "backbone.blocks.41.mlp.fc2.bias", "backbone.blocks.41.ls2.gamma" ], "lr_scale": 0.6983372960937497, "lr": 2.793349184374999e-05, "weight_decay": 0.0 }, "layer_42_decay": { "param_names": [ "backbone.blocks.41.attn.qkv.weight", "backbone.blocks.41.attn.proj.weight", "backbone.blocks.41.mlp.fc1.weight", "backbone.blocks.41.mlp.fc2.weight" ], "lr_scale": 0.6983372960937497, "lr": 2.793349184374999e-05, "weight_decay": 0.05 }, "layer_43_no_decay": { "param_names": [ "backbone.blocks.42.norm1.weight", "backbone.blocks.42.attn.proj.bias", "backbone.blocks.42.attn.q_norm.weight", 
"backbone.blocks.42.attn.k_norm.weight", "backbone.blocks.42.ls1.gamma", "backbone.blocks.42.norm2.weight", "backbone.blocks.42.mlp.fc1.bias", "backbone.blocks.42.mlp.fc2.bias", "backbone.blocks.42.ls2.gamma" ], "lr_scale": 0.7350918906249998, "lr": 2.9403675624999993e-05, "weight_decay": 0.0 }, "layer_43_decay": { "param_names": [ "backbone.blocks.42.attn.qkv.weight", "backbone.blocks.42.attn.proj.weight", "backbone.blocks.42.mlp.fc1.weight", "backbone.blocks.42.mlp.fc2.weight" ], "lr_scale": 0.7350918906249998, "lr": 2.9403675624999993e-05, "weight_decay": 0.05 }, "layer_44_no_decay": { "param_names": [ "backbone.blocks.43.norm1.weight", "backbone.blocks.43.attn.proj.bias", "backbone.blocks.43.attn.q_norm.weight", "backbone.blocks.43.attn.k_norm.weight", "backbone.blocks.43.ls1.gamma", "backbone.blocks.43.norm2.weight", "backbone.blocks.43.mlp.fc1.bias", "backbone.blocks.43.mlp.fc2.bias", "backbone.blocks.43.ls2.gamma" ], "lr_scale": 0.7737809374999998, "lr": 3.0951237499999995e-05, "weight_decay": 0.0 }, "layer_44_decay": { "param_names": [ "backbone.blocks.43.attn.qkv.weight", "backbone.blocks.43.attn.proj.weight", "backbone.blocks.43.mlp.fc1.weight", "backbone.blocks.43.mlp.fc2.weight" ], "lr_scale": 0.7737809374999998, "lr": 3.0951237499999995e-05, "weight_decay": 0.05 }, "layer_45_no_decay": { "param_names": [ "backbone.blocks.44.norm1.weight", "backbone.blocks.44.attn.proj.bias", "backbone.blocks.44.attn.q_norm.weight", "backbone.blocks.44.attn.k_norm.weight", "backbone.blocks.44.ls1.gamma", "backbone.blocks.44.norm2.weight", "backbone.blocks.44.mlp.fc1.bias", "backbone.blocks.44.mlp.fc2.bias", "backbone.blocks.44.ls2.gamma" ], "lr_scale": 0.8145062499999999, "lr": 3.258025e-05, "weight_decay": 0.0 }, "layer_45_decay": { "param_names": [ "backbone.blocks.44.attn.qkv.weight", "backbone.blocks.44.attn.proj.weight", "backbone.blocks.44.mlp.fc1.weight", "backbone.blocks.44.mlp.fc2.weight" ], "lr_scale": 0.8145062499999999, "lr": 3.258025e-05, "weight_decay": 
0.05 }, "layer_46_no_decay": { "param_names": [ "backbone.blocks.45.norm1.weight", "backbone.blocks.45.attn.proj.bias", "backbone.blocks.45.attn.q_norm.weight", "backbone.blocks.45.attn.k_norm.weight", "backbone.blocks.45.ls1.gamma", "backbone.blocks.45.norm2.weight", "backbone.blocks.45.mlp.fc1.bias", "backbone.blocks.45.mlp.fc2.bias", "backbone.blocks.45.ls2.gamma" ], "lr_scale": 0.8573749999999999, "lr": 3.4294999999999996e-05, "weight_decay": 0.0 }, "layer_46_decay": { "param_names": [ "backbone.blocks.45.attn.qkv.weight", "backbone.blocks.45.attn.proj.weight", "backbone.blocks.45.mlp.fc1.weight", "backbone.blocks.45.mlp.fc2.weight" ], "lr_scale": 0.8573749999999999, "lr": 3.4294999999999996e-05, "weight_decay": 0.05 }, "layer_47_no_decay": { "param_names": [ "backbone.blocks.46.norm1.weight", "backbone.blocks.46.attn.proj.bias", "backbone.blocks.46.attn.q_norm.weight", "backbone.blocks.46.attn.k_norm.weight", "backbone.blocks.46.ls1.gamma", "backbone.blocks.46.norm2.weight", "backbone.blocks.46.mlp.fc1.bias", "backbone.blocks.46.mlp.fc2.bias", "backbone.blocks.46.ls2.gamma" ], "lr_scale": 0.9025, "lr": 3.61e-05, "weight_decay": 0.0 }, "layer_47_decay": { "param_names": [ "backbone.blocks.46.attn.qkv.weight", "backbone.blocks.46.attn.proj.weight", "backbone.blocks.46.mlp.fc1.weight", "backbone.blocks.46.mlp.fc2.weight" ], "lr_scale": 0.9025, "lr": 3.61e-05, "weight_decay": 0.05 }, "layer_48_no_decay": { "param_names": [ "backbone.blocks.47.norm1.weight", "backbone.blocks.47.attn.proj.bias", "backbone.blocks.47.attn.q_norm.weight", "backbone.blocks.47.attn.k_norm.weight", "backbone.blocks.47.ls1.gamma", "backbone.blocks.47.norm2.weight", "backbone.blocks.47.mlp.fc1.bias", "backbone.blocks.47.mlp.fc2.bias", "backbone.blocks.47.ls2.gamma" ], "lr_scale": 0.95, "lr": 3.8e-05, "weight_decay": 0.0 }, "layer_48_decay": { "param_names": [ "backbone.blocks.47.attn.qkv.weight", "backbone.blocks.47.attn.proj.weight", "backbone.blocks.47.mlp.fc1.weight", 
"backbone.blocks.47.mlp.fc2.weight" ], "lr_scale": 0.95, "lr": 3.8e-05, "weight_decay": 0.05 }, "layer_49_decay": { "param_names": [ "decode_head.conv_seg.weight" ], "lr_scale": 1.0, "lr": 4e-05, "weight_decay": 0.05 }, "layer_49_no_decay": { "param_names": [ "decode_head.conv_seg.bias", "decode_head.norm.weight", "decode_head.norm.bias" ], "lr_scale": 1.0, "lr": 4e-05, "weight_decay": 0.0 } } 2023-11-09 22:37:25,407 - mmseg - INFO - trainable parameters: 5906608150 2023-11-09 22:37:25,409 - mmseg - INFO - total parameters: 5906608150 2023-11-09 22:37:25,453 - mmseg - INFO - Loaded 2000 images 2023-11-09 22:37:25,453 - mmseg - INFO - Start running, host: wangwenhai@SH-IDC1-10-140-37-94, work_dir: /mnt/petrelfs/wangwenhai/workspace/ViTDetection/mmsegmentation/work_dirs/segmenter_linear_intern_vit_6b_504_5k_ade20k_bs16_lr4e-5_1of16 2023-11-09 22:37:25,454 - mmseg - INFO - Hooks will be executed in the following order: before_run: (VERY_HIGH ) PolyLrUpdaterHook (49 ) ToBFloat16Hook (49 ) ToBFloat16Hook (NORMAL ) DeepspeedCheckpointHook (LOW ) DeepspeedDistEvalHook (VERY_LOW ) TextLoggerHook (VERY_LOW ) TensorboardLoggerHook -------------------- before_train_epoch: (VERY_HIGH ) PolyLrUpdaterHook (LOW ) IterTimerHook (LOW ) DeepspeedDistEvalHook (VERY_LOW ) TextLoggerHook (VERY_LOW ) TensorboardLoggerHook -------------------- before_train_iter: (VERY_HIGH ) PolyLrUpdaterHook (LOW ) IterTimerHook (LOW ) DeepspeedDistEvalHook -------------------- after_train_iter: (ABOVE_NORMAL) OptimizerHook (NORMAL ) DeepspeedCheckpointHook (LOW ) IterTimerHook (LOW ) DeepspeedDistEvalHook (VERY_LOW ) TextLoggerHook (VERY_LOW ) TensorboardLoggerHook -------------------- after_train_epoch: (NORMAL ) DeepspeedCheckpointHook (LOW ) DeepspeedDistEvalHook (VERY_LOW ) TextLoggerHook (VERY_LOW ) TensorboardLoggerHook -------------------- before_val_epoch: (LOW ) IterTimerHook (VERY_LOW ) TextLoggerHook (VERY_LOW ) TensorboardLoggerHook -------------------- before_val_iter: (LOW ) IterTimerHook 
--------------------
after_val_iter:
(LOW ) IterTimerHook
--------------------
after_val_epoch:
(VERY_LOW ) TextLoggerHook
(VERY_LOW ) TensorboardLoggerHook
--------------------
after_run:
(VERY_LOW ) TextLoggerHook
(VERY_LOW ) TensorboardLoggerHook
--------------------
2023-11-09 22:37:25,454 - mmseg - INFO - workflow: [('train', 1)], max: 5000 iters
2023-11-09 22:37:25,461 - mmseg - INFO - Checkpoints will be saved to /mnt/petrelfs/wangwenhai/workspace/ViTDetection/mmsegmentation/work_dirs/segmenter_linear_intern_vit_6b_504_5k_ade20k_bs16_lr4e-5_1of16 by HardDiskBackend.
2023-11-09 22:39:18,864 - mmseg - INFO - Iter [50/5000] lr: 1.572e-06, eta: 1:46:33, time: 1.292, data_time: 0.009, memory: 38534, decode.loss_ce: 4.0440, decode.acc_seg: 4.9999, loss: 4.0440
2023-11-09 22:40:22,254 - mmseg - INFO - Iter [100/5000] lr: 3.144e-06, eta: 1:44:30, time: 1.268, data_time: 0.051, memory: 38534, decode.loss_ce: 2.4000, decode.acc_seg: 47.4330, loss: 2.4000
2023-11-09 22:41:23,381 - mmseg - INFO - Iter [150/5000] lr: 3.143e-06, eta: 1:41:54, time: 1.223, data_time: 0.007, memory: 38534, decode.loss_ce: 1.2759, decode.acc_seg: 66.3580, loss: 1.2759
2023-11-09 22:42:26,901 - mmseg - INFO - Iter [200/5000] lr: 3.111e-06, eta: 1:41:02, time: 1.270, data_time: 0.053, memory: 38534, decode.loss_ce: 0.9487, decode.acc_seg: 72.2640, loss: 0.9487
2023-11-09 22:43:30,449 - mmseg - INFO - Iter [250/5000] lr: 3.078e-06, eta: 1:40:07, time: 1.271, data_time: 0.051, memory: 38534, decode.loss_ce: 0.8691, decode.acc_seg: 74.2259, loss: 0.8691
2023-11-09 22:44:31,617 - mmseg - INFO - Iter [300/5000] lr: 3.046e-06, eta: 1:38:31, time: 1.223, data_time: 0.007, memory: 38534, decode.loss_ce: 0.7616, decode.acc_seg: 75.9451, loss: 0.7616
2023-11-09 22:45:35,133 - mmseg - INFO - Iter [350/5000] lr: 3.014e-06, eta: 1:37:37, time: 1.270, data_time: 0.052, memory: 38534, decode.loss_ce: 0.6935, decode.acc_seg: 78.1956, loss: 0.6935
2023-11-09 22:46:38,661 - mmseg - INFO - Iter [400/5000] lr:
2.981e-06, eta: 1:36:40, time: 1.271, data_time: 0.051, memory: 38534, decode.loss_ce: 0.6339, decode.acc_seg: 79.1392, loss: 0.6339
2023-11-09 22:47:39,894 - mmseg - INFO - Iter [450/5000] lr: 2.949e-06, eta: 1:35:18, time: 1.225, data_time: 0.007, memory: 38534, decode.loss_ce: 0.5782, decode.acc_seg: 80.9914, loss: 0.5782
2023-11-09 22:48:43,359 - mmseg - INFO - Iter [500/5000] lr: 2.916e-06, eta: 1:34:21, time: 1.269, data_time: 0.051, memory: 38534, decode.loss_ce: 0.5524, decode.acc_seg: 81.9658, loss: 0.5524
2023-11-09 22:49:44,561 - mmseg - INFO - Iter [550/5000] lr: 2.884e-06, eta: 1:33:04, time: 1.224, data_time: 0.007, memory: 38534, decode.loss_ce: 0.5073, decode.acc_seg: 82.5137, loss: 0.5073
2023-11-09 22:50:48,079 - mmseg - INFO - Iter [600/5000] lr: 2.852e-06, eta: 1:32:07, time: 1.270, data_time: 0.051, memory: 38534, decode.loss_ce: 0.4482, decode.acc_seg: 84.7468, loss: 0.4482
2023-11-09 22:51:51,597 - mmseg - INFO - Iter [650/5000] lr: 2.819e-06, eta: 1:31:09, time: 1.270, data_time: 0.051, memory: 38534, decode.loss_ce: 0.4537, decode.acc_seg: 84.9285, loss: 0.4537
2023-11-09 22:52:52,802 - mmseg - INFO - Iter [700/5000] lr: 2.787e-06, eta: 1:29:56, time: 1.224, data_time: 0.007, memory: 38534, decode.loss_ce: 0.4470, decode.acc_seg: 84.7759, loss: 0.4470
2023-11-09 22:53:56,275 - mmseg - INFO - Iter [750/5000] lr: 2.754e-06, eta: 1:28:57, time: 1.269, data_time: 0.054, memory: 38534, decode.loss_ce: 0.4326, decode.acc_seg: 84.7979, loss: 0.4326
2023-11-09 22:54:59,849 - mmseg - INFO - Iter [800/5000] lr: 2.722e-06, eta: 1:27:59, time: 1.271, data_time: 0.054, memory: 38534, decode.loss_ce: 0.3933, decode.acc_seg: 86.0453, loss: 0.3933
2023-11-09 22:56:01,076 - mmseg - INFO - Iter [850/5000] lr: 2.690e-06, eta: 1:26:48, time: 1.225, data_time: 0.008, memory: 38534, decode.loss_ce: 0.4120, decode.acc_seg: 85.9512, loss: 0.4120
2023-11-09 22:57:04,603 - mmseg - INFO - Iter [900/5000] lr: 2.657e-06, eta: 1:25:49, time: 1.271, data_time: 0.050,
memory: 38534, decode.loss_ce: 0.3696, decode.acc_seg: 87.1740, loss: 0.3696 2023-11-09 22:58:08,268 - mmseg - INFO - Iter [950/5000] lr: 2.625e-06, eta: 1:24:50, time: 1.273, data_time: 0.053, memory: 38534, decode.loss_ce: 0.3792, decode.acc_seg: 86.0389, loss: 0.3792 2023-11-09 22:59:09,500 - mmseg - INFO - Saving checkpoint at 1000 iterations 2023-11-09 23:00:00,719 - mmseg - INFO - Exp name: segmenter_linear_intern_vit_6b_504_5k_ade20k_bs16_lr4e-5_1of16.py 2023-11-09 23:00:00,719 - mmseg - INFO - Iter [1000/5000] lr: 2.592e-06, eta: 1:27:05, time: 2.249, data_time: 0.008, memory: 38534, decode.loss_ce: 0.3743, decode.acc_seg: 86.5380, loss: 0.3743 2023-11-09 23:02:32,869 - mmseg - INFO - per class results: 2023-11-09 23:02:32,874 - mmseg - INFO - +---------------------+-------+-------+ | Class | IoU | Acc | +---------------------+-------+-------+ | wall | 76.01 | 86.62 | | building | 81.05 | 92.62 | | sky | 92.76 | 95.94 | | floor | 80.33 | 91.17 | | tree | 73.11 | 88.05 | | ceiling | 82.57 | 91.19 | | road | 81.36 | 85.79 | | bed | 88.3 | 96.37 | | windowpane | 59.16 | 76.04 | | grass | 62.06 | 84.58 | | cabinet | 60.84 | 73.57 | | sidewalk | 62.36 | 83.25 | | person | 78.88 | 90.77 | | earth | 34.18 | 48.85 | | door | 51.87 | 70.98 | | table | 60.02 | 78.56 | | mountain | 49.71 | 59.94 | | plant | 49.72 | 61.42 | | curtain | 71.48 | 83.3 | | chair | 50.73 | 62.97 | | car | 78.18 | 92.87 | | water | 54.83 | 75.58 | | painting | 71.4 | 84.72 | | sofa | 62.98 | 71.65 | | shelf | 30.38 | 50.55 | | house | 15.49 | 17.67 | | sea | 50.96 | 59.34 | | mirror | 66.35 | 83.14 | | rug | 62.31 | 67.39 | | field | 25.91 | 42.21 | | armchair | 39.42 | 77.02 | | seat | 52.62 | 76.12 | | fence | 30.48 | 44.06 | | desk | 39.69 | 63.05 | | rock | 45.29 | 69.01 | | wardrobe | 35.09 | 44.37 | | lamp | 59.3 | 71.33 | | bathtub | 78.04 | 84.47 | | railing | 37.86 | 53.32 | | cushion | 58.94 | 68.96 | | base | 21.16 | 40.29 | | box | 23.34 | 27.04 | | column | 43.94 | 60.98 | | 
signboard | 33.16 | 52.98 | | chest of drawers | 33.85 | 67.77 | | counter | 30.07 | 41.85 | | sand | 57.18 | 83.71 | | sink | 70.04 | 77.99 | | skyscraper | 41.72 | 50.03 | | fireplace | 72.16 | 85.32 | | refrigerator | 70.87 | 79.04 | | grandstand | 7.04 | 8.16 | | path | 14.55 | 19.14 | | stairs | 39.77 | 51.14 | | runway | 72.61 | 90.3 | | case | 14.74 | 19.45 | | pool table | 91.19 | 96.37 | | pillow | 51.65 | 57.72 | | screen door | 72.1 | 78.86 | | stairway | 47.34 | 68.79 | | river | 14.79 | 33.04 | | bridge | 67.81 | 81.83 | | bookcase | 27.76 | 40.46 | | blind | 5.59 | 6.33 | | coffee table | 60.63 | 82.61 | | toilet | 82.16 | 92.56 | | flower | 33.99 | 54.98 | | book | 44.18 | 67.92 | | hill | 6.24 | 7.93 | | bench | 49.04 | 54.55 | | countertop | 58.34 | 70.65 | | stove | 71.67 | 84.96 | | palm | 46.42 | 75.07 | | kitchen island | 43.63 | 77.05 | | computer | 65.16 | 75.52 | | swivel chair | 38.01 | 70.25 | | boat | 45.56 | 75.22 | | bar | 39.05 | 56.72 | | arcade machine | 79.41 | 83.58 | | hovel | 14.51 | 20.55 | | bus | 87.88 | 92.86 | | towel | 70.51 | 83.39 | | light | 38.42 | 48.94 | | truck | 31.81 | 36.83 | | tower | 15.58 | 28.1 | | chandelier | 60.37 | 76.97 | | awning | 25.74 | 37.09 | | streetlight | 20.62 | 31.28 | | booth | 26.86 | 27.39 | | television receiver | 70.94 | 81.09 | | airplane | 57.17 | 64.48 | | dirt track | 15.36 | 25.38 | | apparel | 38.78 | 85.79 | | pole | 14.97 | 18.58 | | land | 0.0 | 0.0 | | bannister | 7.45 | 10.86 | | escalator | 51.2 | 65.68 | | ottoman | 48.0 | 70.92 | | bottle | 14.53 | 16.76 | | buffet | 41.15 | 64.94 | | poster | 25.23 | 35.18 | | stage | 8.65 | 15.65 | | van | 0.0 | 0.0 | | ship | 0.0 | 0.0 | | fountain | 32.24 | 33.52 | | conveyer belt | 85.01 | 91.41 | | canopy | 53.95 | 66.42 | | washer | 75.59 | 78.07 | | plaything | 33.14 | 48.63 | | swimming pool | 39.59 | 39.59 | | stool | 31.1 | 38.33 | | barrel | 20.46 | 20.62 | | basket | 32.62 | 47.58 | | waterfall | 49.59 | 76.54 | | tent | 0.0 | 
0.0 | | bag | 11.58 | 13.38 | | minibike | 66.75 | 75.33 | | cradle | 74.83 | 97.14 | | oven | 10.85 | 11.85 | | ball | 36.31 | 68.06 | | food | 8.34 | 8.4 | | step | 10.16 | 11.31 | | tank | 32.65 | 33.68 | | trade name | 23.73 | 28.04 | | microwave | 74.19 | 92.3 | | pot | 49.22 | 57.31 | | animal | 58.91 | 61.5 | | bicycle | 57.48 | 77.05 | | lake | 0.0 | 0.0 | | dishwasher | 56.48 | 78.87 | | screen | 65.05 | 84.63 | | blanket | 9.18 | 9.85 | | sculpture | 30.55 | 31.06 | | hood | 59.74 | 65.19 | | sconce | 30.91 | 41.56 | | vase | 31.37 | 51.62 | | traffic light | 28.59 | 36.51 | | tray | 9.86 | 21.99 | | ashcan | 43.18 | 55.68 | | fan | 50.16 | 59.79 | | pier | 28.65 | 29.15 | | crt screen | 5.8 | 6.94 | | plate | 50.77 | 73.55 | | monitor | 3.37 | 3.55 | | bulletin board | 31.81 | 38.95 | | shower | 0.0 | 0.0 | | radiator | 63.34 | 72.21 | | glass | 15.23 | 16.27 | | clock | 30.35 | 31.7 | | flag | 66.79 | 70.58 | +---------------------+-------+-------+ 2023-11-09 23:02:32,874 - mmseg - INFO - Summary: 2023-11-09 23:02:32,875 - mmseg - INFO - +-------+-------+-------+ | aAcc | mIoU | mAcc | +-------+-------+-------+ | 81.22 | 43.97 | 55.61 | +-------+-------+-------+ 2023-11-09 23:02:32,875 - mmseg - INFO - Exp name: segmenter_linear_intern_vit_6b_504_5k_ade20k_bs16_lr4e-5_1of16.py 2023-11-09 23:02:32,876 - mmseg - INFO - Iter(val) [250] aAcc: 0.8122, mIoU: 0.4397, mAcc: 0.5561, IoU.wall: 0.7601, IoU.building: 0.8105, IoU.sky: 0.9276, IoU.floor: 0.8033, IoU.tree: 0.7311, IoU.ceiling: 0.8257, IoU.road: 0.8136, IoU.bed : 0.8830, IoU.windowpane: 0.5916, IoU.grass: 0.6206, IoU.cabinet: 0.6084, IoU.sidewalk: 0.6236, IoU.person: 0.7888, IoU.earth: 0.3418, IoU.door: 0.5187, IoU.table: 0.6002, IoU.mountain: 0.4971, IoU.plant: 0.4972, IoU.curtain: 0.7148, IoU.chair: 0.5073, IoU.car: 0.7818, IoU.water: 0.5483, IoU.painting: 0.7140, IoU.sofa: 0.6298, IoU.shelf: 0.3038, IoU.house: 0.1549, IoU.sea: 0.5096, IoU.mirror: 0.6635, IoU.rug: 0.6231, IoU.field: 0.2591, 
IoU.armchair: 0.3942, IoU.seat: 0.5262, IoU.fence: 0.3048, IoU.desk: 0.3969, IoU.rock: 0.4529, IoU.wardrobe: 0.3509, IoU.lamp: 0.5930, IoU.bathtub: 0.7804, IoU.railing: 0.3786, IoU.cushion: 0.5894, IoU.base: 0.2116, IoU.box: 0.2334, IoU.column: 0.4394, IoU.signboard: 0.3316, IoU.chest of drawers: 0.3385, IoU.counter: 0.3007, IoU.sand: 0.5718, IoU.sink: 0.7004, IoU.skyscraper: 0.4172, IoU.fireplace: 0.7216, IoU.refrigerator: 0.7087, IoU.grandstand: 0.0704, IoU.path: 0.1455, IoU.stairs: 0.3977, IoU.runway: 0.7261, IoU.case: 0.1474, IoU.pool table: 0.9119, IoU.pillow: 0.5165, IoU.screen door: 0.7210, IoU.stairway: 0.4734, IoU.river: 0.1479, IoU.bridge: 0.6781, IoU.bookcase: 0.2776, IoU.blind: 0.0559, IoU.coffee table: 0.6063, IoU.toilet: 0.8216, IoU.flower: 0.3399, IoU.book: 0.4418, IoU.hill: 0.0624, IoU.bench: 0.4904, IoU.countertop: 0.5834, IoU.stove: 0.7167, IoU.palm: 0.4642, IoU.kitchen island: 0.4363, IoU.computer: 0.6516, IoU.swivel chair: 0.3801, IoU.boat: 0.4556, IoU.bar: 0.3905, IoU.arcade machine: 0.7941, IoU.hovel: 0.1451, IoU.bus: 0.8788, IoU.towel: 0.7051, IoU.light: 0.3842, IoU.truck: 0.3181, IoU.tower: 0.1558, IoU.chandelier: 0.6037, IoU.awning: 0.2574, IoU.streetlight: 0.2062, IoU.booth: 0.2686, IoU.television receiver: 0.7094, IoU.airplane: 0.5717, IoU.dirt track: 0.1536, IoU.apparel: 0.3878, IoU.pole: 0.1497, IoU.land: 0.0000, IoU.bannister: 0.0745, IoU.escalator: 0.5120, IoU.ottoman: 0.4800, IoU.bottle: 0.1453, IoU.buffet: 0.4115, IoU.poster: 0.2523, IoU.stage: 0.0865, IoU.van: 0.0000, IoU.ship: 0.0000, IoU.fountain: 0.3224, IoU.conveyer belt: 0.8501, IoU.canopy: 0.5395, IoU.washer: 0.7559, IoU.plaything: 0.3314, IoU.swimming pool: 0.3959, IoU.stool: 0.3110, IoU.barrel: 0.2046, IoU.basket: 0.3262, IoU.waterfall: 0.4959, IoU.tent: 0.0000, IoU.bag: 0.1158, IoU.minibike: 0.6675, IoU.cradle: 0.7483, IoU.oven: 0.1085, IoU.ball: 0.3631, IoU.food: 0.0834, IoU.step: 0.1016, IoU.tank: 0.3265, IoU.trade name: 0.2373, IoU.microwave: 0.7419, IoU.pot: 0.4922, 
IoU.animal: 0.5891, IoU.bicycle: 0.5748, IoU.lake: 0.0000, IoU.dishwasher: 0.5648, IoU.screen: 0.6505, IoU.blanket: 0.0918, IoU.sculpture: 0.3055, IoU.hood: 0.5974, IoU.sconce: 0.3091, IoU.vase: 0.3137, IoU.traffic light: 0.2859, IoU.tray: 0.0986, IoU.ashcan: 0.4318, IoU.fan: 0.5016, IoU.pier: 0.2865, IoU.crt screen: 0.0580, IoU.plate: 0.5077, IoU.monitor: 0.0337, IoU.bulletin board: 0.3181, IoU.shower: 0.0000, IoU.radiator: 0.6334, IoU.glass: 0.1523, IoU.clock: 0.3035, IoU.flag: 0.6679, Acc.wall: 0.8662, Acc.building: 0.9262, Acc.sky: 0.9594, Acc.floor: 0.9117, Acc.tree: 0.8805, Acc.ceiling: 0.9119, Acc.road: 0.8579, Acc.bed : 0.9637, Acc.windowpane: 0.7604, Acc.grass: 0.8458, Acc.cabinet: 0.7357, Acc.sidewalk: 0.8325, Acc.person: 0.9077, Acc.earth: 0.4885, Acc.door: 0.7098, Acc.table: 0.7856, Acc.mountain: 0.5994, Acc.plant: 0.6142, Acc.curtain: 0.8330, Acc.chair: 0.6297, Acc.car: 0.9287, Acc.water: 0.7558, Acc.painting: 0.8472, Acc.sofa: 0.7165, Acc.shelf: 0.5055, Acc.house: 0.1767, Acc.sea: 0.5934, Acc.mirror: 0.8314, Acc.rug: 0.6739, Acc.field: 0.4221, Acc.armchair: 0.7702, Acc.seat: 0.7612, Acc.fence: 0.4406, Acc.desk: 0.6305, Acc.rock: 0.6901, Acc.wardrobe: 0.4437, Acc.lamp: 0.7133, Acc.bathtub: 0.8447, Acc.railing: 0.5332, Acc.cushion: 0.6896, Acc.base: 0.4029, Acc.box: 0.2704, Acc.column: 0.6098, Acc.signboard: 0.5298, Acc.chest of drawers: 0.6777, Acc.counter: 0.4185, Acc.sand: 0.8371, Acc.sink: 0.7799, Acc.skyscraper: 0.5003, Acc.fireplace: 0.8532, Acc.refrigerator: 0.7904, Acc.grandstand: 0.0816, Acc.path: 0.1914, Acc.stairs: 0.5114, Acc.runway: 0.9030, Acc.case: 0.1945, Acc.pool table: 0.9637, Acc.pillow: 0.5772, Acc.screen door: 0.7886, Acc.stairway: 0.6879, Acc.river: 0.3304, Acc.bridge: 0.8183, Acc.bookcase: 0.4046, Acc.blind: 0.0633, Acc.coffee table: 0.8261, Acc.toilet: 0.9256, Acc.flower: 0.5498, Acc.book: 0.6792, Acc.hill: 0.0793, Acc.bench: 0.5455, Acc.countertop: 0.7065, Acc.stove: 0.8496, Acc.palm: 0.7507, Acc.kitchen island: 0.7705, 
Acc.computer: 0.7552, Acc.swivel chair: 0.7025, Acc.boat: 0.7522, Acc.bar: 0.5672, Acc.arcade machine: 0.8358, Acc.hovel: 0.2055, Acc.bus: 0.9286, Acc.towel: 0.8339, Acc.light: 0.4894, Acc.truck: 0.3683, Acc.tower: 0.2810, Acc.chandelier: 0.7697, Acc.awning: 0.3709, Acc.streetlight: 0.3128, Acc.booth: 0.2739, Acc.television receiver: 0.8109, Acc.airplane: 0.6448, Acc.dirt track: 0.2538, Acc.apparel: 0.8579, Acc.pole: 0.1858, Acc.land: 0.0000, Acc.bannister: 0.1086, Acc.escalator: 0.6568, Acc.ottoman: 0.7092, Acc.bottle: 0.1676, Acc.buffet: 0.6494, Acc.poster: 0.3518, Acc.stage: 0.1565, Acc.van: 0.0000, Acc.ship: 0.0000, Acc.fountain: 0.3352, Acc.conveyer belt: 0.9141, Acc.canopy: 0.6642, Acc.washer: 0.7807, Acc.plaything: 0.4863, Acc.swimming pool: 0.3959, Acc.stool: 0.3833, Acc.barrel: 0.2062, Acc.basket: 0.4758, Acc.waterfall: 0.7654, Acc.tent: 0.0000, Acc.bag: 0.1338, Acc.minibike: 0.7533, Acc.cradle: 0.9714, Acc.oven: 0.1185, Acc.ball: 0.6806, Acc.food: 0.0840, Acc.step: 0.1131, Acc.tank: 0.3368, Acc.trade name: 0.2804, Acc.microwave: 0.9230, Acc.pot: 0.5731, Acc.animal: 0.6150, Acc.bicycle: 0.7705, Acc.lake: 0.0000, Acc.dishwasher: 0.7887, Acc.screen: 0.8463, Acc.blanket: 0.0985, Acc.sculpture: 0.3106, Acc.hood: 0.6519, Acc.sconce: 0.4156, Acc.vase: 0.5162, Acc.traffic light: 0.3651, Acc.tray: 0.2199, Acc.ashcan: 0.5568, Acc.fan: 0.5979, Acc.pier: 0.2915, Acc.crt screen: 0.0694, Acc.plate: 0.7355, Acc.monitor: 0.0355, Acc.bulletin board: 0.3895, Acc.shower: 0.0000, Acc.radiator: 0.7221, Acc.glass: 0.1627, Acc.clock: 0.3170, Acc.flag: 0.7058 2023-11-09 23:03:36,506 - mmseg - INFO - Iter [1050/5000] lr: 2.560e-06, eta: 1:35:26, time: 4.316, data_time: 3.097, memory: 38534, decode.loss_ce: 0.3789, decode.acc_seg: 87.0747, loss: 0.3789 2023-11-09 23:04:37,693 - mmseg - INFO - Iter [1100/5000] lr: 2.528e-06, eta: 1:33:33, time: 1.224, data_time: 0.008, memory: 38534, decode.loss_ce: 0.3808, decode.acc_seg: 86.7820, loss: 0.3808 2023-11-09 23:05:41,158 - mmseg - 
INFO - Iter [1150/5000] lr: 2.495e-06, eta: 1:31:53, time: 1.269, data_time: 0.052, memory: 38534, decode.loss_ce: 0.3509, decode.acc_seg: 87.4624, loss: 0.3509
2023-11-09 23:06:44,731 - mmseg - INFO - Iter [1200/5000] lr: 2.463e-06, eta: 1:30:16, time: 1.271, data_time: 0.050, memory: 38534, decode.loss_ce: 0.3365, decode.acc_seg: 88.0465, loss: 0.3365
2023-11-09 23:07:45,952 - mmseg - INFO - Iter [1250/5000] lr: 2.430e-06, eta: 1:28:34, time: 1.224, data_time: 0.008, memory: 38534, decode.loss_ce: 0.3352, decode.acc_seg: 88.0570, loss: 0.3352
2023-11-09 23:08:49,561 - mmseg - INFO - Iter [1300/5000] lr: 2.398e-06, eta: 1:27:03, time: 1.272, data_time: 0.054, memory: 38534, decode.loss_ce: 0.3128, decode.acc_seg: 88.7493, loss: 0.3128
2023-11-09 23:09:53,145 - mmseg - INFO - Iter [1350/5000] lr: 2.366e-06, eta: 1:25:33, time: 1.272, data_time: 0.052, memory: 38534, decode.loss_ce: 0.3241, decode.acc_seg: 88.2304, loss: 0.3241
2023-11-09 23:10:54,427 - mmseg - INFO - Iter [1400/5000] lr: 2.333e-06, eta: 1:24:00, time: 1.226, data_time: 0.008, memory: 38534, decode.loss_ce: 0.3072, decode.acc_seg: 88.7156, loss: 0.3072
2023-11-09 23:11:57,947 - mmseg - INFO - Iter [1450/5000] lr: 2.301e-06, eta: 1:22:34, time: 1.270, data_time: 0.051, memory: 38534, decode.loss_ce: 0.2948, decode.acc_seg: 89.1791, loss: 0.2948
2023-11-09 23:12:59,205 - mmseg - INFO - Iter [1500/5000] lr: 2.268e-06, eta: 1:21:04, time: 1.225, data_time: 0.008, memory: 38534, decode.loss_ce: 0.2989, decode.acc_seg: 89.1917, loss: 0.2989
2023-11-09 23:14:02,895 - mmseg - INFO - Iter [1550/5000] lr: 2.236e-06, eta: 1:19:42, time: 1.274, data_time: 0.053, memory: 38534, decode.loss_ce: 0.2721, decode.acc_seg: 89.9743, loss: 0.2721
2023-11-09 23:15:06,503 - mmseg - INFO - Iter [1600/5000] lr: 2.204e-06, eta: 1:18:20, time: 1.272, data_time: 0.052, memory: 38534, decode.loss_ce: 0.2817, decode.acc_seg: 89.6098, loss: 0.2817
2023-11-09 23:16:07,820 - mmseg - INFO - Iter [1650/5000] lr: 2.171e-06, eta:
1:16:55, time: 1.226, data_time: 0.008, memory: 38534, decode.loss_ce: 0.2712, decode.acc_seg: 89.8701, loss: 0.2712 2023-11-09 23:17:11,349 - mmseg - INFO - Iter [1700/5000] lr: 2.139e-06, eta: 1:15:36, time: 1.271, data_time: 0.051, memory: 38534, decode.loss_ce: 0.2808, decode.acc_seg: 89.5798, loss: 0.2808 2023-11-09 23:18:14,837 - mmseg - INFO - Iter [1750/5000] lr: 2.107e-06, eta: 1:14:18, time: 1.270, data_time: 0.053, memory: 38534, decode.loss_ce: 0.2725, decode.acc_seg: 89.6531, loss: 0.2725 2023-11-09 23:19:16,050 - mmseg - INFO - Iter [1800/5000] lr: 2.074e-06, eta: 1:12:56, time: 1.224, data_time: 0.008, memory: 38534, decode.loss_ce: 0.2814, decode.acc_seg: 89.8453, loss: 0.2814 2023-11-09 23:20:19,638 - mmseg - INFO - Iter [1850/5000] lr: 2.042e-06, eta: 1:11:39, time: 1.272, data_time: 0.054, memory: 38534, decode.loss_ce: 0.2578, decode.acc_seg: 90.5057, loss: 0.2578 2023-11-09 23:21:23,335 - mmseg - INFO - Iter [1900/5000] lr: 2.009e-06, eta: 1:10:24, time: 1.274, data_time: 0.054, memory: 38534, decode.loss_ce: 0.2555, decode.acc_seg: 90.4556, loss: 0.2555 2023-11-09 23:22:24,580 - mmseg - INFO - Iter [1950/5000] lr: 1.977e-06, eta: 1:09:05, time: 1.225, data_time: 0.008, memory: 38534, decode.loss_ce: 0.2619, decode.acc_seg: 90.5555, loss: 0.2619 2023-11-09 23:23:28,063 - mmseg - INFO - Saving checkpoint at 2000 iterations 2023-11-09 23:24:18,842 - mmseg - INFO - Exp name: segmenter_linear_intern_vit_6b_504_5k_ade20k_bs16_lr4e-5_1of16.py 2023-11-09 23:24:18,843 - mmseg - INFO - Iter [2000/5000] lr: 1.945e-06, eta: 1:09:06, time: 2.285, data_time: 0.052, memory: 38534, decode.loss_ce: 0.2430, decode.acc_seg: 90.6814, loss: 0.2430 2023-11-09 23:25:12,740 - mmseg - INFO - per class results: 2023-11-09 23:25:12,745 - mmseg - INFO - +---------------------+-------+-------+ | Class | IoU | Acc | +---------------------+-------+-------+ | wall | 76.91 | 87.22 | | building | 81.73 | 90.47 | | sky | 93.29 | 96.97 | | floor | 80.89 | 87.87 | | tree | 73.5 | 
88.3 | | ceiling | 83.22 | 93.68 | | road | 82.15 | 88.46 | | bed | 89.39 | 95.78 | | windowpane | 61.7 | 78.79 | | grass | 63.71 | 79.76 | | cabinet | 60.66 | 69.41 | | sidewalk | 63.93 | 81.23 | | person | 78.75 | 93.59 | | earth | 34.28 | 47.5 | | door | 53.04 | 64.49 | | table | 60.92 | 78.99 | | mountain | 52.5 | 62.26 | | plant | 50.55 | 61.72 | | curtain | 72.88 | 87.23 | | chair | 53.18 | 67.42 | | car | 80.48 | 93.52 | | water | 51.04 | 67.49 | | painting | 72.61 | 88.3 | | sofa | 69.55 | 87.04 | | shelf | 34.57 | 56.73 | | house | 34.85 | 49.25 | | sea | 53.04 | 68.83 | | mirror | 70.6 | 78.69 | | rug | 66.14 | 81.18 | | field | 33.09 | 64.77 | | armchair | 45.11 | 68.72 | | seat | 48.49 | 69.59 | | fence | 26.61 | 33.76 | | desk | 41.35 | 65.86 | | rock | 49.64 | 74.2 | | wardrobe | 33.82 | 42.33 | | lamp | 58.58 | 74.33 | | bathtub | 78.87 | 84.36 | | railing | 38.16 | 59.21 | | cushion | 59.24 | 77.11 | | base | 23.52 | 33.88 | | box | 25.44 | 28.95 | | column | 49.54 | 66.4 | | signboard | 32.21 | 49.21 | | chest of drawers | 37.97 | 67.29 | | counter | 32.86 | 45.2 | | sand | 50.26 | 87.11 | | sink | 73.8 | 81.88 | | skyscraper | 50.53 | 75.58 | | fireplace | 70.7 | 88.74 | | refrigerator | 71.47 | 93.14 | | grandstand | 10.12 | 12.22 | | path | 18.64 | 29.18 | | stairs | 40.03 | 53.74 | | runway | 75.27 | 88.48 | | case | 37.13 | 49.74 | | pool table | 90.82 | 96.29 | | pillow | 52.49 | 58.78 | | screen door | 77.36 | 82.19 | | stairway | 45.23 | 75.85 | | river | 16.8 | 43.56 | | bridge | 69.48 | 83.59 | | bookcase | 25.99 | 41.66 | | blind | 28.79 | 36.05 | | coffee table | 61.46 | 83.34 | | toilet | 84.72 | 92.1 | | flower | 35.62 | 58.55 | | book | 42.66 | 71.85 | | hill | 7.87 | 12.06 | | bench | 46.73 | 64.51 | | countertop | 56.22 | 70.75 | | stove | 73.85 | 86.25 | | palm | 49.29 | 76.44 | | kitchen island | 45.38 | 91.17 | | computer | 65.52 | 74.72 | | swivel chair | 40.52 | 63.08 | | boat | 63.08 | 79.96 | | bar | 40.38 | 61.65 | | arcade 
machine | 57.39 | 61.67 | | hovel | 13.17 | 21.93 | | bus | 90.53 | 94.2 | | towel | 71.01 | 85.2 | | light | 40.88 | 52.06 | | truck | 33.1 | 39.58 | | tower | 10.52 | 18.81 | | chandelier | 59.95 | 83.7 | | awning | 29.6 | 42.34 | | streetlight | 23.31 | 34.41 | | booth | 19.16 | 27.55 | | television receiver | 72.5 | 86.18 | | airplane | 58.81 | 64.45 | | dirt track | 21.38 | 30.3 | | apparel | 43.69 | 63.89 | | pole | 17.41 | 22.24 | | land | 0.0 | 0.0 | | bannister | 6.83 | 8.65 | | escalator | 62.08 | 81.25 | | ottoman | 49.33 | 65.83 | | bottle | 19.99 | 27.08 | | buffet | 44.5 | 66.99 | | poster | 25.17 | 33.9 | | stage | 9.75 | 20.47 | | van | 8.18 | 9.97 | | ship | 0.0 | 0.0 | | fountain | 21.28 | 21.62 | | conveyer belt | 84.46 | 93.08 | | canopy | 42.64 | 48.55 | | washer | 82.39 | 85.81 | | plaything | 33.46 | 66.06 | | swimming pool | 58.5 | 58.58 | | stool | 31.25 | 38.34 | | barrel | 21.99 | 22.35 | | basket | 33.57 | 40.62 | | waterfall | 49.69 | 93.81 | | tent | 0.0 | 0.0 | | bag | 16.87 | 21.48 | | minibike | 71.07 | 87.08 | | cradle | 74.92 | 98.42 | | oven | 47.3 | 61.18 | | ball | 36.49 | 69.78 | | food | 26.24 | 26.74 | | step | 15.11 | 19.35 | | tank | 31.22 | 32.1 | | trade name | 30.55 | 44.5 | | microwave | 78.74 | 89.15 | | pot | 52.56 | 60.16 | | animal | 62.12 | 65.34 | | bicycle | 58.5 | 77.33 | | lake | 0.0 | 0.0 | | dishwasher | 66.74 | 79.38 | | screen | 47.57 | 58.37 | | blanket | 14.8 | 17.04 | | sculpture | 44.81 | 52.18 | | hood | 61.26 | 66.98 | | sconce | 35.8 | 49.37 | | vase | 34.85 | 56.47 | | traffic light | 28.46 | 39.94 | | tray | 10.58 | 30.08 | | ashcan | 44.53 | 57.55 | | fan | 52.0 | 59.69 | | pier | 30.96 | 32.83 | | crt screen | 14.08 | 30.94 | | plate | 53.68 | 69.21 | | monitor | 2.98 | 3.35 | | bulletin board | 33.54 | 45.77 | | shower | 0.0 | 0.0 | | radiator | 64.69 | 70.02 | | glass | 16.08 | 16.93 | | clock | 37.95 | 44.63 | | flag | 67.71 | 72.89 | +---------------------+-------+-------+ 2023-11-09 
23:25:12,746 - mmseg - INFO - Summary: 2023-11-09 23:25:12,746 - mmseg - INFO - +------+------+-------+ | aAcc | mIoU | mAcc | +------+------+-------+ | 82.0 | 46.3 | 59.06 | +------+------+-------+ 2023-11-09 23:25:12,746 - mmseg - INFO - Exp name: segmenter_linear_intern_vit_6b_504_5k_ade20k_bs16_lr4e-5_1of16.py 2023-11-09 23:25:12,747 - mmseg - INFO - Iter(val) [250] aAcc: 0.8200, mIoU: 0.4630, mAcc: 0.5906, IoU.wall: 0.7691, IoU.building: 0.8173, IoU.sky: 0.9329, IoU.floor: 0.8089, IoU.tree: 0.7350, IoU.ceiling: 0.8322, IoU.road: 0.8215, IoU.bed : 0.8939, IoU.windowpane: 0.6170, IoU.grass: 0.6371, IoU.cabinet: 0.6066, IoU.sidewalk: 0.6393, IoU.person: 0.7875, IoU.earth: 0.3428, IoU.door: 0.5304, IoU.table: 0.6092, IoU.mountain: 0.5250, IoU.plant: 0.5055, IoU.curtain: 0.7288, IoU.chair: 0.5318, IoU.car: 0.8048, IoU.water: 0.5104, IoU.painting: 0.7261, IoU.sofa: 0.6955, IoU.shelf: 0.3457, IoU.house: 0.3485, IoU.sea: 0.5304, IoU.mirror: 0.7060, IoU.rug: 0.6614, IoU.field: 0.3309, IoU.armchair: 0.4511, IoU.seat: 0.4849, IoU.fence: 0.2661, IoU.desk: 0.4135, IoU.rock: 0.4964, IoU.wardrobe: 0.3382, IoU.lamp: 0.5858, IoU.bathtub: 0.7887, IoU.railing: 0.3816, IoU.cushion: 0.5924, IoU.base: 0.2352, IoU.box: 0.2544, IoU.column: 0.4954, IoU.signboard: 0.3221, IoU.chest of drawers: 0.3797, IoU.counter: 0.3286, IoU.sand: 0.5026, IoU.sink: 0.7380, IoU.skyscraper: 0.5053, IoU.fireplace: 0.7070, IoU.refrigerator: 0.7147, IoU.grandstand: 0.1012, IoU.path: 0.1864, IoU.stairs: 0.4003, IoU.runway: 0.7527, IoU.case: 0.3713, IoU.pool table: 0.9082, IoU.pillow: 0.5249, IoU.screen door: 0.7736, IoU.stairway: 0.4523, IoU.river: 0.1680, IoU.bridge: 0.6948, IoU.bookcase: 0.2599, IoU.blind: 0.2879, IoU.coffee table: 0.6146, IoU.toilet: 0.8472, IoU.flower: 0.3562, IoU.book: 0.4266, IoU.hill: 0.0787, IoU.bench: 0.4673, IoU.countertop: 0.5622, IoU.stove: 0.7385, IoU.palm: 0.4929, IoU.kitchen island: 0.4538, IoU.computer: 0.6552, IoU.swivel chair: 0.4052, IoU.boat: 0.6308, IoU.bar: 0.4038, 
IoU.arcade machine: 0.5739, IoU.hovel: 0.1317, IoU.bus: 0.9053, IoU.towel: 0.7101, IoU.light: 0.4088, IoU.truck: 0.3310, IoU.tower: 0.1052, IoU.chandelier: 0.5995, IoU.awning: 0.2960, IoU.streetlight: 0.2331, IoU.booth: 0.1916, IoU.television receiver: 0.7250, IoU.airplane: 0.5881, IoU.dirt track: 0.2138, IoU.apparel: 0.4369, IoU.pole: 0.1741, IoU.land: 0.0000, IoU.bannister: 0.0683, IoU.escalator: 0.6208, IoU.ottoman: 0.4933, IoU.bottle: 0.1999, IoU.buffet: 0.4450, IoU.poster: 0.2517, IoU.stage: 0.0975, IoU.van: 0.0818, IoU.ship: 0.0000, IoU.fountain: 0.2128, IoU.conveyer belt: 0.8446, IoU.canopy: 0.4264, IoU.washer: 0.8239, IoU.plaything: 0.3346, IoU.swimming pool: 0.5850, IoU.stool: 0.3125, IoU.barrel: 0.2199, IoU.basket: 0.3357, IoU.waterfall: 0.4969, IoU.tent: 0.0000, IoU.bag: 0.1687, IoU.minibike: 0.7107, IoU.cradle: 0.7492, IoU.oven: 0.4730, IoU.ball: 0.3649, IoU.food: 0.2624, IoU.step: 0.1511, IoU.tank: 0.3122, IoU.trade name: 0.3055, IoU.microwave: 0.7874, IoU.pot: 0.5256, IoU.animal: 0.6212, IoU.bicycle: 0.5850, IoU.lake: 0.0000, IoU.dishwasher: 0.6674, IoU.screen: 0.4757, IoU.blanket: 0.1480, IoU.sculpture: 0.4481, IoU.hood: 0.6126, IoU.sconce: 0.3580, IoU.vase: 0.3485, IoU.traffic light: 0.2846, IoU.tray: 0.1058, IoU.ashcan: 0.4453, IoU.fan: 0.5200, IoU.pier: 0.3096, IoU.crt screen: 0.1408, IoU.plate: 0.5368, IoU.monitor: 0.0298, IoU.bulletin board: 0.3354, IoU.shower: 0.0000, IoU.radiator: 0.6469, IoU.glass: 0.1608, IoU.clock: 0.3795, IoU.flag: 0.6771, Acc.wall: 0.8722, Acc.building: 0.9047, Acc.sky: 0.9697, Acc.floor: 0.8787, Acc.tree: 0.8830, Acc.ceiling: 0.9368, Acc.road: 0.8846, Acc.bed : 0.9578, Acc.windowpane: 0.7879, Acc.grass: 0.7976, Acc.cabinet: 0.6941, Acc.sidewalk: 0.8123, Acc.person: 0.9359, Acc.earth: 0.4750, Acc.door: 0.6449, Acc.table: 0.7899, Acc.mountain: 0.6226, Acc.plant: 0.6172, Acc.curtain: 0.8723, Acc.chair: 0.6742, Acc.car: 0.9352, Acc.water: 0.6749, Acc.painting: 0.8830, Acc.sofa: 0.8704, Acc.shelf: 0.5673, Acc.house: 0.4925, 
Acc.sea: 0.6883, Acc.mirror: 0.7869, Acc.rug: 0.8118, Acc.field: 0.6477, Acc.armchair: 0.6872, Acc.seat: 0.6959, Acc.fence: 0.3376, Acc.desk: 0.6586, Acc.rock: 0.7420, Acc.wardrobe: 0.4233, Acc.lamp: 0.7433, Acc.bathtub: 0.8436, Acc.railing: 0.5921, Acc.cushion: 0.7711, Acc.base: 0.3388, Acc.box: 0.2895, Acc.column: 0.6640, Acc.signboard: 0.4921, Acc.chest of drawers: 0.6729, Acc.counter: 0.4520, Acc.sand: 0.8711, Acc.sink: 0.8188, Acc.skyscraper: 0.7558, Acc.fireplace: 0.8874, Acc.refrigerator: 0.9314, Acc.grandstand: 0.1222, Acc.path: 0.2918, Acc.stairs: 0.5374, Acc.runway: 0.8848, Acc.case: 0.4974, Acc.pool table: 0.9629, Acc.pillow: 0.5878, Acc.screen door: 0.8219, Acc.stairway: 0.7585, Acc.river: 0.4356, Acc.bridge: 0.8359, Acc.bookcase: 0.4166, Acc.blind: 0.3605, Acc.coffee table: 0.8334, Acc.toilet: 0.9210, Acc.flower: 0.5855, Acc.book: 0.7185, Acc.hill: 0.1206, Acc.bench: 0.6451, Acc.countertop: 0.7075, Acc.stove: 0.8625, Acc.palm: 0.7644, Acc.kitchen island: 0.9117, Acc.computer: 0.7472, Acc.swivel chair: 0.6308, Acc.boat: 0.7996, Acc.bar: 0.6165, Acc.arcade machine: 0.6167, Acc.hovel: 0.2193, Acc.bus: 0.9420, Acc.towel: 0.8520, Acc.light: 0.5206, Acc.truck: 0.3958, Acc.tower: 0.1881, Acc.chandelier: 0.8370, Acc.awning: 0.4234, Acc.streetlight: 0.3441, Acc.booth: 0.2755, Acc.television receiver: 0.8618, Acc.airplane: 0.6445, Acc.dirt track: 0.3030, Acc.apparel: 0.6389, Acc.pole: 0.2224, Acc.land: 0.0000, Acc.bannister: 0.0865, Acc.escalator: 0.8125, Acc.ottoman: 0.6583, Acc.bottle: 0.2708, Acc.buffet: 0.6699, Acc.poster: 0.3390, Acc.stage: 0.2047, Acc.van: 0.0997, Acc.ship: 0.0000, Acc.fountain: 0.2162, Acc.conveyer belt: 0.9308, Acc.canopy: 0.4855, Acc.washer: 0.8581, Acc.plaything: 0.6606, Acc.swimming pool: 0.5858, Acc.stool: 0.3834, Acc.barrel: 0.2235, Acc.basket: 0.4062, Acc.waterfall: 0.9381, Acc.tent: 0.0000, Acc.bag: 0.2148, Acc.minibike: 0.8708, Acc.cradle: 0.9842, Acc.oven: 0.6118, Acc.ball: 0.6978, Acc.food: 0.2674, Acc.step: 0.1935, Acc.tank: 
0.3210, Acc.trade name: 0.4450, Acc.microwave: 0.8915, Acc.pot: 0.6016, Acc.animal: 0.6534, Acc.bicycle: 0.7733, Acc.lake: 0.0000, Acc.dishwasher: 0.7938, Acc.screen: 0.5837, Acc.blanket: 0.1704, Acc.sculpture: 0.5218, Acc.hood: 0.6698, Acc.sconce: 0.4937, Acc.vase: 0.5647, Acc.traffic light: 0.3994, Acc.tray: 0.3008, Acc.ashcan: 0.5755, Acc.fan: 0.5969, Acc.pier: 0.3283, Acc.crt screen: 0.3094, Acc.plate: 0.6921, Acc.monitor: 0.0335, Acc.bulletin board: 0.4577, Acc.shower: 0.0000, Acc.radiator: 0.7002, Acc.glass: 0.1693, Acc.clock: 0.4463, Acc.flag: 0.7289 2023-11-09 23:26:14,065 - mmseg - INFO - Iter [2050/5000] lr: 1.912e-06, eta: 1:09:04, time: 2.304, data_time: 1.086, memory: 38534, decode.loss_ce: 0.2618, decode.acc_seg: 90.1254, loss: 0.2618 2023-11-09 23:27:17,510 - mmseg - INFO - Iter [2100/5000] lr: 1.880e-06, eta: 1:07:44, time: 1.269, data_time: 0.051, memory: 38534, decode.loss_ce: 0.2601, decode.acc_seg: 90.3283, loss: 0.2601 2023-11-09 23:28:20,961 - mmseg - INFO - Iter [2150/5000] lr: 1.847e-06, eta: 1:06:25, time: 1.269, data_time: 0.051, memory: 38534, decode.loss_ce: 0.2471, decode.acc_seg: 90.5986, loss: 0.2471 2023-11-09 23:29:22,193 - mmseg - INFO - Iter [2200/5000] lr: 1.815e-06, eta: 1:05:04, time: 1.225, data_time: 0.007, memory: 38534, decode.loss_ce: 0.2362, decode.acc_seg: 91.1553, loss: 0.2362 2023-11-09 23:30:25,808 - mmseg - INFO - Iter [2250/5000] lr: 1.783e-06, eta: 1:03:47, time: 1.272, data_time: 0.051, memory: 38534, decode.loss_ce: 0.2422, decode.acc_seg: 90.7167, loss: 0.2422 2023-11-09 23:31:29,457 - mmseg - INFO - Iter [2300/5000] lr: 1.750e-06, eta: 1:02:30, time: 1.273, data_time: 0.052, memory: 38534, decode.loss_ce: 0.2298, decode.acc_seg: 91.4812, loss: 0.2298 2023-11-09 23:32:30,666 - mmseg - INFO - Iter [2350/5000] lr: 1.718e-06, eta: 1:01:12, time: 1.224, data_time: 0.008, memory: 38534, decode.loss_ce: 0.2345, decode.acc_seg: 91.3302, loss: 0.2345 2023-11-09 23:33:34,164 - mmseg - INFO - Iter [2400/5000] lr: 
1.685e-06, eta: 0:59:56, time: 1.270, data_time: 0.051, memory: 38534, decode.loss_ce: 0.2363, decode.acc_seg: 90.7356, loss: 0.2363 2023-11-09 23:34:37,644 - mmseg - INFO - Iter [2450/5000] lr: 1.653e-06, eta: 0:58:41, time: 1.270, data_time: 0.053, memory: 38534, decode.loss_ce: 0.2268, decode.acc_seg: 91.3783, loss: 0.2268 2023-11-09 23:35:38,848 - mmseg - INFO - Iter [2500/5000] lr: 1.621e-06, eta: 0:57:24, time: 1.224, data_time: 0.007, memory: 38534, decode.loss_ce: 0.2271, decode.acc_seg: 91.2350, loss: 0.2271 2023-11-09 23:36:42,354 - mmseg - INFO - Iter [2550/5000] lr: 1.588e-06, eta: 0:56:10, time: 1.270, data_time: 0.053, memory: 38534, decode.loss_ce: 0.2304, decode.acc_seg: 91.0821, loss: 0.2304 2023-11-09 23:37:43,577 - mmseg - INFO - Iter [2600/5000] lr: 1.556e-06, eta: 0:54:54, time: 1.224, data_time: 0.008, memory: 38534, decode.loss_ce: 0.2159, decode.acc_seg: 91.6075, loss: 0.2159 2023-11-09 23:38:47,118 - mmseg - INFO - Iter [2650/5000] lr: 1.523e-06, eta: 0:53:41, time: 1.271, data_time: 0.053, memory: 38534, decode.loss_ce: 0.2273, decode.acc_seg: 91.4158, loss: 0.2273 2023-11-09 23:39:50,639 - mmseg - INFO - Iter [2700/5000] lr: 1.491e-06, eta: 0:52:28, time: 1.270, data_time: 0.052, memory: 38534, decode.loss_ce: 0.2124, decode.acc_seg: 91.7559, loss: 0.2124 2023-11-09 23:40:51,911 - mmseg - INFO - Iter [2750/5000] lr: 1.459e-06, eta: 0:51:14, time: 1.225, data_time: 0.008, memory: 38534, decode.loss_ce: 0.2102, decode.acc_seg: 91.7584, loss: 0.2102 2023-11-09 23:41:55,434 - mmseg - INFO - Iter [2800/5000] lr: 1.426e-06, eta: 0:50:02, time: 1.270, data_time: 0.054, memory: 38534, decode.loss_ce: 0.1992, decode.acc_seg: 92.1451, loss: 0.1992 2023-11-09 23:42:59,091 - mmseg - INFO - Iter [2850/5000] lr: 1.394e-06, eta: 0:48:50, time: 1.273, data_time: 0.056, memory: 38534, decode.loss_ce: 0.2105, decode.acc_seg: 91.7742, loss: 0.2105 2023-11-09 23:44:00,312 - mmseg - INFO - Iter [2900/5000] lr: 1.361e-06, eta: 0:47:37, time: 1.224, data_time: 
0.008, memory: 38534, decode.loss_ce: 0.1973, decode.acc_seg: 92.2418, loss: 0.1973 2023-11-09 23:45:03,839 - mmseg - INFO - Iter [2950/5000] lr: 1.329e-06, eta: 0:46:26, time: 1.271, data_time: 0.053, memory: 38534, decode.loss_ce: 0.2211, decode.acc_seg: 91.5542, loss: 0.2211 2023-11-09 23:46:05,055 - mmseg - INFO - Saving checkpoint at 3000 iterations 2023-11-09 23:47:00,366 - mmseg - INFO - Exp name: segmenter_linear_intern_vit_6b_504_5k_ade20k_bs16_lr4e-5_1of16.py 2023-11-09 23:47:00,366 - mmseg - INFO - Iter [3000/5000] lr: 1.297e-06, eta: 0:45:50, time: 2.331, data_time: 0.008, memory: 38534, decode.loss_ce: 0.2134, decode.acc_seg: 91.7209, loss: 0.2134 2023-11-09 23:47:54,446 - mmseg - INFO - per class results: 2023-11-09 23:47:54,451 - mmseg - INFO - +---------------------+-------+-------+ | Class | IoU | Acc | +---------------------+-------+-------+ | wall | 77.77 | 88.6 | | building | 81.68 | 91.84 | | sky | 93.44 | 96.68 | | floor | 81.56 | 90.0 | | tree | 73.73 | 89.27 | | ceiling | 83.61 | 91.82 | | road | 82.56 | 90.12 | | bed | 89.86 | 95.55 | | windowpane | 61.82 | 78.83 | | grass | 62.86 | 79.61 | | cabinet | 61.24 | 74.94 | | sidewalk | 63.65 | 78.74 | | person | 79.93 | 93.71 | | earth | 35.31 | 50.13 | | door | 53.57 | 67.87 | | table | 62.21 | 76.55 | | mountain | 53.2 | 62.72 | | plant | 49.51 | 59.38 | | curtain | 73.07 | 87.89 | | chair | 52.97 | 64.93 | | car | 80.31 | 93.88 | | water | 48.36 | 64.55 | | painting | 75.1 | 86.32 | | sofa | 68.1 | 89.98 | | shelf | 31.59 | 48.13 | | house | 27.26 | 36.89 | | sea | 51.83 | 69.43 | | mirror | 69.25 | 74.29 | | rug | 65.48 | 74.5 | | field | 33.01 | 62.94 | | armchair | 44.93 | 66.16 | | seat | 47.34 | 66.61 | | fence | 30.07 | 38.84 | | desk | 43.81 | 65.4 | | rock | 53.83 | 75.83 | | wardrobe | 34.81 | 45.52 | | lamp | 61.68 | 76.59 | | bathtub | 79.23 | 85.27 | | railing | 37.33 | 51.95 | | cushion | 58.63 | 70.32 | | base | 25.08 | 39.38 | | box | 28.92 | 42.31 | | column | 48.96 | 63.12 | 
| signboard | 30.96 | 41.68 | | chest of drawers | 39.07 | 63.91 | | counter | 29.4 | 37.98 | | sand | 56.94 | 86.23 | | sink | 73.81 | 81.87 | | skyscraper | 46.49 | 61.11 | | fireplace | 70.41 | 84.3 | | refrigerator | 75.25 | 85.6 | | grandstand | 8.34 | 10.72 | | path | 16.01 | 24.66 | | stairs | 35.4 | 47.85 | | runway | 76.96 | 89.27 | | case | 35.11 | 48.56 | | pool table | 91.48 | 96.69 | | pillow | 59.91 | 71.45 | | screen door | 61.6 | 62.84 | | stairway | 49.8 | 72.54 | | river | 17.83 | 52.76 | | bridge | 65.21 | 84.71 | | bookcase | 30.79 | 53.15 | | blind | 15.71 | 17.56 | | coffee table | 58.71 | 86.19 | | toilet | 86.0 | 90.8 | | flower | 35.29 | 54.21 | | book | 40.81 | 70.62 | | hill | 7.46 | 8.03 | | bench | 50.47 | 59.82 | | countertop | 56.67 | 71.37 | | stove | 71.44 | 86.4 | | palm | 49.96 | 73.35 | | kitchen island | 40.5 | 72.34 | | computer | 65.93 | 76.36 | | swivel chair | 38.84 | 62.17 | | boat | 68.02 | 84.88 | | bar | 32.57 | 49.53 | | arcade machine | 58.67 | 64.29 | | hovel | 16.84 | 21.97 | | bus | 90.01 | 94.45 | | towel | 71.8 | 80.66 | | light | 43.06 | 54.06 | | truck | 38.33 | 47.25 | | tower | 9.81 | 17.37 | | chandelier | 63.64 | 77.93 | | awning | 26.71 | 34.26 | | streetlight | 25.05 | 36.24 | | booth | 15.39 | 16.85 | | television receiver | 73.8 | 85.04 | | airplane | 58.66 | 66.79 | | dirt track | 12.27 | 33.07 | | apparel | 41.22 | 64.89 | | pole | 17.91 | 22.9 | | land | 0.02 | 0.03 | | bannister | 9.33 | 13.8 | | escalator | 57.35 | 77.99 | | ottoman | 47.16 | 61.93 | | bottle | 23.23 | 31.1 | | buffet | 38.94 | 53.9 | | poster | 27.84 | 31.34 | | stage | 9.4 | 19.53 | | van | 8.67 | 10.14 | | ship | 0.0 | 0.0 | | fountain | 12.2 | 12.37 | | conveyer belt | 77.34 | 95.08 | | canopy | 41.83 | 49.29 | | washer | 79.24 | 81.36 | | plaything | 31.37 | 39.3 | | swimming pool | 55.04 | 55.38 | | stool | 37.72 | 48.18 | | barrel | 26.38 | 27.1 | | basket | 36.26 | 52.61 | | waterfall | 46.41 | 72.67 | | tent | 0.0 | 0.0 | | 
bag | 18.86 | 20.9 | | minibike | 70.91 | 84.71 | | cradle | 77.38 | 96.25 | | oven | 35.89 | 44.97 | | ball | 37.57 | 68.99 | | food | 19.96 | 20.37 | | step | 11.48 | 17.34 | | tank | 29.74 | 31.99 | | trade name | 30.73 | 45.18 | | microwave | 78.98 | 86.7 | | pot | 50.02 | 56.43 | | animal | 62.94 | 64.83 | | bicycle | 59.55 | 79.24 | | lake | 0.0 | 0.0 | | dishwasher | 65.65 | 77.01 | | screen | 40.61 | 43.76 | | blanket | 14.87 | 17.11 | | sculpture | 43.83 | 47.96 | | hood | 53.91 | 63.84 | | sconce | 34.28 | 45.16 | | vase | 36.27 | 51.52 | | traffic light | 28.85 | 42.82 | | tray | 11.1 | 31.72 | | ashcan | 46.83 | 57.83 | | fan | 52.13 | 60.4 | | pier | 29.45 | 31.33 | | crt screen | 16.87 | 53.46 | | plate | 55.55 | 74.13 | | monitor | 3.22 | 3.99 | | bulletin board | 46.87 | 72.94 | | shower | 0.0 | 0.0 | | radiator | 64.9 | 69.13 | | glass | 19.11 | 20.56 | | clock | 35.48 | 39.82 | | flag | 66.22 | 70.11 | +---------------------+-------+-------+ 2023-11-09 23:47:54,452 - mmseg - INFO - Summary: 2023-11-09 23:47:54,452 - mmseg - INFO - +-------+-------+-------+ | aAcc | mIoU | mAcc | +-------+-------+-------+ | 82.18 | 45.91 | 57.75 | +-------+-------+-------+ 2023-11-09 23:47:54,452 - mmseg - INFO - Exp name: segmenter_linear_intern_vit_6b_504_5k_ade20k_bs16_lr4e-5_1of16.py 2023-11-09 23:47:54,453 - mmseg - INFO - Iter(val) [250] aAcc: 0.8218, mIoU: 0.4591, mAcc: 0.5775, IoU.wall: 0.7777, IoU.building: 0.8168, IoU.sky: 0.9344, IoU.floor: 0.8156, IoU.tree: 0.7373, IoU.ceiling: 0.8361, IoU.road: 0.8256, IoU.bed : 0.8986, IoU.windowpane: 0.6182, IoU.grass: 0.6286, IoU.cabinet: 0.6124, IoU.sidewalk: 0.6365, IoU.person: 0.7993, IoU.earth: 0.3531, IoU.door: 0.5357, IoU.table: 0.6221, IoU.mountain: 0.5320, IoU.plant: 0.4951, IoU.curtain: 0.7307, IoU.chair: 0.5297, IoU.car: 0.8031, IoU.water: 0.4836, IoU.painting: 0.7510, IoU.sofa: 0.6810, IoU.shelf: 0.3159, IoU.house: 0.2726, IoU.sea: 0.5183, IoU.mirror: 0.6925, IoU.rug: 0.6548, IoU.field: 0.3301, 
IoU.armchair: 0.4493, IoU.seat: 0.4734, IoU.fence: 0.3007, IoU.desk: 0.4381, IoU.rock: 0.5383, IoU.wardrobe: 0.3481, IoU.lamp: 0.6168, IoU.bathtub: 0.7923, IoU.railing: 0.3733, IoU.cushion: 0.5863, IoU.base: 0.2508, IoU.box: 0.2892, IoU.column: 0.4896, IoU.signboard: 0.3096, IoU.chest of drawers: 0.3907, IoU.counter: 0.2940, IoU.sand: 0.5694, IoU.sink: 0.7381, IoU.skyscraper: 0.4649, IoU.fireplace: 0.7041, IoU.refrigerator: 0.7525, IoU.grandstand: 0.0834, IoU.path: 0.1601, IoU.stairs: 0.3540, IoU.runway: 0.7696, IoU.case: 0.3511, IoU.pool table: 0.9148, IoU.pillow: 0.5991, IoU.screen door: 0.6160, IoU.stairway: 0.4980, IoU.river: 0.1783, IoU.bridge: 0.6521, IoU.bookcase: 0.3079, IoU.blind: 0.1571, IoU.coffee table: 0.5871, IoU.toilet: 0.8600, IoU.flower: 0.3529, IoU.book: 0.4081, IoU.hill: 0.0746, IoU.bench: 0.5047, IoU.countertop: 0.5667, IoU.stove: 0.7144, IoU.palm: 0.4996, IoU.kitchen island: 0.4050, IoU.computer: 0.6593, IoU.swivel chair: 0.3884, IoU.boat: 0.6802, IoU.bar: 0.3257, IoU.arcade machine: 0.5867, IoU.hovel: 0.1684, IoU.bus: 0.9001, IoU.towel: 0.7180, IoU.light: 0.4306, IoU.truck: 0.3833, IoU.tower: 0.0981, IoU.chandelier: 0.6364, IoU.awning: 0.2671, IoU.streetlight: 0.2505, IoU.booth: 0.1539, IoU.television receiver: 0.7380, IoU.airplane: 0.5866, IoU.dirt track: 0.1227, IoU.apparel: 0.4122, IoU.pole: 0.1791, IoU.land: 0.0002, IoU.bannister: 0.0933, IoU.escalator: 0.5735, IoU.ottoman: 0.4716, IoU.bottle: 0.2323, IoU.buffet: 0.3894, IoU.poster: 0.2784, IoU.stage: 0.0940, IoU.van: 0.0867, IoU.ship: 0.0000, IoU.fountain: 0.1220, IoU.conveyer belt: 0.7734, IoU.canopy: 0.4183, IoU.washer: 0.7924, IoU.plaything: 0.3137, IoU.swimming pool: 0.5504, IoU.stool: 0.3772, IoU.barrel: 0.2638, IoU.basket: 0.3626, IoU.waterfall: 0.4641, IoU.tent: 0.0000, IoU.bag: 0.1886, IoU.minibike: 0.7091, IoU.cradle: 0.7738, IoU.oven: 0.3589, IoU.ball: 0.3757, IoU.food: 0.1996, IoU.step: 0.1148, IoU.tank: 0.2974, IoU.trade name: 0.3073, IoU.microwave: 0.7898, IoU.pot: 0.5002, 
IoU.animal: 0.6294, IoU.bicycle: 0.5955, IoU.lake: 0.0000, IoU.dishwasher: 0.6565, IoU.screen: 0.4061, IoU.blanket: 0.1487, IoU.sculpture: 0.4383, IoU.hood: 0.5391, IoU.sconce: 0.3428, IoU.vase: 0.3627, IoU.traffic light: 0.2885, IoU.tray: 0.1110, IoU.ashcan: 0.4683, IoU.fan: 0.5213, IoU.pier: 0.2945, IoU.crt screen: 0.1687, IoU.plate: 0.5555, IoU.monitor: 0.0322, IoU.bulletin board: 0.4687, IoU.shower: 0.0000, IoU.radiator: 0.6490, IoU.glass: 0.1911, IoU.clock: 0.3548, IoU.flag: 0.6622, Acc.wall: 0.8860, Acc.building: 0.9184, Acc.sky: 0.9668, Acc.floor: 0.9000, Acc.tree: 0.8927, Acc.ceiling: 0.9182, Acc.road: 0.9012, Acc.bed : 0.9555, Acc.windowpane: 0.7883, Acc.grass: 0.7961, Acc.cabinet: 0.7494, Acc.sidewalk: 0.7874, Acc.person: 0.9371, Acc.earth: 0.5013, Acc.door: 0.6787, Acc.table: 0.7655, Acc.mountain: 0.6272, Acc.plant: 0.5938, Acc.curtain: 0.8789, Acc.chair: 0.6493, Acc.car: 0.9388, Acc.water: 0.6455, Acc.painting: 0.8632, Acc.sofa: 0.8998, Acc.shelf: 0.4813, Acc.house: 0.3689, Acc.sea: 0.6943, Acc.mirror: 0.7429, Acc.rug: 0.7450, Acc.field: 0.6294, Acc.armchair: 0.6616, Acc.seat: 0.6661, Acc.fence: 0.3884, Acc.desk: 0.6540, Acc.rock: 0.7583, Acc.wardrobe: 0.4552, Acc.lamp: 0.7659, Acc.bathtub: 0.8527, Acc.railing: 0.5195, Acc.cushion: 0.7032, Acc.base: 0.3938, Acc.box: 0.4231, Acc.column: 0.6312, Acc.signboard: 0.4168, Acc.chest of drawers: 0.6391, Acc.counter: 0.3798, Acc.sand: 0.8623, Acc.sink: 0.8187, Acc.skyscraper: 0.6111, Acc.fireplace: 0.8430, Acc.refrigerator: 0.8560, Acc.grandstand: 0.1072, Acc.path: 0.2466, Acc.stairs: 0.4785, Acc.runway: 0.8927, Acc.case: 0.4856, Acc.pool table: 0.9669, Acc.pillow: 0.7145, Acc.screen door: 0.6284, Acc.stairway: 0.7254, Acc.river: 0.5276, Acc.bridge: 0.8471, Acc.bookcase: 0.5315, Acc.blind: 0.1756, Acc.coffee table: 0.8619, Acc.toilet: 0.9080, Acc.flower: 0.5421, Acc.book: 0.7062, Acc.hill: 0.0803, Acc.bench: 0.5982, Acc.countertop: 0.7137, Acc.stove: 0.8640, Acc.palm: 0.7335, Acc.kitchen island: 0.7234, 
Acc.computer: 0.7636, Acc.swivel chair: 0.6217, Acc.boat: 0.8488, Acc.bar: 0.4953, Acc.arcade machine: 0.6429, Acc.hovel: 0.2197, Acc.bus: 0.9445, Acc.towel: 0.8066, Acc.light: 0.5406, Acc.truck: 0.4725, Acc.tower: 0.1737, Acc.chandelier: 0.7793, Acc.awning: 0.3426, Acc.streetlight: 0.3624, Acc.booth: 0.1685, Acc.television receiver: 0.8504, Acc.airplane: 0.6679, Acc.dirt track: 0.3307, Acc.apparel: 0.6489, Acc.pole: 0.2290, Acc.land: 0.0003, Acc.bannister: 0.1380, Acc.escalator: 0.7799, Acc.ottoman: 0.6193, Acc.bottle: 0.3110, Acc.buffet: 0.5390, Acc.poster: 0.3134, Acc.stage: 0.1953, Acc.van: 0.1014, Acc.ship: 0.0000, Acc.fountain: 0.1237, Acc.conveyer belt: 0.9508, Acc.canopy: 0.4929, Acc.washer: 0.8136, Acc.plaything: 0.3930, Acc.swimming pool: 0.5538, Acc.stool: 0.4818, Acc.barrel: 0.2710, Acc.basket: 0.5261, Acc.waterfall: 0.7267, Acc.tent: 0.0000, Acc.bag: 0.2090, Acc.minibike: 0.8471, Acc.cradle: 0.9625, Acc.oven: 0.4497, Acc.ball: 0.6899, Acc.food: 0.2037, Acc.step: 0.1734, Acc.tank: 0.3199, Acc.trade name: 0.4518, Acc.microwave: 0.8670, Acc.pot: 0.5643, Acc.animal: 0.6483, Acc.bicycle: 0.7924, Acc.lake: 0.0000, Acc.dishwasher: 0.7701, Acc.screen: 0.4376, Acc.blanket: 0.1711, Acc.sculpture: 0.4796, Acc.hood: 0.6384, Acc.sconce: 0.4516, Acc.vase: 0.5152, Acc.traffic light: 0.4282, Acc.tray: 0.3172, Acc.ashcan: 0.5783, Acc.fan: 0.6040, Acc.pier: 0.3133, Acc.crt screen: 0.5346, Acc.plate: 0.7413, Acc.monitor: 0.0399, Acc.bulletin board: 0.7294, Acc.shower: 0.0000, Acc.radiator: 0.6913, Acc.glass: 0.2056, Acc.clock: 0.3982, Acc.flag: 0.7011 2023-11-09 23:48:57,960 - mmseg - INFO - Iter [3050/5000] lr: 1.264e-06, eta: 0:45:13, time: 2.352, data_time: 1.135, memory: 38534, decode.loss_ce: 0.2142, decode.acc_seg: 91.8498, loss: 0.2142 2023-11-09 23:50:01,339 - mmseg - INFO - Iter [3100/5000] lr: 1.232e-06, eta: 0:43:59, time: 1.268, data_time: 0.052, memory: 38534, decode.loss_ce: 0.1957, decode.acc_seg: 92.3266, loss: 0.1957 2023-11-09 23:51:02,494 - mmseg - 
INFO - Iter [3150/5000] lr: 1.199e-06, eta: 0:42:45, time: 1.223, data_time: 0.007, memory: 38534, decode.loss_ce: 0.1951, decode.acc_seg: 92.3087, loss: 0.1951 2023-11-09 23:52:05,848 - mmseg - INFO - Iter [3200/5000] lr: 1.167e-06, eta: 0:41:32, time: 1.267, data_time: 0.051, memory: 38534, decode.loss_ce: 0.1983, decode.acc_seg: 92.1063, loss: 0.1983 2023-11-09 23:53:09,286 - mmseg - INFO - Iter [3250/5000] lr: 1.135e-06, eta: 0:40:20, time: 1.269, data_time: 0.052, memory: 38534, decode.loss_ce: 0.2071, decode.acc_seg: 92.0398, loss: 0.2071 2023-11-09 23:54:10,445 - mmseg - INFO - Iter [3300/5000] lr: 1.102e-06, eta: 0:39:07, time: 1.223, data_time: 0.008, memory: 38534, decode.loss_ce: 0.2000, decode.acc_seg: 92.1843, loss: 0.2000 2023-11-09 23:55:13,892 - mmseg - INFO - Iter [3350/5000] lr: 1.070e-06, eta: 0:37:55, time: 1.269, data_time: 0.053, memory: 38534, decode.loss_ce: 0.1936, decode.acc_seg: 92.3446, loss: 0.1936 2023-11-09 23:56:17,393 - mmseg - INFO - Iter [3400/5000] lr: 1.037e-06, eta: 0:36:43, time: 1.270, data_time: 0.052, memory: 38534, decode.loss_ce: 0.2068, decode.acc_seg: 91.9792, loss: 0.2068 2023-11-09 23:57:18,556 - mmseg - INFO - Iter [3450/5000] lr: 1.005e-06, eta: 0:35:31, time: 1.223, data_time: 0.008, memory: 38534, decode.loss_ce: 0.1995, decode.acc_seg: 92.1372, loss: 0.1995 2023-11-09 23:58:22,025 - mmseg - INFO - Iter [3500/5000] lr: 9.726e-07, eta: 0:34:20, time: 1.269, data_time: 0.053, memory: 38534, decode.loss_ce: 0.1895, decode.acc_seg: 92.4622, loss: 0.1895 2023-11-09 23:59:23,222 - mmseg - INFO - Iter [3550/5000] lr: 9.402e-07, eta: 0:33:08, time: 1.224, data_time: 0.008, memory: 38534, decode.loss_ce: 0.1790, decode.acc_seg: 93.0010, loss: 0.1790 2023-11-10 00:00:26,655 - mmseg - INFO - Iter [3600/5000] lr: 9.078e-07, eta: 0:31:58, time: 1.269, data_time: 0.052, memory: 38534, decode.loss_ce: 0.1914, decode.acc_seg: 92.4182, loss: 0.1914 2023-11-10 00:01:30,191 - mmseg - INFO - Iter [3650/5000] lr: 8.754e-07, eta: 
0:30:47, time: 1.271, data_time: 0.052, memory: 38534, decode.loss_ce: 0.1913, decode.acc_seg: 92.4978, loss: 0.1913 2023-11-10 00:02:31,392 - mmseg - INFO - Iter [3700/5000] lr: 8.430e-07, eta: 0:29:36, time: 1.224, data_time: 0.008, memory: 38534, decode.loss_ce: 0.1807, decode.acc_seg: 92.6843, loss: 0.1807 2023-11-10 00:03:34,834 - mmseg - INFO - Iter [3750/5000] lr: 8.106e-07, eta: 0:28:26, time: 1.269, data_time: 0.052, memory: 38534, decode.loss_ce: 0.1805, decode.acc_seg: 92.7099, loss: 0.1805 2023-11-10 00:04:38,226 - mmseg - INFO - Iter [3800/5000] lr: 7.782e-07, eta: 0:27:17, time: 1.268, data_time: 0.051, memory: 38534, decode.loss_ce: 0.1872, decode.acc_seg: 92.5187, loss: 0.1872 2023-11-10 00:05:39,417 - mmseg - INFO - Iter [3850/5000] lr: 7.458e-07, eta: 0:26:06, time: 1.224, data_time: 0.008, memory: 38534, decode.loss_ce: 0.1864, decode.acc_seg: 92.6280, loss: 0.1864 2023-11-10 00:06:42,957 - mmseg - INFO - Iter [3900/5000] lr: 7.134e-07, eta: 0:24:57, time: 1.271, data_time: 0.053, memory: 38534, decode.loss_ce: 0.1801, decode.acc_seg: 92.8809, loss: 0.1801 2023-11-10 00:07:44,133 - mmseg - INFO - Iter [3950/5000] lr: 6.810e-07, eta: 0:23:47, time: 1.224, data_time: 0.008, memory: 38534, decode.loss_ce: 0.1828, decode.acc_seg: 92.8699, loss: 0.1828 2023-11-10 00:08:47,704 - mmseg - INFO - Saving checkpoint at 4000 iterations 2023-11-10 00:09:38,103 - mmseg - INFO - Exp name: segmenter_linear_intern_vit_6b_504_5k_ade20k_bs16_lr4e-5_1of16.py 2023-11-10 00:09:38,103 - mmseg - INFO - Iter [4000/5000] lr: 6.486e-07, eta: 0:22:50, time: 2.279, data_time: 0.053, memory: 38534, decode.loss_ce: 0.1766, decode.acc_seg: 92.9688, loss: 0.1766 2023-11-10 00:10:33,057 - mmseg - INFO - per class results: 2023-11-10 00:10:33,062 - mmseg - INFO - +---------------------+-------+-------+ | Class | IoU | Acc | +---------------------+-------+-------+ | wall | 77.45 | 88.71 | | building | 81.27 | 90.93 | | sky | 93.36 | 97.4 | | floor | 81.75 | 91.13 | | tree | 73.59 | 
86.66 | | ceiling | 83.5 | 93.01 | | road | 83.13 | 89.93 | | bed | 89.97 | 95.3 | | windowpane | 62.45 | 78.57 | | grass | 61.33 | 77.26 | | cabinet | 61.96 | 74.23 | | sidewalk | 63.49 | 81.13 | | person | 80.01 | 93.67 | | earth | 35.02 | 46.42 | | door | 52.38 | 63.18 | | table | 62.61 | 76.96 | | mountain | 53.82 | 65.1 | | plant | 48.25 | 57.76 | | curtain | 72.5 | 88.6 | | chair | 55.31 | 71.15 | | car | 81.12 | 93.59 | | water | 49.77 | 65.16 | | painting | 75.11 | 86.38 | | sofa | 68.56 | 90.27 | | shelf | 33.24 | 52.01 | | house | 24.96 | 31.92 | | sea | 53.58 | 71.11 | | mirror | 70.15 | 76.16 | | rug | 61.82 | 67.37 | | field | 31.16 | 66.51 | | armchair | 45.73 | 59.33 | | seat | 48.83 | 69.32 | | fence | 30.64 | 43.0 | | desk | 45.6 | 60.51 | | rock | 52.36 | 76.55 | | wardrobe | 36.31 | 51.05 | | lamp | 62.26 | 76.05 | | bathtub | 80.02 | 84.35 | | railing | 37.46 | 52.75 | | cushion | 59.03 | 76.3 | | base | 25.44 | 40.13 | | box | 27.13 | 32.07 | | column | 50.19 | 63.78 | | signboard | 33.4 | 48.37 | | chest of drawers | 38.12 | 66.42 | | counter | 33.62 | 45.94 | | sand | 54.97 | 85.04 | | sink | 74.52 | 80.87 | | skyscraper | 46.03 | 70.08 | | fireplace | 73.08 | 87.4 | | refrigerator | 74.05 | 87.74 | | grandstand | 8.33 | 8.98 | | path | 12.85 | 20.32 | | stairs | 45.08 | 65.11 | | runway | 76.81 | 89.28 | | case | 37.18 | 48.58 | | pool table | 92.37 | 96.61 | | pillow | 55.96 | 63.78 | | screen door | 75.46 | 78.19 | | stairway | 43.79 | 71.51 | | river | 18.47 | 53.0 | | bridge | 73.54 | 83.45 | | bookcase | 28.68 | 49.96 | | blind | 24.63 | 28.97 | | coffee table | 59.41 | 85.04 | | toilet | 86.52 | 91.36 | | flower | 34.31 | 54.98 | | book | 41.76 | 70.83 | | hill | 7.77 | 9.88 | | bench | 51.28 | 59.8 | | countertop | 57.96 | 71.5 | | stove | 71.89 | 85.02 | | palm | 49.06 | 79.63 | | kitchen island | 47.68 | 84.91 | | computer | 67.68 | 76.44 | | swivel chair | 33.55 | 46.77 | | boat | 63.53 | 78.94 | | bar | 34.52 | 49.75 | | arcade 
machine | 41.42 | 43.5 | | hovel | 18.22 | 22.55 | | bus | 90.15 | 94.93 | | towel | 73.49 | 83.15 | | light | 37.88 | 44.41 | | truck | 37.43 | 47.52 | | tower | 10.26 | 17.93 | | chandelier | 61.96 | 72.5 | | awning | 28.49 | 38.33 | | streetlight | 26.63 | 37.57 | | booth | 20.15 | 23.04 | | television receiver | 74.61 | 84.56 | | airplane | 58.85 | 65.07 | | dirt track | 12.42 | 30.47 | | apparel | 46.15 | 64.82 | | pole | 19.82 | 26.33 | | land | 0.0 | 0.0 | | bannister | 7.49 | 10.53 | | escalator | 62.82 | 82.55 | | ottoman | 46.66 | 65.06 | | bottle | 22.1 | 30.55 | | buffet | 41.21 | 55.65 | | poster | 30.32 | 34.98 | | stage | 8.96 | 17.82 | | van | 11.13 | 13.58 | | ship | 0.0 | 0.0 | | fountain | 18.37 | 18.91 | | conveyer belt | 79.25 | 93.23 | | canopy | 39.5 | 50.69 | | washer | 79.69 | 81.77 | | plaything | 30.64 | 37.64 | | swimming pool | 50.82 | 50.9 | | stool | 35.16 | 42.88 | | barrel | 28.56 | 29.86 | | basket | 36.39 | 51.72 | | waterfall | 45.92 | 76.42 | | tent | 0.0 | 0.0 | | bag | 25.25 | 29.72 | | minibike | 71.0 | 85.94 | | cradle | 77.78 | 95.09 | | oven | 41.53 | 51.98 | | ball | 37.82 | 69.8 | | food | 20.34 | 20.88 | | step | 13.5 | 17.46 | | tank | 28.12 | 31.19 | | trade name | 31.1 | 44.2 | | microwave | 79.88 | 89.54 | | pot | 52.42 | 61.94 | | animal | 61.25 | 62.63 | | bicycle | 59.61 | 79.0 | | lake | 0.0 | 0.0 | | dishwasher | 65.31 | 79.98 | | screen | 48.16 | 56.99 | | blanket | 17.35 | 20.43 | | sculpture | 41.84 | 48.6 | | hood | 54.88 | 64.83 | | sconce | 33.6 | 41.13 | | vase | 35.95 | 58.29 | | traffic light | 28.92 | 48.16 | | tray | 9.9 | 31.21 | | ashcan | 47.1 | 61.3 | | fan | 52.02 | 61.05 | | pier | 32.89 | 35.35 | | crt screen | 12.4 | 35.8 | | plate | 53.31 | 77.19 | | monitor | 1.83 | 2.24 | | bulletin board | 45.56 | 68.04 | | shower | 0.02 | 0.05 | | radiator | 64.88 | 69.58 | | glass | 19.83 | 22.03 | | clock | 36.74 | 44.05 | | flag | 66.59 | 73.71 | +---------------------+-------+-------+ 2023-11-10 
00:10:33,063 - mmseg - INFO - Summary: 2023-11-10 00:10:33,064 - mmseg - INFO - +-------+-------+-------+ | aAcc | mIoU | mAcc | +-------+-------+-------+ | 82.22 | 46.35 | 58.32 | +-------+-------+-------+ 2023-11-10 00:10:33,064 - mmseg - INFO - Exp name: segmenter_linear_intern_vit_6b_504_5k_ade20k_bs16_lr4e-5_1of16.py 2023-11-10 00:10:33,065 - mmseg - INFO - Iter(val) [250] aAcc: 0.8222, mIoU: 0.4635, mAcc: 0.5832, IoU.wall: 0.7745, IoU.building: 0.8127, IoU.sky: 0.9336, IoU.floor: 0.8175, IoU.tree: 0.7359, IoU.ceiling: 0.8350, IoU.road: 0.8313, IoU.bed : 0.8997, IoU.windowpane: 0.6245, IoU.grass: 0.6133, IoU.cabinet: 0.6196, IoU.sidewalk: 0.6349, IoU.person: 0.8001, IoU.earth: 0.3502, IoU.door: 0.5238, IoU.table: 0.6261, IoU.mountain: 0.5382, IoU.plant: 0.4825, IoU.curtain: 0.7250, IoU.chair: 0.5531, IoU.car: 0.8112, IoU.water: 0.4977, IoU.painting: 0.7511, IoU.sofa: 0.6856, IoU.shelf: 0.3324, IoU.house: 0.2496, IoU.sea: 0.5358, IoU.mirror: 0.7015, IoU.rug: 0.6182, IoU.field: 0.3116, IoU.armchair: 0.4573, IoU.seat: 0.4883, IoU.fence: 0.3064, IoU.desk: 0.4560, IoU.rock: 0.5236, IoU.wardrobe: 0.3631, IoU.lamp: 0.6226, IoU.bathtub: 0.8002, IoU.railing: 0.3746, IoU.cushion: 0.5903, IoU.base: 0.2544, IoU.box: 0.2713, IoU.column: 0.5019, IoU.signboard: 0.3340, IoU.chest of drawers: 0.3812, IoU.counter: 0.3362, IoU.sand: 0.5497, IoU.sink: 0.7452, IoU.skyscraper: 0.4603, IoU.fireplace: 0.7308, IoU.refrigerator: 0.7405, IoU.grandstand: 0.0833, IoU.path: 0.1285, IoU.stairs: 0.4508, IoU.runway: 0.7681, IoU.case: 0.3718, IoU.pool table: 0.9237, IoU.pillow: 0.5596, IoU.screen door: 0.7546, IoU.stairway: 0.4379, IoU.river: 0.1847, IoU.bridge: 0.7354, IoU.bookcase: 0.2868, IoU.blind: 0.2463, IoU.coffee table: 0.5941, IoU.toilet: 0.8652, IoU.flower: 0.3431, IoU.book: 0.4176, IoU.hill: 0.0777, IoU.bench: 0.5128, IoU.countertop: 0.5796, IoU.stove: 0.7189, IoU.palm: 0.4906, IoU.kitchen island: 0.4768, IoU.computer: 0.6768, IoU.swivel chair: 0.3355, IoU.boat: 0.6353, IoU.bar: 
0.3452, IoU.arcade machine: 0.4142, IoU.hovel: 0.1822, IoU.bus: 0.9015, IoU.towel: 0.7349, IoU.light: 0.3788, IoU.truck: 0.3743, IoU.tower: 0.1026, IoU.chandelier: 0.6196, IoU.awning: 0.2849, IoU.streetlight: 0.2663, IoU.booth: 0.2015, IoU.television receiver: 0.7461, IoU.airplane: 0.5885, IoU.dirt track: 0.1242, IoU.apparel: 0.4615, IoU.pole: 0.1982, IoU.land: 0.0000, IoU.bannister: 0.0749, IoU.escalator: 0.6282, IoU.ottoman: 0.4666, IoU.bottle: 0.2210, IoU.buffet: 0.4121, IoU.poster: 0.3032, IoU.stage: 0.0896, IoU.van: 0.1113, IoU.ship: 0.0000, IoU.fountain: 0.1837, IoU.conveyer belt: 0.7925, IoU.canopy: 0.3950, IoU.washer: 0.7969, IoU.plaything: 0.3064, IoU.swimming pool: 0.5082, IoU.stool: 0.3516, IoU.barrel: 0.2856, IoU.basket: 0.3639, IoU.waterfall: 0.4592, IoU.tent: 0.0000, IoU.bag: 0.2525, IoU.minibike: 0.7100, IoU.cradle: 0.7778, IoU.oven: 0.4153, IoU.ball: 0.3782, IoU.food: 0.2034, IoU.step: 0.1350, IoU.tank: 0.2812, IoU.trade name: 0.3110, IoU.microwave: 0.7988, IoU.pot: 0.5242, IoU.animal: 0.6125, IoU.bicycle: 0.5961, IoU.lake: 0.0000, IoU.dishwasher: 0.6531, IoU.screen: 0.4816, IoU.blanket: 0.1735, IoU.sculpture: 0.4184, IoU.hood: 0.5488, IoU.sconce: 0.3360, IoU.vase: 0.3595, IoU.traffic light: 0.2892, IoU.tray: 0.0990, IoU.ashcan: 0.4710, IoU.fan: 0.5202, IoU.pier: 0.3289, IoU.crt screen: 0.1240, IoU.plate: 0.5331, IoU.monitor: 0.0183, IoU.bulletin board: 0.4556, IoU.shower: 0.0002, IoU.radiator: 0.6488, IoU.glass: 0.1983, IoU.clock: 0.3674, IoU.flag: 0.6659, Acc.wall: 0.8871, Acc.building: 0.9093, Acc.sky: 0.9740, Acc.floor: 0.9113, Acc.tree: 0.8666, Acc.ceiling: 0.9301, Acc.road: 0.8993, Acc.bed : 0.9530, Acc.windowpane: 0.7857, Acc.grass: 0.7726, Acc.cabinet: 0.7423, Acc.sidewalk: 0.8113, Acc.person: 0.9367, Acc.earth: 0.4642, Acc.door: 0.6318, Acc.table: 0.7696, Acc.mountain: 0.6510, Acc.plant: 0.5776, Acc.curtain: 0.8860, Acc.chair: 0.7115, Acc.car: 0.9359, Acc.water: 0.6516, Acc.painting: 0.8638, Acc.sofa: 0.9027, Acc.shelf: 0.5201, Acc.house: 
0.3192, Acc.sea: 0.7111, Acc.mirror: 0.7616, Acc.rug: 0.6737, Acc.field: 0.6651, Acc.armchair: 0.5933, Acc.seat: 0.6932, Acc.fence: 0.4300, Acc.desk: 0.6051, Acc.rock: 0.7655, Acc.wardrobe: 0.5105, Acc.lamp: 0.7605, Acc.bathtub: 0.8435, Acc.railing: 0.5275, Acc.cushion: 0.7630, Acc.base: 0.4013, Acc.box: 0.3207, Acc.column: 0.6378, Acc.signboard: 0.4837, Acc.chest of drawers: 0.6642, Acc.counter: 0.4594, Acc.sand: 0.8504, Acc.sink: 0.8087, Acc.skyscraper: 0.7008, Acc.fireplace: 0.8740, Acc.refrigerator: 0.8774, Acc.grandstand: 0.0898, Acc.path: 0.2032, Acc.stairs: 0.6511, Acc.runway: 0.8928, Acc.case: 0.4858, Acc.pool table: 0.9661, Acc.pillow: 0.6378, Acc.screen door: 0.7819, Acc.stairway: 0.7151, Acc.river: 0.5300, Acc.bridge: 0.8345, Acc.bookcase: 0.4996, Acc.blind: 0.2897, Acc.coffee table: 0.8504, Acc.toilet: 0.9136, Acc.flower: 0.5498, Acc.book: 0.7083, Acc.hill: 0.0988, Acc.bench: 0.5980, Acc.countertop: 0.7150, Acc.stove: 0.8502, Acc.palm: 0.7963, Acc.kitchen island: 0.8491, Acc.computer: 0.7644, Acc.swivel chair: 0.4677, Acc.boat: 0.7894, Acc.bar: 0.4975, Acc.arcade machine: 0.4350, Acc.hovel: 0.2255, Acc.bus: 0.9493, Acc.towel: 0.8315, Acc.light: 0.4441, Acc.truck: 0.4752, Acc.tower: 0.1793, Acc.chandelier: 0.7250, Acc.awning: 0.3833, Acc.streetlight: 0.3757, Acc.booth: 0.2304, Acc.television receiver: 0.8456, Acc.airplane: 0.6507, Acc.dirt track: 0.3047, Acc.apparel: 0.6482, Acc.pole: 0.2633, Acc.land: 0.0000, Acc.bannister: 0.1053, Acc.escalator: 0.8255, Acc.ottoman: 0.6506, Acc.bottle: 0.3055, Acc.buffet: 0.5565, Acc.poster: 0.3498, Acc.stage: 0.1782, Acc.van: 0.1358, Acc.ship: 0.0000, Acc.fountain: 0.1891, Acc.conveyer belt: 0.9323, Acc.canopy: 0.5069, Acc.washer: 0.8177, Acc.plaything: 0.3764, Acc.swimming pool: 0.5090, Acc.stool: 0.4288, Acc.barrel: 0.2986, Acc.basket: 0.5172, Acc.waterfall: 0.7642, Acc.tent: 0.0000, Acc.bag: 0.2972, Acc.minibike: 0.8594, Acc.cradle: 0.9509, Acc.oven: 0.5198, Acc.ball: 0.6980, Acc.food: 0.2088, Acc.step: 0.1746, 
Acc.tank: 0.3119, Acc.trade name: 0.4420, Acc.microwave: 0.8954, Acc.pot: 0.6194, Acc.animal: 0.6263, Acc.bicycle: 0.7900, Acc.lake: 0.0000, Acc.dishwasher: 0.7998, Acc.screen: 0.5699, Acc.blanket: 0.2043, Acc.sculpture: 0.4860, Acc.hood: 0.6483, Acc.sconce: 0.4113, Acc.vase: 0.5829, Acc.traffic light: 0.4816, Acc.tray: 0.3121, Acc.ashcan: 0.6130, Acc.fan: 0.6105, Acc.pier: 0.3535, Acc.crt screen: 0.3580, Acc.plate: 0.7719, Acc.monitor: 0.0224, Acc.bulletin board: 0.6804, Acc.shower: 0.0005, Acc.radiator: 0.6958, Acc.glass: 0.2203, Acc.clock: 0.4405, Acc.flag: 0.7371
2023-11-10 00:11:36,582 - mmseg - INFO - Iter [4050/5000] lr: 6.162e-07, eta: 0:21:54, time: 2.370, data_time: 1.152, memory: 38534, decode.loss_ce: 0.1721, decode.acc_seg: 93.1551, loss: 0.1721
2023-11-10 00:12:37,742 - mmseg - INFO - Iter [4100/5000] lr: 5.838e-07, eta: 0:20:43, time: 1.223, data_time: 0.008, memory: 38534, decode.loss_ce: 0.1806, decode.acc_seg: 92.6499, loss: 0.1806
2023-11-10 00:13:41,186 - mmseg - INFO - Iter [4150/5000] lr: 5.514e-07, eta: 0:19:32, time: 1.269, data_time: 0.051, memory: 38534, decode.loss_ce: 0.1813, decode.acc_seg: 92.9452, loss: 0.1813
2023-11-10 00:14:44,750 - mmseg - INFO - Iter [4200/5000] lr: 5.190e-07, eta: 0:18:22, time: 1.271, data_time: 0.051, memory: 38534, decode.loss_ce: 0.1823, decode.acc_seg: 92.7957, loss: 0.1823
2023-11-10 00:15:45,943 - mmseg - INFO - Iter [4250/5000] lr: 4.866e-07, eta: 0:17:12, time: 1.224, data_time: 0.008, memory: 38534, decode.loss_ce: 0.1821, decode.acc_seg: 92.7583, loss: 0.1821
2023-11-10 00:16:49,893 - mmseg - INFO - Iter [4300/5000] lr: 4.542e-07, eta: 0:16:02, time: 1.279, data_time: 0.062, memory: 38534, decode.loss_ce: 0.1938, decode.acc_seg: 92.6474, loss: 0.1938
2023-11-10 00:17:53,345 - mmseg - INFO - Iter [4350/5000] lr: 4.218e-07, eta: 0:14:53, time: 1.269, data_time: 0.051, memory: 38534, decode.loss_ce: 0.1887, decode.acc_seg: 92.4755, loss: 0.1887
2023-11-10 00:18:54,490 - mmseg - INFO - Iter [4400/5000] lr: 3.894e-07, eta: 0:13:43, time: 1.223, data_time: 0.008, memory: 38534, decode.loss_ce: 0.1645, decode.acc_seg: 93.3589, loss: 0.1645
2023-11-10 00:19:58,033 - mmseg - INFO - Iter [4450/5000] lr: 3.570e-07, eta: 0:12:34, time: 1.271, data_time: 0.052, memory: 38534, decode.loss_ce: 0.1890, decode.acc_seg: 92.7035, loss: 0.1890
2023-11-10 00:20:59,229 - mmseg - INFO - Iter [4500/5000] lr: 3.246e-07, eta: 0:11:24, time: 1.224, data_time: 0.008, memory: 38534, decode.loss_ce: 0.1712, decode.acc_seg: 93.1262, loss: 0.1712
2023-11-10 00:22:02,754 - mmseg - INFO - Iter [4550/5000] lr: 2.922e-07, eta: 0:10:15, time: 1.270, data_time: 0.052, memory: 38534, decode.loss_ce: 0.1757, decode.acc_seg: 92.8165, loss: 0.1757
2023-11-10 00:23:06,228 - mmseg - INFO - Iter [4600/5000] lr: 2.598e-07, eta: 0:09:07, time: 1.269, data_time: 0.051, memory: 38534, decode.loss_ce: 0.1791, decode.acc_seg: 92.7841, loss: 0.1791
2023-11-10 00:24:07,419 - mmseg - INFO - Iter [4650/5000] lr: 2.274e-07, eta: 0:07:58, time: 1.224, data_time: 0.008, memory: 38534, decode.loss_ce: 0.1697, decode.acc_seg: 93.1094, loss: 0.1697
2023-11-10 00:25:11,025 - mmseg - INFO - Iter [4700/5000] lr: 1.950e-07, eta: 0:06:49, time: 1.272, data_time: 0.054, memory: 38534, decode.loss_ce: 0.1672, decode.acc_seg: 93.0462, loss: 0.1672
2023-11-10 00:26:14,564 - mmseg - INFO - Iter [4750/5000] lr: 1.626e-07, eta: 0:05:41, time: 1.271, data_time: 0.051, memory: 38534, decode.loss_ce: 0.1750, decode.acc_seg: 92.9852, loss: 0.1750
2023-11-10 00:27:15,778 - mmseg - INFO - Iter [4800/5000] lr: 1.302e-07, eta: 0:04:32, time: 1.224, data_time: 0.008, memory: 38534, decode.loss_ce: 0.1718, decode.acc_seg: 93.0326, loss: 0.1718
2023-11-10 00:28:19,277 - mmseg - INFO - Iter [4850/5000] lr: 9.784e-08, eta: 0:03:24, time: 1.270, data_time: 0.053, memory: 38534, decode.loss_ce: 0.1686, decode.acc_seg: 93.2814, loss: 0.1686
2023-11-10 00:29:22,839 - mmseg - INFO - Iter [4900/5000] lr: 6.544e-08, eta: 0:02:16, time: 1.271, 
data_time: 0.052, memory: 38534, decode.loss_ce: 0.1676, decode.acc_seg: 93.1667, loss: 0.1676
2023-11-10 00:30:24,077 - mmseg - INFO - Iter [4950/5000] lr: 3.305e-08, eta: 0:01:07, time: 1.225, data_time: 0.008, memory: 38534, decode.loss_ce: 0.1681, decode.acc_seg: 93.2769, loss: 0.1681
2023-11-10 00:31:27,556 - mmseg - INFO - Saving checkpoint at 5000 iterations
2023-11-10 00:32:20,637 - mmseg - INFO - Exp name: segmenter_linear_intern_vit_6b_504_5k_ade20k_bs16_lr4e-5_1of16.py
2023-11-10 00:32:20,637 - mmseg - INFO - Iter [5000/5000] lr: 6.480e-10, eta: 0:00:00, time: 2.331, data_time: 0.051, memory: 38534, decode.loss_ce: 0.1734, decode.acc_seg: 93.1065, loss: 0.1734
2023-11-10 00:33:14,421 - mmseg - INFO - per class results:
2023-11-10 00:33:14,426 - mmseg - INFO -
+---------------------+-------+-------+
| Class               | IoU   | Acc   |
+---------------------+-------+-------+
| wall                | 77.58 | 88.71 |
| building            | 81.51 | 91.58 |
| sky                 | 93.45 | 97.31 |
| floor               | 81.67 | 91.14 |
| tree                | 73.94 | 88.05 |
| ceiling             | 83.82 | 92.55 |
| road                | 83.13 | 89.93 |
| bed                 | 90.1  | 95.52 |
| windowpane          | 62.5  | 79.58 |
| grass               | 62.57 | 77.97 |
| cabinet             | 62.14 | 73.88 |
| sidewalk            | 63.66 | 81.68 |
| person              | 80.06 | 93.8  |
| earth               | 35.86 | 47.42 |
| door                | 52.69 | 64.44 |
| table               | 62.49 | 77.9  |
| mountain            | 52.71 | 62.56 |
| plant               | 50.51 | 62.13 |
| curtain             | 73.25 | 87.91 |
| chair               | 54.96 | 69.28 |
| car                 | 80.55 | 93.84 |
| water               | 52.22 | 68.48 |
| painting            | 74.26 | 87.39 |
| sofa                | 69.98 | 90.1  |
| shelf               | 33.38 | 51.75 |
| house               | 24.96 | 32.11 |
| sea                 | 54.03 | 70.35 |
| mirror              | 70.87 | 76.51 |
| rug                 | 61.76 | 67.89 |
| field               | 32.0  | 65.0  |
| armchair            | 48.03 | 64.36 |
| seat                | 47.88 | 66.74 |
| fence               | 30.25 | 39.46 |
| desk                | 44.88 | 64.12 |
| rock                | 50.89 | 77.01 |
| wardrobe            | 37.35 | 50.7  |
| lamp                | 62.48 | 76.67 |
| bathtub             | 80.59 | 87.13 |
| railing             | 37.52 | 51.19 |
| cushion             | 59.99 | 74.96 |
| base                | 25.27 | 38.19 |
| box                 | 27.8  | 33.58 |
| column              | 49.8  | 63.55 |
| signboard           | 32.74 | 48.58 |
| chest of drawers    | 36.41 | 65.59 |
| counter             | 29.87 | 39.58 |
| sand                | 55.32 | 84.79 |
| sink                | 74.41 | 81.92 |
| skyscraper          | 48.04 | 70.07 |
| fireplace           | 71.86 | 87.64 |
| refrigerator        | 74.29 | 87.56 |
| grandstand          | 9.32  | 10.32 |
| path                | 11.53 | 16.09 |
| stairs              | 43.39 | 57.98 |
| runway              | 76.91 | 88.81 |
| case                | 40.93 | 54.37 |
| pool table          | 93.16 | 96.45 |
| pillow              | 58.58 | 68.22 |
| screen door         | 72.04 | 75.97 |
| stairway            | 46.79 | 69.67 |
| river               | 17.58 | 46.42 |
| bridge              | 71.74 | 79.61 |
| bookcase            | 27.98 | 46.28 |
| blind               | 25.81 | 30.14 |
| coffee table        | 60.82 | 82.61 |
| toilet              | 86.49 | 91.28 |
| flower              | 35.78 | 55.66 |
| book                | 42.22 | 72.27 |
| hill                | 7.58  | 9.07  |
| bench               | 51.48 | 59.78 |
| countertop          | 57.01 | 72.62 |
| stove               | 71.34 | 85.91 |
| palm                | 49.01 | 77.18 |
| kitchen island      | 46.25 | 80.04 |
| computer            | 69.41 | 80.16 |
| swivel chair        | 35.1  | 50.79 |
| boat                | 64.18 | 81.13 |
| bar                 | 33.51 | 49.98 |
| arcade machine      | 42.62 | 44.91 |
| hovel               | 17.8  | 20.97 |
| bus                 | 90.84 | 94.51 |
| towel               | 73.62 | 83.5  |
| light               | 41.07 | 49.4  |
| truck               | 37.27 | 48.07 |
| tower               | 9.5   | 16.49 |
| chandelier          | 63.17 | 76.36 |
| awning              | 27.54 | 38.26 |
| streetlight         | 26.44 | 37.12 |
| booth               | 19.7  | 22.87 |
| television receiver | 73.97 | 85.56 |
| airplane            | 59.07 | 65.28 |
| dirt track          | 11.78 | 30.86 |
| apparel             | 46.52 | 67.53 |
| pole                | 19.21 | 25.06 |
| land                | 0.0   | 0.0   |
| bannister           | 7.03  | 9.39  |
| escalator           | 58.31 | 75.37 |
| ottoman             | 46.85 | 63.3  |
| bottle              | 23.31 | 30.93 |
| buffet              | 43.45 | 59.15 |
| poster              | 29.3  | 33.47 |
| stage               | 8.03  | 15.24 |
| van                 | 11.9  | 14.45 |
| ship                | 0.0   | 0.0   |
| fountain            | 12.62 | 12.78 |
| conveyer belt       | 79.09 | 93.3  |
| canopy              | 39.15 | 50.69 |
| washer              | 78.19 | 79.53 |
| plaything           | 30.59 | 38.71 |
| swimming pool       | 51.83 | 52.22 |
| stool               | 35.19 | 43.41 |
| barrel              | 29.82 | 30.6  |
| basket              | 36.95 | 51.28 |
| waterfall           | 44.16 | 71.02 |
| tent                | 0.0   | 0.0   |
| bag                 | 25.81 | 32.01 |
| minibike            | 71.97 | 84.71 |
| cradle              | 76.89 | 97.34 |
| oven                | 42.5  | 52.89 |
| ball                | 37.89 | 69.97 |
| food                | 25.48 | 26.32 |
| step                | 10.14 | 12.88 |
| tank                | 28.89 | 31.18 |
| trade name          | 30.26 | 40.99 |
| microwave           | 80.38 | 89.89 |
| pot                 | 52.96 | 61.42 |
| animal              | 62.68 | 64.2  |
| bicycle             | 58.51 | 75.89 |
| lake                | 0.0   | 0.0   |
| dishwasher          | 65.73 | 79.72 |
| screen              | 52.18 | 64.02 |
| blanket             | 18.31 | 21.73 |
| sculpture           | 43.49 | 50.44 |
| hood                | 56.08 | 65.21 |
| sconce              | 36.73 | 48.86 |
| vase                | 37.67 | 55.37 |
| traffic light       | 29.61 | 46.04 |
| tray                | 12.27 | 29.22 |
| ashcan              | 46.92 | 61.19 |
| fan                 | 53.0  | 61.59 |
| pier                | 33.29 | 36.08 |
| crt screen          | 13.62 | 30.74 |
| plate               | 53.06 | 77.65 |
| monitor             | 1.78  | 2.12  |
| bulletin board      | 44.76 | 68.17 |
| shower              | 0.13  | 0.32  |
| radiator            | 64.74 | 68.82 |
| glass               | 19.89 | 21.89 |
| clock               | 38.04 | 44.16 |
| flag                | 65.61 | 71.61 |
+---------------------+-------+-------+
2023-11-10 00:33:14,426 - mmseg - INFO - Summary:
2023-11-10 00:33:14,427 - mmseg - INFO -
+-------+-------+-------+
| aAcc  | mIoU  | mAcc  |
+-------+-------+-------+
| 82.44 | 46.54 | 58.23 |
+-------+-------+-------+
2023-11-10 00:33:14,427 - mmseg - INFO - Exp name: segmenter_linear_intern_vit_6b_504_5k_ade20k_bs16_lr4e-5_1of16.py
2023-11-10 00:33:14,428 - mmseg - INFO - Iter(val) [250] aAcc: 0.8244, mIoU: 0.4654, mAcc: 0.5823, IoU.wall: 0.7758, IoU.building: 0.8151, IoU.sky: 0.9345, IoU.floor: 0.8167, IoU.tree: 0.7394, IoU.ceiling: 0.8382, IoU.road: 0.8313, IoU.bed : 0.9010, IoU.windowpane: 0.6250, IoU.grass: 0.6257, IoU.cabinet: 0.6214, IoU.sidewalk: 0.6366, IoU.person: 0.8006, IoU.earth: 0.3586, IoU.door: 0.5269, IoU.table: 0.6249, IoU.mountain: 0.5271, IoU.plant: 0.5051, IoU.curtain: 0.7325, IoU.chair: 0.5496, IoU.car: 0.8055, IoU.water: 0.5222, IoU.painting: 0.7426, IoU.sofa: 0.6998, IoU.shelf: 0.3338, IoU.house: 0.2496, IoU.sea: 0.5403, IoU.mirror: 0.7087, IoU.rug: 0.6176, 
IoU.field: 0.3200, IoU.armchair: 0.4803, IoU.seat: 0.4788, IoU.fence: 0.3025, IoU.desk: 0.4488, IoU.rock: 0.5089, IoU.wardrobe: 0.3735, IoU.lamp: 0.6248, IoU.bathtub: 0.8059, IoU.railing: 0.3752, IoU.cushion: 0.5999, IoU.base: 0.2527, IoU.box: 0.2780, IoU.column: 0.4980, IoU.signboard: 0.3274, IoU.chest of drawers: 0.3641, IoU.counter: 0.2987, IoU.sand: 0.5532, IoU.sink: 0.7441, IoU.skyscraper: 0.4804, IoU.fireplace: 0.7186, IoU.refrigerator: 0.7429, IoU.grandstand: 0.0932, IoU.path: 0.1153, IoU.stairs: 0.4339, IoU.runway: 0.7691, IoU.case: 0.4093, IoU.pool table: 0.9316, IoU.pillow: 0.5858, IoU.screen door: 0.7204, IoU.stairway: 0.4679, IoU.river: 0.1758, IoU.bridge: 0.7174, IoU.bookcase: 0.2798, IoU.blind: 0.2581, IoU.coffee table: 0.6082, IoU.toilet: 0.8649, IoU.flower: 0.3578, IoU.book: 0.4222, IoU.hill: 0.0758, IoU.bench: 0.5148, IoU.countertop: 0.5701, IoU.stove: 0.7134, IoU.palm: 0.4901, IoU.kitchen island: 0.4625, IoU.computer: 0.6941, IoU.swivel chair: 0.3510, IoU.boat: 0.6418, IoU.bar: 0.3351, IoU.arcade machine: 0.4262, IoU.hovel: 0.1780, IoU.bus: 0.9084, IoU.towel: 0.7362, IoU.light: 0.4107, IoU.truck: 0.3727, IoU.tower: 0.0950, IoU.chandelier: 0.6317, IoU.awning: 0.2754, IoU.streetlight: 0.2644, IoU.booth: 0.1970, IoU.television receiver: 0.7397, IoU.airplane: 0.5907, IoU.dirt track: 0.1178, IoU.apparel: 0.4652, IoU.pole: 0.1921, IoU.land: 0.0000, IoU.bannister: 0.0703, IoU.escalator: 0.5831, IoU.ottoman: 0.4685, IoU.bottle: 0.2331, IoU.buffet: 0.4345, IoU.poster: 0.2930, IoU.stage: 0.0803, IoU.van: 0.1190, IoU.ship: 0.0000, IoU.fountain: 0.1262, IoU.conveyer belt: 0.7909, IoU.canopy: 0.3915, IoU.washer: 0.7819, IoU.plaything: 0.3059, IoU.swimming pool: 0.5183, IoU.stool: 0.3519, IoU.barrel: 0.2982, IoU.basket: 0.3695, IoU.waterfall: 0.4416, IoU.tent: 0.0000, IoU.bag: 0.2581, IoU.minibike: 0.7197, IoU.cradle: 0.7689, IoU.oven: 0.4250, IoU.ball: 0.3789, IoU.food: 0.2548, IoU.step: 0.1014, IoU.tank: 0.2889, IoU.trade name: 0.3026, IoU.microwave: 0.8038, 
IoU.pot: 0.5296, IoU.animal: 0.6268, IoU.bicycle: 0.5851, IoU.lake: 0.0000, IoU.dishwasher: 0.6573, IoU.screen: 0.5218, IoU.blanket: 0.1831, IoU.sculpture: 0.4349, IoU.hood: 0.5608, IoU.sconce: 0.3673, IoU.vase: 0.3767, IoU.traffic light: 0.2961, IoU.tray: 0.1227, IoU.ashcan: 0.4692, IoU.fan: 0.5300, IoU.pier: 0.3329, IoU.crt screen: 0.1362, IoU.plate: 0.5306, IoU.monitor: 0.0178, IoU.bulletin board: 0.4476, IoU.shower: 0.0013, IoU.radiator: 0.6474, IoU.glass: 0.1989, IoU.clock: 0.3804, IoU.flag: 0.6561, Acc.wall: 0.8871, Acc.building: 0.9158, Acc.sky: 0.9731, Acc.floor: 0.9114, Acc.tree: 0.8805, Acc.ceiling: 0.9255, Acc.road: 0.8993, Acc.bed : 0.9552, Acc.windowpane: 0.7958, Acc.grass: 0.7797, Acc.cabinet: 0.7388, Acc.sidewalk: 0.8168, Acc.person: 0.9380, Acc.earth: 0.4742, Acc.door: 0.6444, Acc.table: 0.7790, Acc.mountain: 0.6256, Acc.plant: 0.6213, Acc.curtain: 0.8791, Acc.chair: 0.6928, Acc.car: 0.9384, Acc.water: 0.6848, Acc.painting: 0.8739, Acc.sofa: 0.9010, Acc.shelf: 0.5175, Acc.house: 0.3211, Acc.sea: 0.7035, Acc.mirror: 0.7651, Acc.rug: 0.6789, Acc.field: 0.6500, Acc.armchair: 0.6436, Acc.seat: 0.6674, Acc.fence: 0.3946, Acc.desk: 0.6412, Acc.rock: 0.7701, Acc.wardrobe: 0.5070, Acc.lamp: 0.7667, Acc.bathtub: 0.8713, Acc.railing: 0.5119, Acc.cushion: 0.7496, Acc.base: 0.3819, Acc.box: 0.3358, Acc.column: 0.6355, Acc.signboard: 0.4858, Acc.chest of drawers: 0.6559, Acc.counter: 0.3958, Acc.sand: 0.8479, Acc.sink: 0.8192, Acc.skyscraper: 0.7007, Acc.fireplace: 0.8764, Acc.refrigerator: 0.8756, Acc.grandstand: 0.1032, Acc.path: 0.1609, Acc.stairs: 0.5798, Acc.runway: 0.8881, Acc.case: 0.5437, Acc.pool table: 0.9645, Acc.pillow: 0.6822, Acc.screen door: 0.7597, Acc.stairway: 0.6967, Acc.river: 0.4642, Acc.bridge: 0.7961, Acc.bookcase: 0.4628, Acc.blind: 0.3014, Acc.coffee table: 0.8261, Acc.toilet: 0.9128, Acc.flower: 0.5566, Acc.book: 0.7227, Acc.hill: 0.0907, Acc.bench: 0.5978, Acc.countertop: 0.7262, Acc.stove: 0.8591, Acc.palm: 0.7718, Acc.kitchen island: 
0.8004, Acc.computer: 0.8016, Acc.swivel chair: 0.5079, Acc.boat: 0.8113, Acc.bar: 0.4998, Acc.arcade machine: 0.4491, Acc.hovel: 0.2097, Acc.bus: 0.9451, Acc.towel: 0.8350, Acc.light: 0.4940, Acc.truck: 0.4807, Acc.tower: 0.1649, Acc.chandelier: 0.7636, Acc.awning: 0.3826, Acc.streetlight: 0.3712, Acc.booth: 0.2287, Acc.television receiver: 0.8556, Acc.airplane: 0.6528, Acc.dirt track: 0.3086, Acc.apparel: 0.6753, Acc.pole: 0.2506, Acc.land: 0.0000, Acc.bannister: 0.0939, Acc.escalator: 0.7537, Acc.ottoman: 0.6330, Acc.bottle: 0.3093, Acc.buffet: 0.5915, Acc.poster: 0.3347, Acc.stage: 0.1524, Acc.van: 0.1445, Acc.ship: 0.0000, Acc.fountain: 0.1278, Acc.conveyer belt: 0.9330, Acc.canopy: 0.5069, Acc.washer: 0.7953, Acc.plaything: 0.3871, Acc.swimming pool: 0.5222, Acc.stool: 0.4341, Acc.barrel: 0.3060, Acc.basket: 0.5128, Acc.waterfall: 0.7102, Acc.tent: 0.0000, Acc.bag: 0.3201, Acc.minibike: 0.8471, Acc.cradle: 0.9734, Acc.oven: 0.5289, Acc.ball: 0.6997, Acc.food: 0.2632, Acc.step: 0.1288, Acc.tank: 0.3118, Acc.trade name: 0.4099, Acc.microwave: 0.8989, Acc.pot: 0.6142, Acc.animal: 0.6420, Acc.bicycle: 0.7589, Acc.lake: 0.0000, Acc.dishwasher: 0.7972, Acc.screen: 0.6402, Acc.blanket: 0.2173, Acc.sculpture: 0.5044, Acc.hood: 0.6521, Acc.sconce: 0.4886, Acc.vase: 0.5537, Acc.traffic light: 0.4604, Acc.tray: 0.2922, Acc.ashcan: 0.6119, Acc.fan: 0.6159, Acc.pier: 0.3608, Acc.crt screen: 0.3074, Acc.plate: 0.7765, Acc.monitor: 0.0212, Acc.bulletin board: 0.6817, Acc.shower: 0.0032, Acc.radiator: 0.6882, Acc.glass: 0.2189, Acc.clock: 0.4416, Acc.flag: 0.7161